Had a thought while reviewing the latest Goose 0.9.3 release notes. The extension model is fascinating: extensions run in a separate, isolated "extension host" process and communicate with the main Goose application via a dedicated IPC channel. The protocol is JSON-RPC over a Unix domain socket (or a named pipe on Windows).
Given that this is the primary conduit for all extended functionality (think credential access, workflow steps, external tool calls), its robustness is critical. The attack surface is non-trivial: message parsing, routing, state management across the boundary.
I'm wondering if anyone in the community has thrown a fuzzer at it. I know the main Goose binary gets a lot of static analysis love, but a stateful, structured fuzzer for the IPC channel seems like a logical next step. I'm picturing something like:
```python
# Rough sketch of a potential harness
import socket
import json
import struct
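
def send_rpc(sock, method, params):
    """Send one JSON-RPC 2.0 request over an already-connected socket."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    ).encode("utf-8")
    # Hypothetical framing: 4-byte big-endian length prefix.
    # Measure the encoded bytes so the prefix stays correct for non-ASCII output.
    framed = struct.pack('>I', len(body)) + body
    sock.sendall(framed)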
```
The targets would be the dispatcher in the main process and the receiver in the extension host. Key areas:
- Malformed JSON (obviously)
- Unexpected method names or parameter structures
- Sequence breaking (e.g., calling `execution_result` before `execute`)
- Massive payloads, deep nesting
If it's truly robust, it should handle all this gracefully without crashing or, worse, performing an unintended action. The open-source nature means we could theoretically run this against the exact code, but the runtime state complicates things.
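To make the list above concrete, here is a minimal sketch of a case generator that takes one valid request and emits one hostile variant per category. Everything in it is an assumption for illustration: `mutation_cases`, the case labels, and the default sizes are made up, and none of it reflects how Goose actually parses messages. Sequence breaking is left out because it needs more than one message.

```python
import json


def mutation_cases(base, depth=100_000, blob_size=1024 * 1024):
    """Return (label, payload_bytes) pairs derived from one valid request dict."""
    valid = json.dumps(base)
    cases = []

    # Malformed JSON: drop the closing brace, or smuggle in an invalid UTF-8 byte.
    cases.append(("truncated", valid[:-1].encode()))
    cases.append(("bad_utf8", valid.encode()[:-1] + b"\xff}"))

    # Unexpected method names or parameter structures.
    cases.append(("unknown_method",
                  json.dumps({**base, "method": "no/such/method"}).encode()))
    cases.append(("params_wrong_type",
                  json.dumps({**base, "params": 42}).encode()))
    without_method = {k: v for k, v in base.items() if k != "method"}
    cases.append(("missing_method", json.dumps(without_method).encode()))

    # Deep nesting, assembled as text so the harness itself never recurses.
    nested_params = "[" * depth + "]" * depth
    nested = '{"jsonrpc": "2.0", "id": 1, "method": %s, "params": %s}' % (
        json.dumps(base["method"]), nested_params)
    cases.append(("deep_nesting", nested.encode()))

    # Massive payload: a single oversized string parameter.
    huge = {**base, "params": {"blob": "A" * blob_size}}
    cases.append(("huge_param", json.dumps(huge).encode()))

    return cases
```

Each payload still needs framing before it goes on the wire, so the same length-prefix step as in `send_rpc` would apply.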
Has anyone attempted this, or built a harness? I'm curious about the practical challenges: spawning the processes, managing the socket, resetting state. Could be a great community audit project.
~m
We're all here to learn.
Did a dumb fuzz test after 0.9.2 dropped, more smoke test than structured. Wrote a harness that sent malformed JSON and garbage length prefixes. Found the main process crashes gracefully (which is good), but the extension host sometimes hangs waiting for a response it'll never get. That's the real risk - a hung host means all extensions are dead.
You need to simulate the full handshake, not just send single messages. The state across multiple calls is what's tricky. A hung process doesn't trigger most health checks.
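Since a hang won't announce itself, the harness has to notice it. A rough sketch of a reader that tells "peer went silent" apart from "peer closed the socket", assuming the same hypothetical 4-byte length prefix as the sketch in the first post (`recv_exact` and `recv_frame_or_hang` are made-up names):

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, or raise ConnectionError if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame_or_hang(sock, timeout=2.0):
    """Return the next frame body, or None if the peer stays silent too long.

    The timeout applies to each recv() call rather than to the whole frame,
    so a peer that trickles bytes slowly is not reported as hung.
    """
    sock.settimeout(timeout)
    try:
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return recv_exact(sock, length)
    except socket.timeout:
        return None
```

A `None` after a mutated message is a hang candidate worth saving along with the input that caused it. A `ConnectionError` means the other side went away, which is the monitorable failure.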
I'd start with the actual test fixtures from the Goose repo. They have example message flows. Build your fuzzer from those, mutate the fields.
The hanging extension host is a serious failure mode. Graceful crash of the main process is preferable, as it's monitorable.
> You need to simulate the full handshake
This is key. A stateful fuzzer needs to understand the protocol's expected sequence, not just the data format. A malformed message during an `initialize` handshake will cause different side effects than one sent during a `tool/call` invocation, especially regarding allocated resources.
Building from the repo's test fixtures is solid advice. I'd also instrument the harness to track open file descriptors or memory in the extension host between messages. That's where a hung state often leaks.
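A minimal sketch of the descriptor tracking, assuming a Linux box where the harness can read `/proc/<pid>/fd` for the extension host. `count_open_fds` and `FdLeakWatcher` are made-up names; other platforms would need something else (for example `lsof` or a library such as psutil).

```python
import os


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (Linux only, needs read permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


class FdLeakWatcher:
    """Record a baseline descriptor count and report growth against it."""

    def __init__(self, pid, slack=0):
        self.pid = pid
        self.slack = slack  # growth tolerated before flagging a leak
        self.baseline = count_open_fds(pid)

    def growth(self):
        return count_open_fds(self.pid) - self.baseline

    def leaked(self):
        return self.growth() > self.slack
```

Take the baseline once the handshake has completed, then call `leaked()` after each fuzzed message. Memory could be sampled the same way from the `VmRSS` line of `/proc/<pid>/status`.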
Keys are not for sharing.
Your harness sketch is on the right track, but you need to mutate the framing, not just the JSON. The 4-byte length prefix is the first check. Send a length of 0xFFFFFFFF, then a 1-byte message body. See if it allocates based on the length before reading.
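Something like this for the framing cases, still assuming the hypothetical 4-byte big-endian prefix from your sketch. `hostile_frames` and its labels are made up for illustration:

```python
import struct


def hostile_frames(body=b"{}"):
    """Return labelled frames whose length prefix disagrees with the body."""
    return {
        # Claims ~4 GiB and delivers one byte: does the reader allocate first?
        "huge_claim_tiny_body": struct.pack(">I", 0xFFFFFFFF) + b"A",
        # Sign bit set: goes negative if the reader treats the prefix as int32.
        "sign_bit": struct.pack(">I", 0x80000000) + body,
        # Zero-length frame with no body at all.
        "zero_length": struct.pack(">I", 0),
        # Prefix one short: the last body byte bleeds into the next frame.
        "claim_short": struct.pack(">I", max(len(body) - 1, 0)) + body,
        # Prefix one long: the reader waits for a byte that never arrives.
        "claim_long": struct.pack(">I", len(body) + 1) + body,
        # Only half of the prefix is ever sent.
        "truncated_prefix": struct.pack(">I", len(body))[:2],
    }
```

The `claim_long` and `truncated_prefix` cases are the ones most likely to show up as a hang rather than a crash, so pair them with a read timeout.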
Also, don't just fuzz single messages. The protocol has state. You need to replay a valid sequence - `initialize`, `handshake`, `tool/call` - and then corrupt the Nth message. That's how you find the hangs user266 mentioned.
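A sketch of the replay-then-corrupt idea. The three messages are placeholders with guessed params, not the real Goose flow. The real sequence should come from the repo fixtures. `corrupt_nth`, `drop_method`, and `all_cases` are made-up helper names, and the framing is the same hypothetical 4-byte prefix.

```python
import copy
import json
import struct

# Placeholder happy-path flow; replace with a recorded sequence from the fixtures.
VALID_SEQUENCE = [
    {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}},
    {"jsonrpc": "2.0", "id": 2, "method": "handshake", "params": {}},
    {"jsonrpc": "2.0", "id": 3, "method": "tool/call", "params": {"name": "echo"}},
]


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def corrupt_nth(sequence, n, mutate):
    """Frame every message untouched except message n, which goes through
    mutate(dict) -> bytes. The input sequence is never modified."""
    frames = []
    for i, msg in enumerate(sequence):
        if i == n:
            payload = mutate(copy.deepcopy(msg))
        else:
            payload = json.dumps(msg).encode()
        frames.append(frame(payload))
    return frames


def drop_method(msg):
    """Example mutator: remove the method field entirely."""
    msg.pop("method", None)
    return json.dumps(msg).encode()


def all_cases(sequence, mutators):
    """Yield (position, mutator name, frames) for every position/mutator pair."""
    for n in range(len(sequence)):
        for mutate in mutators:
            yield n, mutate.__name__, corrupt_nth(sequence, n, mutate)
```

Send each frame list over a fresh connection to a freshly spawned host, so one case's leftover state can't mask the next.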
I'd start with the `goose/test/ipc` fixtures and build a libFuzzer harness around their `MockTransport`. Lets you fuzz the logic without socket boilerplate.
pivot on escape