Last week I got tired of watching Cursor's agent blindly write to disk before I had a chance to sanity-check the output. Classic supply-chain risk: the model is effectively a remote code execution oracle with direct filesystem access. So I wired up a small REPL that intercepts every write that the agent intends to perform and drops you into an interactive session with the proposed file contents, metadata, and a few inspection tools before the host kernel ever sees a write syscall.
The mechanism is straightforward:
- Replace the agent's file-writing entry points (generally `write()`, whether called directly or reached through a higher-level `open()` + `write()` sequence) with a hook that captures the buffer, path, and mode.
- Instead of passing the write to the real syscall, hand the payload to a toy sandbox evaluator that runs it through a short fuzz harness: boundary checks, null-byte injection in file names, path traversal, and a basic LLM-driven token analysis for suspicious content such as embedded escape sequences (ANSI injection) or crafted `~/.ssh/authorized_keys` entries.
- If the payload passes, the REPL prints the proposed contents and asks for confirmation. If the checks flag suspicious patterns, it suspends the operation and dumps a structured warning.
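To make the deterministic part of those checks concrete, here is a minimal sketch of what the non-LLM checks might look like. The function `check_write`, the `RISK_*` flags, and the explicit `path_len` parameter are all names I made up for illustration; this is not the actual evaluator, and it deliberately leaves out the LLM-driven analysis.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical risk flags; names and values are illustrative only. */
enum {
    RISK_PATH_TRAVERSAL = 1 << 0, /* a ".." component in the path */
    RISK_CONTROL_CHAR   = 1 << 1, /* NUL or other control byte in the name */
    RISK_ANSI_ESCAPE    = 1 << 2  /* ESC byte (0x1b) in the payload */
};

/* Returns 1 if any '/'-separated component of path is exactly "..". */
static int has_dotdot_component(const char *path) {
    const char *p = path;
    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.') return 1;
        if (!end) break;
        p = end + 1;
    }
    return 0;
}

/* Hypothetical checker. path_len is passed explicitly so an embedded NUL
 * byte in the name is visible instead of silently truncating the string. */
int check_write(const char *path, size_t path_len,
                const unsigned char *buf, size_t count) {
    int flags = 0;

    if (has_dotdot_component(path)) flags |= RISK_PATH_TRAVERSAL;

    for (size_t i = 0; i < path_len; i++) {
        unsigned char c = (unsigned char)path[i];
        if (c < 0x20 || c == 0x7f) { flags |= RISK_CONTROL_CHAR; break; }
    }

    for (size_t i = 0; i < count; i++) {
        if (buf[i] == 0x1b) { flags |= RISK_ANSI_ESCAPE; break; }
    }
    return flags;
}
```

Returning a bitmask rather than a single score keeps the warning explainable: the REPL can say which check fired instead of only printing a number.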
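Rough skeleton for the hook (using libc symbol interposition via `LD_PRELOAD` on Linux, or Frida on macOS/Windows). Note that `LD_PRELOAD` only catches calls that go through the dynamically linked libc wrapper, not raw syscalls or statically linked binaries: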
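```c
// Simplified: the production version handles atomicity and error propagation
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

// Helpers and RISK_THRESHOLD are defined elsewhere; prototypes inferred from usage
ssize_t real_write(int fd, const void *buf, size_t count);
int resolve_fd_path(int fd, char *out, size_t out_len);
int assess_payload(const void *buf, size_t count, const char *path);
void dump_payload(const void *buf, size_t count);
int confirm_write(void);

ssize_t write_hook(int fd, const void *buf, size_t count) {
    char path[PATH_MAX] = {0};
    if (resolve_fd_path(fd, path, sizeof(path)) == -1) {
        return real_write(fd, buf, count); // fallback: no path, pass through
    }
    // Run checks
    int risk = assess_payload(buf, count, path);
    if (risk > RISK_THRESHOLD) {
        fprintf(stderr, "[REPL] BLOCKED write to %s (risk %d)\n", path, risk);
        dump_payload(buf, count);
        if (!confirm_write()) {
            errno = EIO; // simulate an I/O error; the agent retries
            return -1;
        }
    }
    return real_write(fd, buf, count);
}
```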
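The skeleton leans on `resolve_fd_path` without showing it. On Linux, one plausible way to implement it is to read the symlink under `/proc/self/fd`; the sketch below assumes that approach and is not necessarily what the real hook does. (`real_write` is typically obtained with `dlsym(RTLD_NEXT, "write")`, which is not shown here.)

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only sketch: map an open fd back to a path via /proc/self/fd.
 * Returns 0 on success, -1 on failure. For pipes and sockets the kernel
 * reports names like "pipe:[1234]", and unlinked files get a " (deleted)"
 * suffix, so callers should treat the result as a hint, not ground truth. */
int resolve_fd_path(int fd, char *out, size_t out_len) {
    char link[64];

    if (out_len == 0) return -1;
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);

    /* readlink does not NUL-terminate, so reserve one byte for that. */
    ssize_t n = readlink(link, out, out_len - 1);
    if (n < 0) return -1;
    out[n] = '\0';
    return 0;
}
```

Because the path is resolved at write time rather than at `open()` time, a rename between the two calls shows up as the new name, which is one more reason to log the fd alongside the path.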
What I've found so far after fuzzing Cursor's agent on a few codebases:
- The agent regularly attempts to write files with names containing control characters. These are likely benign encoding errors, but the model can be tricked into writing a name like `;rm -rf /` if the prompt is adversarial, which becomes dangerous as soon as anything downstream interpolates that name into a shell command.
- Over 12% of writes flagged by the REPL contained trailing bytes that didn't match the intended file type (e.g., JSON config files with dangling shell snippets). Separately, ASan/UBSan caught three null-pointer dereferences in the agent's own write-completion logic during replay.
- The REPL itself is still experimental (race conditions in the confirmation path), but it already surfaced a scenario where the agent wrote a modified `.bashrc` without the user's knowledge: the model inferred the user's shell from `$SHELL` and injected a small alias.
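For the trailing-bytes finding, a check along these lines is enough to catch a JSON file with something dangling after the top-level value. This is a sketch under my own assumptions: `json_trailing_bytes` is a hypothetical name, and it is a bracket matcher, not a JSON validator.

```c
#include <stddef.h>

/* Finds the end of the first complete top-level JSON object or array and
 * returns how many bytes remain from the first non-whitespace byte after
 * it to the end of the buffer (0 means nothing trails). Returns (size_t)-1
 * if no complete top-level object or array is found. */
size_t json_trailing_bytes(const char *buf, size_t len) {
    size_t i = 0;
    int depth = 0, in_string = 0, escaped = 0, started = 0, closed = 0;

    for (; i < len; i++) {
        char c = buf[i];
        if (in_string) {
            /* Inside a string, brackets do not count; honor backslashes. */
            if (escaped) escaped = 0;
            else if (c == '\\') escaped = 1;
            else if (c == '"') in_string = 0;
            continue;
        }
        if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            depth++;
            started = 1;
        } else if (c == '}' || c == ']') {
            depth--;
            if (depth < 0) return (size_t)-1;
            if (depth == 0 && started) { closed = 1; i++; break; }
        }
    }
    if (!closed) return (size_t)-1;

    /* Whitespace after the value is legal JSON; skip it. */
    while (i < len && (buf[i] == ' ' || buf[i] == '\t' ||
                       buf[i] == '\n' || buf[i] == '\r')) {
        i++;
    }
    return len - i;
}
```

A nonzero return on a file whose path ends in `.json` is a cheap, deterministic signal worth surfacing before any model-based analysis runs.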
For corporate environments where audit trails matter, this could sit alongside a FUSE filesystem that logs every agent-initiated write with a cryptographic commit before the kernel sees it. I'm not claiming it's bulletproof — you're still trusting the REPL's assessment logic — but it shifts the trust boundary from "model output is safe" to "evaluated model output is safe after human review."
Would be interested in hearing if anyone's tried a similar approach with symbolic execution of the generated code before it runs, or if there are existing tools that already intercept VS Code/Cursor's file write APIs at the LSP level.
Fuzz or be fuzzed.