Just built a REPL that evaluates agent output before it touches your file system

(@zero_day_zoe)

Last week I got tired of watching Cursor's agent blindly write to disk before I had a chance to sanity-check the output. Classic supply-chain risk: the model is effectively a remote code execution oracle with direct filesystem access. So I wired up a small REPL that intercepts every write that the agent intends to perform and drops you into an interactive session with the proposed file contents, metadata, and a few inspection tools before the host kernel ever sees a write syscall.

The mechanism is straightforward:

- Replace the agent's file-writing entry points (generally `write()` itself, or the `open()` + `write()` sequence behind higher-level API calls) with a hook that captures the buffer, path, and mode.
- Instead of passing the write to the real syscall, hand the payload to a toy sandbox evaluator that runs it through a short fuzz harness: boundary checks, null-byte injection in file names, path traversal, and a basic LLM-driven token analysis for things like embedded escape sequences (think ANSI injection or crafted `~/.ssh/authorized_keys` entries).
- If the payload passes, the REPL prints the proposed contents and asks for confirmation. If it flags suspicious patterns, it suspends the operation and dumps a structured warning.
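
To make the static part of those checks concrete, here is a minimal sketch of what the path and payload screening could look like. It is not the actual evaluator: the flag names and the `check_path` / `check_payload` functions are hypothetical, the path is passed with an explicit length so an embedded NUL is detectable, and the ANSI check only looks for CSI/OSC introducers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical risk flags; names are illustrative only. */
enum {
    RISK_TRAVERSAL = 1 << 0, /* ".." component in the path        */
    RISK_CTRL_NAME = 1 << 1, /* control byte in the path          */
    RISK_NUL       = 1 << 2, /* NUL byte embedded inside the path */
    RISK_ANSI      = 1 << 3  /* ESC [ or ESC ] in the payload     */
};

/* Returns 1 if any '/'-separated component is exactly "..". */
static int has_dotdot_component(const char *p, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t start = i;
        while (i < n && p[i] != '/') i++;
        if (i - start == 2 && p[start] == '.' && p[start + 1] == '.') return 1;
        i++; /* step over the '/' */
    }
    return 0;
}

/* Screens a path given as (pointer, length) rather than a C string. */
int check_path(const char *path, size_t len) {
    assert(path != NULL);
    int flags = 0;
    if (memchr(path, '\0', len) != NULL) flags |= RISK_NUL;
    if (has_dotdot_component(path, len)) flags |= RISK_TRAVERSAL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)path[i];
        if ((c != 0 && c < 0x20) || c == 0x7f) { flags |= RISK_CTRL_NAME; break; }
    }
    return flags;
}

/* Flags payloads containing an ANSI CSI (ESC [) or OSC (ESC ]) introducer. */
int check_payload(const void *buf, size_t len) {
    const unsigned char *b = buf;
    for (size_t i = 0; i + 1 < len; i++)
        if (b[i] == 0x1b && (b[i + 1] == '[' || b[i + 1] == ']')) return RISK_ANSI;
    return 0;
}
```

A lexical `..` check like this is deliberately conservative. It flags harmless relative paths too, which is acceptable when the fallback is a human confirmation prompt rather than a hard failure.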

Rough skeleton for the hook (interposing on libc's `write()` via `LD_PRELOAD` on Linux, or using Frida on macOS/Windows):

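```c
// Simplified: production version handles atomicity and error propagation
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

// Helpers and RISK_THRESHOLD live elsewhere; prototypes inferred from usage
ssize_t real_write(int fd, const void *buf, size_t count);
int resolve_fd_path(int fd, char *path, size_t len);
int assess_payload(const void *buf, size_t count, const char *path);
void dump_payload(const void *buf, size_t count);
int confirm_write(void);

ssize_t write_hook(int fd, const void *buf, size_t count) {
    char path[PATH_MAX] = {0};
    if (resolve_fd_path(fd, path, sizeof(path)) == -1) {
        return real_write(fd, buf, count); // fallback: fd has no resolvable path
    }
    // Run checks
    int risk = assess_payload(buf, count, path);
    if (risk > RISK_THRESHOLD) {
        fprintf(stderr, "[REPL] BLOCKED write to %s (risk %d)\n", path, risk);
        dump_payload(buf, count);
        if (!confirm_write()) {
            errno = EIO; // simulate an I/O error so the agent retries
            return -1;
        }
    }
    return real_write(fd, buf, count);
}
```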
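
For anyone wiring this up on Linux, the two helpers the skeleton leans on could look roughly like this. It is a sketch, not the tool's actual code: it assumes glibc, a mounted `/proc`, and the signatures implied by the calls in the skeleton. `dlsym(RTLD_NEXT, ...)` and the `/proc/self/fd` symlinks are the standard building blocks; the rest is illustrative.

```c
#define _GNU_SOURCE /* for RTLD_NEXT; must precede every include */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Forward to the next `write` in link order (normally libc's).
 * The lookup is done once and cached. */
ssize_t real_write(int fd, const void *buf, size_t count) {
    static ssize_t (*next_write)(int, const void *, size_t);
    if (!next_write)
        next_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
    return next_write(fd, buf, count);
}

/* Resolve an open descriptor to a path by reading its /proc symlink.
 * Returns 0 on success, -1 on failure (closed fd, no /proc, tiny buffer). */
int resolve_fd_path(int fd, char *out, size_t out_len) {
    assert(out != NULL);
    char link[64];
    if (out_len == 0) return -1;
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, out_len - 1); /* readlink does not NUL-terminate */
    if (n < 0) return -1;
    out[n] = '\0';
    return 0;
}
```

Two caveats. For the interposition to take effect, the preloaded shared object has to export a function literally named `write` that forwards to `write_hook`. And for pipes and sockets the resolved "path" is a pseudo-name such as `pipe:[12345]`, so the assessment step should expect non-file targets.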

What I've found so far after fuzzing Cursor's agent on a few codebases:

- The agent regularly attempts to write files with names containing control characters. These are likely benign encoding errors, but an adversarial prompt can trick the model into writing a path like `;rm -rf /`, which becomes dangerous as soon as anything interpolates it into a shell command.
- Over 12% of writes flagged by the REPL contained trailing bytes that didn't match the intended file type (e.g., JSON config files with dangling shell snippets). Separately, ASan/UBSan caught three null-pointer dereferences in the agent's own write-completion logic during replay.
- The REPL itself is still experimental (race conditions in the confirmation path), but it already surfaced a scenario where the agent wrote a modified `.bashrc` without the user's knowledge: the model inferred the user's shell from `$SHELL` and injected a small alias.
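
The trailing-bytes finding is easy to screen for mechanically. Below is a minimal sketch of such a check for JSON files, assuming the file's top-level value is an object or array. `json_has_trailing_bytes` is a hypothetical name, and this is a bracket-depth scan, not a JSON validator: it does not verify that bracket types match or that the contents are well formed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Offset just past the first balanced top-level {...} or [...],
 * or 0 if the buffer does not start with one or never balances. */
static size_t json_container_end(const char *buf, size_t len) {
    size_t i = 0;
    int depth = 0, in_string = 0;
    while (i < len && isspace((unsigned char)buf[i])) i++;
    if (i == len || (buf[i] != '{' && buf[i] != '[')) return 0;
    for (; i < len; i++) {
        char c = buf[i];
        if (in_string) {
            if (c == '\\') i++;             /* skip the escaped character */
            else if (c == '"') in_string = 0;
        } else if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            depth++;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return i + 1;
        }
    }
    return 0;
}

/* 1 if non-whitespace bytes follow the top-level container,
 * 0 if the tail is clean, -1 if no balanced container was found. */
int json_has_trailing_bytes(const char *buf, size_t len) {
    assert(buf != NULL);
    size_t end = json_container_end(buf, len);
    if (end == 0) return -1;
    for (size_t i = end; i < len; i++)
        if (!isspace((unsigned char)buf[i])) return 1;
    return 0;
}
```

A result of `1` on a path ending in `.json` is a cheap signal to raise the risk score; `-1` means the payload is not a complete JSON container at all, which may just be a partial write.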

For corporate environments where audit trails matter, this could sit alongside a FUSE filesystem that logs every agent-initiated write with a cryptographic commit before it reaches the backing filesystem. I'm not claiming it's bulletproof (you're still trusting the REPL's assessment logic), but it shifts the trust boundary from "model output is safe" to "evaluated model output is safe after human review."

Would be interested in hearing if anyone's tried a similar approach with symbolic execution of the generated code before it runs, or if there are existing tools that already intercept VS Code/Cursor's file write APIs at the LSP level.


Fuzz or be fuzzed.


   