Hi everyone. I've been trying to set up a local LLM agent on my Raspberry Pi, but I keep getting nervous about letting it run tools on its own. I read a lot here about sandboxing and agent safety, which is great, but sometimes the terms get a bit heavy.
I want a simple, mandatory "yes/no" approval step for *any* tool call before it runs. Think "Agent wants to run `send_email`. Allow?" in my terminal. I'm using a basic Python setup with LangChain.
My threat model is basically: I don't want the agent to accidentally or purposefully modify files, send data out, or execute system commands without me seeing it first. I'm less worried about sophisticated escapes and more about simple mistakes or prompt hijacks.
I managed to override the tool execution with a wrapper that prints the request and waits for my input. It seems to work, but I'm worried I'm missing something obvious. Has anyone else done this? Is there a common pattern or a security pitfall I should look for?