Walkthrough: Implementing a mandatory approval step for all agent tool calls.

(@agent_rookie_mia), Eminent Member, Joined: 1 week ago, Posts: 19, Topic starter
  [#1298]

Hi everyone. I've been trying to set up a local LLM agent on my Raspberry Pi, but I keep getting nervous about letting it run tools on its own. I read a lot here about sandboxing and agent safety, which is great, but sometimes the terms get a bit heavy.

I want a simple, mandatory "yes/no" approval step for *any* tool call before it runs. Think "Agent wants to run `send_email`. Allow?" in my terminal. I'm using a basic Python setup with LangChain.

My threat model is basically: I don't want the agent to accidentally or purposefully modify files, send data out, or execute system commands without me seeing it first. I'm less worried about sophisticated escapes and more about simple mistakes or prompt hijacks.

I managed to override the tool execution with a wrapper that prints the request and waits for my input. It seems to work, but I'm worried I'm missing something obvious. Has anyone else done this? Is there a common pattern or a security pitfall I should look for?

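The wrapper pattern the post describes, written out as an added example (this is not the poster's code and not LangChain's API; it uses plain Python callables standing in for LangChain tools, and the hypothetical `ApprovalGate`, `ToolCall`, `send_email`, the `ask` callback and the example addresses are all constructed for the demo). The points it bakes in are the ones that usually bite a home-made gate: the agent loop only ever receives the gate, never the raw tool functions; every argument is shown verbatim (a prompt-injected `bcc` is only visible if it is printed); only the literal word `yes` approves, and only for that one call; the copy that was displayed is the copy that runs; an unknown tool name or a missing terminal fails closed instead of quietly approving; and each decision is logged against a fingerprint of the call.

```python
import hashlib
import json
import sys
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, NoReturn, Optional


class ToolCallDenied(Exception):
    """Raised on every path that does not end in an explicit 'yes'."""


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: Dict[str, Any]

    def fingerprint(self) -> str:
        # Canonical JSON so identical calls hash identically; the hash is what gets
        # logged, so "what I approved" can later be matched to "what ran".
        canonical = json.dumps({"name": self.name, "args": self.args},
                               sort_keys=True, separators=(",", ":"), default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def render(self) -> str:
        # Full argument dump, never a summary: an extra bcc= or an unexpected path is
        # only caught if the human sees every key the tool will actually receive.
        return (f"Agent wants to run `{self.name}` with arguments:\n"
                f"{json.dumps(self.args, indent=2, sort_keys=True, default=str)}\n"
                f"[call {self.fingerprint()}] Allow? Type 'yes' to run; anything else denies: ")


class ApprovalGate:
    """The only object the agent loop is handed. The real tools stay private to it."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 ask: Callable[[str], str] = input,
                 interactive: Optional[bool] = None):
        self._tools = dict(tools)
        self._ask = ask
        # With no terminal on stdin (cron, a pipe, a detached service) nobody can
        # answer, so the gate fails closed rather than reading EOF as approval.
        self._interactive = sys.stdin.isatty() if interactive is None else interactive
        self.audit: List[Dict[str, str]] = []

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        call = ToolCall(name, dict(args))  # snapshot: the copy shown is the copy run
        if name not in self._tools:
            self._deny(call, "unknown tool")
        if not self._interactive:
            self._deny(call, "no terminal to ask")
        try:
            answer = self._ask(call.render())
        except (EOFError, KeyboardInterrupt):
            self._deny(call, "prompt aborted")
        # Exact word required: a bare Enter or 'y' is not consent, and the decision
        # covers this one call only; there is deliberately no "always allow" option.
        if answer.strip().lower() != "yes":
            self._deny(call, f"user answered {answer.strip()!r}")
        self.audit.append({"call": call.fingerprint(), "tool": name, "decision": "allow"})
        return self._tools[name](**call.args)

    def _deny(self, call: ToolCall, reason: str) -> NoReturn:
        self.audit.append({"call": call.fingerprint(), "tool": call.name,
                           "decision": f"deny: {reason}"})
        raise ToolCallDenied(f"{call.name} [{call.fingerprint()}] denied: {reason}")


# Constructed stand-in tool; a real one would talk to an SMTP server.
SENT: List[Dict[str, Any]] = []


def send_email(to: str, subject: str, body: str, bcc: str = "") -> str:
    SENT.append({"to": to, "subject": subject, "body": body, "bcc": bcc})
    return f"sent to {to}"


TOOLS = {"send_email": send_email}


def demo(answers: List[str]) -> List[str]:
    """Drive the gate with scripted answers instead of a terminal; returns outcomes."""
    script = iter(answers)
    gate = ApprovalGate(TOOLS, ask=lambda _prompt: next(script), interactive=True)
    # Constructed requests: a benign call, the same call with an injected bcc, and a
    # tool the agent was never registered with (denied before anyone is asked).
    requests = [
        ("send_email", {"to": "me@example.invalid", "subject": "hi", "body": "x"}),
        ("send_email", {"to": "me@example.invalid", "subject": "hi", "body": "x",
                        "bcc": "attacker@example.invalid"}),
        ("run_shell", {"cmd": "rm -rf /"}),
    ]
    outcomes = []
    for name, args in requests:
        try:
            outcomes.append(gate.call(name, args))
        except ToolCallDenied:
            outcomes.append(f"DENIED {name}")
    return outcomes


if __name__ == "__main__":
    # Replace the scripted answers with ApprovalGate(TOOLS) to use the real terminal.
    print(demo(["yes", "no"]))
```

To plug this into an agent loop, register `gate.call` as the single dispatch function and make sure nothing else in the process holds a reference to `send_email` or friends; the wrapper is only mandatory if it is the only path. The audit list is worth writing to a file on the Pi so a denied `bcc` or an unexpected tool name leaves a trace even after the session is gone.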