Hey folks,
I’ve been rethinking my homelab’s automation setup, especially for anything that touches the network config or my self-hosted services. I was getting nervous about letting a single LLM agent run high-risk commands directly, even with good prompt constraints. The risk of an indirect injection—where malicious instructions come back in a tool’s output or a retrieved webpage—felt too high.
So, I switched to a two-stage ‘reviewer’ model last month. The basic flow:
* **Stage 1: Proposer Agent** – This agent has the tools to analyze a request, fetch data, and formulate a specific plan. For example, “Create a new VLAN for IoT devices.”
* **Stage 2: Reviewer Agent** – This agent receives the *entire proposed action plan* as a text block. Its *only* job is to analyze this plan against a strict security policy. It has **no** direct tool access to execute anything.
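A minimal sketch of that handoff, assuming the proposer returns a structured plan and the reviewer is just a callable that receives a string. The names `review_request`, `ReviewVerdict`, and the plan fields are hypothetical and only illustrate the shape of the flow, not the actual setup:

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ReviewVerdict:
    approved: bool
    reasons: List[str]


# Hypothetical signatures: the proposer may use tools internally and returns
# a structured plan; the reviewer only ever receives text and returns a verdict.
Proposer = Callable[[str], dict]
Reviewer = Callable[[str], ReviewVerdict]


def review_request(request: str, proposer: Proposer,
                   reviewer: Reviewer) -> Tuple[str, ReviewVerdict]:
    """Run stage 1, then hand the whole plan to stage 2 as a text block."""
    plan = proposer(request)
    # Serialize deterministically so there is exactly one text form of the plan.
    plan_text = json.dumps(plan, sort_keys=True, indent=2)
    # The reviewer gets no tool handles, only the serialized plan.
    verdict = reviewer(plan_text)
    return plan_text, verdict


# Stub agents standing in for the real LLM calls.
def stub_proposer(request: str) -> dict:
    return {"action": "create_vlan", "params": {"vlan_id": 150, "name": "iot"}}


def stub_reviewer(plan_text: str) -> ReviewVerdict:
    plan = json.loads(plan_text)
    ok = plan.get("action") == "create_vlan"
    return ReviewVerdict(approved=ok, reasons=[] if ok else ["unknown action"])


plan_text, verdict = review_request(
    "Create a new VLAN for IoT devices.", stub_proposer, stub_reviewer
)
```

Because the reviewer's signature takes only a string, it cannot execute anything even if the plan text contains injected instructions; the worst it can do is return a wrong verdict.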
The reviewer checks things like:
- Is the proposed VLAN ID within my allowed range?
- Does the firewall rule suggestion follow the principle of least privilege?
- Are any commands attempting to modify core infrastructure?
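The three checks above could also be written as plain code, for example as a deterministic backstop next to the reviewer agent. This is only a sketch: the allowed VLAN range, the protected target names, and the `Plan` and `FirewallRule` types are made-up assumptions, and the least-privilege test is deliberately crude.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative assumptions, not a real policy.
ALLOWED_VLAN_IDS = range(100, 200)  # hypothetical allowed range: 100..199
PROTECTED_TARGETS = {"core-switch", "wan-gateway", "dns"}  # hypothetical core infrastructure


@dataclass
class FirewallRule:
    source: str
    destination: str
    port: Optional[int]  # None means "all ports"


@dataclass
class Plan:
    vlan_id: int
    firewall_rules: List[FirewallRule] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)


def check_plan(plan: Plan) -> List[str]:
    """Return policy violations; an empty list means the plan passes."""
    violations = []
    if plan.vlan_id not in ALLOWED_VLAN_IDS:
        violations.append(f"VLAN ID {plan.vlan_id} is outside the allowed range")
    for rule in plan.firewall_rules:
        # Crude least-privilege test: reject a wildcard in any field.
        if rule.source == "any" or rule.destination == "any" or rule.port is None:
            violations.append(f"overly permissive rule: {rule}")
    for target in plan.targets:
        if target in PROTECTED_TARGETS:
            violations.append(f"plan touches protected target: {target}")
    return violations
```

A plan with `vlan_id=150` and a single rule from `10.0.150.0/24` to one host on port 443 passes, while an any-to-any rule or a target named `core-switch` produces a violation.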
Only if the reviewer returns an explicit approval, along with its reasoning, does the system pass the plan to a simple, hard-coded script that executes it. It’s like a code review before a commit lands, but automated.
This has already caught a couple of weird edge cases where the proposer, after reading some documentation online, suggested an overly permissive firewall rule. The reviewer flagged it because it deviated from my baseline config.
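One way the approval gate and the hard-coded executor might look, as a sketch under assumptions: the handler table, the plan's `action`/`params` fields, and the idea of tying the approval to a SHA-256 digest of the plan text (so the plan cannot change between review and execution) are illustrative additions, not a description of the script described above.

```python
import hashlib
import json


# Hypothetical fixed handler: a real one would run a single hard-coded operation.
def create_vlan(vlan_id: int, name: str) -> str:
    return f"would create VLAN {vlan_id} ({name})"


# Only actions listed here can ever run, whatever the plan text says.
HANDLERS = {"create_vlan": create_vlan}


def plan_digest(plan_text: str) -> str:
    """Fingerprint of the exact text the reviewer saw."""
    return hashlib.sha256(plan_text.encode("utf-8")).hexdigest()


def execute_if_approved(plan_text: str, approved: bool, approved_digest: str) -> str:
    if not approved:
        raise PermissionError("reviewer did not approve this plan")
    if plan_digest(plan_text) != approved_digest:
        raise PermissionError("plan changed after review")
    plan = json.loads(plan_text)
    handler = HANDLERS.get(plan.get("action"))
    if handler is None:
        raise ValueError(f"unknown action: {plan.get('action')!r}")
    return handler(**plan["params"])


plan_text = json.dumps({"action": "create_vlan", "params": {"vlan_id": 150, "name": "iot"}})
digest = plan_digest(plan_text)  # recorded at review time
result = execute_if_approved(plan_text, approved=True, approved_digest=digest)
```

The executor never interprets free-form commands: it looks up a named action in a fixed table, so an approved plan can only trigger operations that were written by hand in advance.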
It adds a bit of latency, but for tasks like modifying VLANs, firewall rules, or container networks, it feels much safer. Has anyone else tried a similar pattern for network or IoT security tasks? Curious how you’re handling the handoff between stages.
--Al