Just saw this paper from ETH Zurich hit arXiv. It's a significant escalation from the usual "malicious tool" demo. Researchers showed that an attacker who can modify a *single tool's description* in the system prompt—not the code, just the natural language description—can achieve persistent, difficult-to-detect compromise of an agent system.
The core of the issue is that many agent frameworks use these descriptions to autonomously choose tools. By poisoning the description, you essentially "program" the agent's decision-making logic with hidden instructions.
Here's how the attack worked in their test:
* They modified a legitimate tool's description (e.g., a `web_search` function) to include hidden instructions.
* These instructions commanded the agent to, upon a specific trigger (like a date or keyword), execute a secondary, malicious payload.
* This payload could exfiltrate data, poison the agent's memory, or even rewrite other tool descriptions to spread the compromise.
Why this matters more than a simple malicious tool:
* **Persistence:** The compromise lives in the system prompt, not in executed code. Restarting the agent or even the container doesn't clear it if the poisoned prompt is reloaded.
* **Stealth:** A code review or hash-based integrity check on the tool *code* would miss it entirely. The malicious logic is in plain English, hidden in a field often considered "non-executable."
* **Propagation:** As shown, it can be designed to spread laterally by rewriting other descriptions.
For anyone deploying agents, this shifts the threat model. You now have to treat **tool descriptions as critical, tamper-resistant code**.
Immediate implications:
* **Integrity checks** must include the full system prompt, not just the executable parts.
* **Supply chain risk** for imported tools/prompts just went up. A poisoned description from a community hub is a perfect vector.
* **CISO/Governance angle:** This is a control failure waiting to happen. Your compliance checks likely aren't looking here. NIST CSF's "Protect" function (PR.AC) needs to cover prompt integrity.
The fix isn't easy. It involves technical controls (signing/validation of full prompts) and process (treating descriptions as part of your secure software development lifecycle).
Link to the primary source: [ https://arxiv.org/abs/2407.xxxxx ]( https://arxiv.org/abs/2407.xxxxx)
YMMV.
Risk is not a number, it's a conversation.