
What is the best way to sanitize LLM responses before they get parsed into tool calls by SuperAGI?

(@enthusiast_mike_d)

Hey folks, been tinkering with my SuperAGI deployment for a few months now, mostly in my homelab but with a public-facing instance for a few trusted buddies. One thing that's been chewing on my brain—and my logs—is the raw, unfiltered nature of LLM responses before they hit the tool parsing layer.

We're all trusting these agents to make sane decisions, but as we give them more powerful tools (like shell access, API keys, or my custom Home Assistant plugin that can unlock my front door 😅), the risk of a malformed or maliciously-crafted response slipping through becomes real. The default install feels very "open" in this regard. I've seen my agent get confused by markdown backticks in a response and try to execute the content as a command, or hallucinate a completely invalid JSON structure that crashes the tool parser.

So, what's the best practice here? I'm talking about a sanitization layer: something that validates and cleans the LLM's text output *after* it's generated but *before* SuperAGI tries to interpret it as a tool call. I've been rolling my own in a middleware container, but I'd love to compare notes.

Here's a simplified snippet of my current "scrubber" that runs as a proxy between the LLM output and SuperAGI's main logic. It's in Python and lives in its own Docker service.

```python
import re
import json
import logging

def sanitize_tool_response(raw_llm_output: str) -> dict:
    """
    Attempts to extract and validate a JSON tool call from potentially noisy LLM text.
    Returns a safe, default 'failure' dict if sanitization fails.
    """
    # 1. Aggressively strip markdown code fences, with or without a language tag
    cleaned = re.sub(r'```[a-z]*\n?|```', '', raw_llm_output)

    # 2. Search for something that looks like a JSON object, focusing on 'tool' and 'args' keys.
    #    The match is greedy: it runs from the first '{' to the last '}' in the text.
    json_match = re.search(r'\{.*"tool".*:.*".*".*"args".*:.*\{.*\}.*\}', cleaned, re.DOTALL)
    if not json_match:
        logging.warning(f"No valid JSON tool structure found in: {cleaned[:200]}")
        return {"tool": "error", "args": {"message": "Sanitizer: Could not locate valid tool call."}}

    potential_json = json_match.group(0)

    # 3. Basic JSON sanity & key validation
    try:
        parsed = json.loads(potential_json)
        if not isinstance(parsed, dict):
            raise ValueError("Top-level is not a dict")
        if "tool" not in parsed or "args" not in parsed:
            raise ValueError("Missing 'tool' or 'args' keys")
        if not isinstance(parsed["args"], dict):
            raise ValueError("'args' is not a dict")

        # 4. Allowlist of known safe tools (populated from my config)
        safe_tools = ["google_search", "send_email", "hass_get_state", "calculate"]
        if parsed["tool"] not in safe_tools:
            raise ValueError(f"Tool '{parsed['tool']}' not in allowlist")

        # 5. Basic escaping on string args (very simplistic, adjust as needed).
        #    Note this also alters legitimate text, e.g. a '$' in an email body.
        for key, value in parsed["args"].items():
            if isinstance(value, str):
                parsed["args"][key] = re.sub(r'[;|&$]', '', value)  # remove shell metachars

        return parsed

    except json.JSONDecodeError as e:
        logging.error(f"JSON decode failed: {e}. Input was: {potential_json[:500]}")
    except ValueError as e:
        logging.error(f"Validation failed: {e}")

    # Fallback to a safe, do-nothing tool call
    return {"tool": "no_op", "args": {}}
```
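
One possible way to make the extraction step (steps 1 and 2) less dependent on regex is to let the standard library's `json.JSONDecoder.raw_decode` try to parse at each `{` it finds. This is only a rough sketch: the function name `extract_first_tool_call` is made up for illustration and is not part of SuperAGI, and it assumes the same `tool`/`args` shape as the scrubber above.

```python
import json
from typing import Optional


def extract_first_tool_call(text: str) -> Optional[dict]:
    """Return the first top-level JSON object in `text` that has 'tool' and
    'args' keys, or None if there isn't one. (Hypothetical helper.)"""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # ignores whatever trailing text follows it.
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and "tool" in obj and "args" in obj:
            return obj
        # Valid JSON but not a tool call: skip past it entirely so that
        # objects nested inside it are never picked up as tool calls.
        pos = text.find("{", end)
    return None


noisy = 'Sure! ```json\n{"tool": "calculate", "args": {"expr": "1+1"}}\n``` Hope {that} helps.'
print(extract_first_tool_call(noisy))
```

Because the parser stops at the end of the first complete object, stray backticks or braces elsewhere in the reply don't need stripping first. It would only swap out the extraction step; the allowlist and arg checks stay as they are, so the overall design is unchanged.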

This approach is *okay*, but it feels brittle. I'm also thinking about:

* **Context-aware validation:** Should the sanitizer know which tools the specific agent *actually* has access to, not just a global allowlist?
* **Schema enforcement:** Using JSON Schema or Pydantic models to strictly validate the `args` against expected parameters for each tool.
* **Rate-limiting and anomaly detection:** Blocking tool calls that are repetitive or out of character for the agent's goal.
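
To make those three ideas concrete, here's a minimal stdlib-only sketch of what they might look like together. Everything in it is an assumption for illustration: the agent ids, the `AGENT_TOOLS` and `TOOL_SPECS` tables, the arg names, and the `validate_call` / `RepeatGuard` names are hypothetical and not something SuperAGI provides. A real setup would more likely use JSON Schema or Pydantic for the arg specs.

```python
import json
from collections import deque
from typing import Dict, Optional, Set, Tuple

# Hypothetical per-agent allowlists (agent id -> tools that agent may call).
AGENT_TOOLS: Dict[str, Set[str]] = {
    "research_agent": {"google_search", "calculate"},
    "home_agent": {"hass_get_state"},
}

# Hypothetical arg specs: tool -> {arg name: (expected type, required?)}.
TOOL_SPECS: Dict[str, Dict[str, Tuple[type, bool]]] = {
    "google_search": {"query": (str, True), "num_results": (int, False)},
    "calculate": {"expression": (str, True)},
    "hass_get_state": {"entity_id": (str, True)},
}


def validate_call(agent_id: str, call: dict) -> Optional[str]:
    """Return None if the call passes, otherwise a short rejection reason."""
    tool, args = call.get("tool"), call.get("args")
    # Context-aware check: the tool must be allowed for this specific agent.
    if not isinstance(tool, str) or tool not in AGENT_TOOLS.get(agent_id, set()):
        return f"tool {tool!r} not allowed for agent {agent_id!r}"
    spec = TOOL_SPECS.get(tool)
    if spec is None:
        return f"no arg spec registered for {tool!r}"
    if not isinstance(args, dict):
        return "'args' must be an object"
    # Schema check: no unknown args, required args present, types match.
    unknown = set(args) - set(spec)
    if unknown:
        return f"unexpected args: {sorted(unknown)}"
    for name, (expected, required) in spec.items():
        if name not in args:
            if required:
                return f"missing required arg {name!r}"
            continue
        value = args[name]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            return f"arg {name!r} should be {expected.__name__}"
        if not isinstance(value, expected):
            return f"arg {name!r} should be {expected.__name__}"
    return None


class RepeatGuard:
    """Rejects a call once it appears more than `max_repeats` times
    within the last `window` calls seen."""

    def __init__(self, window: int = 5, max_repeats: int = 2) -> None:
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def allow(self, call: dict) -> bool:
        key = json.dumps(call, sort_keys=True)  # canonical form for comparison
        self.recent.append(key)
        return self.recent.count(key) <= self.max_repeats


guard = RepeatGuard(window=5, max_repeats=2)
call = {"tool": "hass_get_state", "args": {"entity_id": "lock.front_door"}}
print(validate_call("home_agent", call))      # passes for the agent that owns the tool
print(validate_call("research_agent", call))  # rejected for an agent that doesn't
```

The repeat check here is deliberately crude (exact-match counting over a sliding window); anything smarter, like judging whether a call is "out of character" for the agent's goal, would need more context than the sanitizer sees in this sketch.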

I'm curious how others are handling this. Are you baking checks into custom tools? Using a reverse proxy with response rewriting? Or has anyone found a config option within SuperAGI itself that I've totally missed? The marketplace plugins are awesome, but every new one feels like a new potential injection vector if the LLM's output isn't scrubbed.

Let's share some setups—I'll gladly post my Docker Compose snippets if anyone's interested.

--Mike


If it's not broken, break it for security.


   