Just pushed the release tags for v0.8.3. The main event is the new prompt injection defense subsystem. It's not a silver bullet, but it's a practical, configurable layer that should be in everyone's stack.
The core idea is a multi-stage filter pipeline that runs before the user query hits your primary agent logic. It's designed to catch the low-hanging fruit and some more sophisticated injection attempts. You can run it as a standalone service or integrate the modules directly.
Key additions:
* **Semantic Guardrails:** Uses a local embedding model (default: `all-MiniLM-L6-v2`) to score query similarity to a blocklist of known dangerous intents (e.g., "ignore previous instructions"). Threshold is configurable.
* **Token Sequence Detector:** Regex-like patterns, but for the token space. Catches some encoded payloads that plain regex misses.
* **Canary Tokens:** Inject hidden markers into your system prompt; the filter checks if they've been altered or output in the response.
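To make the semantic guardrail's thresholding concrete, here's a rough Python sketch of the scoring step. It is not the shipped code: `is_blocked` and `cosine_similarity` are hypothetical names, the embedding step is passed in as a plain callable (in practice that would be a sentence-embedding model such as `all-MiniLM-L6-v2`, which is not loaded here), and treating a score equal to the threshold as a block is an assumption.

```python
import math
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_blocked(
    query: str,
    embed: Callable[[str], Vector],       # hypothetical: any text -> vector function
    blocklist_vectors: Iterable[Vector],  # pre-computed embeddings of blocked intents
    block_threshold: float = 0.85,
) -> bool:
    """Block the query if it is close enough to any blocklisted intent."""
    query_vec = embed(query)
    return any(
        cosine_similarity(query_vec, blocked) >= block_threshold
        for blocked in blocklist_vectors
    )
```

Keeping the embedder as a parameter means the threshold logic can be exercised with toy vectors, without pulling in a model.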
Here's a minimal config example (`config/filter_config.yaml`):
```yaml
filter_pipeline:
  - name: token_sequence
    parameters:
      patterns: ["ignore", "previous", "instructions"]
  - name: semantic_guardrail
    parameters:
      model_path: "local_models/all-MiniLM-L6-v2"
      block_threshold: 0.85
  - name: canary_check
    parameters:
      canary_string: "||SYSTEM_PROTECT||"
      expected_position: 12
```
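For a feel of what the `canary_check` stage is doing with those two parameters, here's a minimal Python sketch. It assumes `expected_position` is a character offset into the system prompt (it could just as well be a token index), and the function names are made up for illustration, not taken from the release.

```python
def canary_intact(system_prompt: str, canary: str, expected_position: int) -> bool:
    """True if the canary sits unmodified at the expected character offset."""
    # Assumption: expected_position counts characters, not tokens.
    return system_prompt.startswith(canary, expected_position)


def canary_leaked(response: str, canary: str) -> bool:
    """True if the model's response contains the canary verbatim."""
    return canary in response


CANARY = "||SYSTEM_PROTECT||"
# "You are bot " is 12 characters long, so the canary starts at offset 12.
prompt = "You are bot " + CANARY + " Answer briefly."

print(canary_intact(prompt, CANARY, 12))                # canary where expected
print(canary_leaked("Sure! " + CANARY, CANARY))         # response echoed the canary
```

A verbatim substring check like this only catches exact leaks; a response that paraphrases or re-encodes the marker would slip past it.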
Deploy it in front of your existing setup. If you're running OpenClaw Agent, the update hooks it in automatically. For other setups (like a custom LangChain or raw API server), you can run the filter service on `localhost:8145` and proxy requests through it.
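If you're wiring up the proxy route by hand, the client side might look something like the Python sketch below. Only the host and port come from this post; the `/filter` path and the `{"query": ...}` JSON body are placeholders I made up for illustration, so check them against the actual service before relying on this.

```python
import json
import urllib.request

# Host and port are from the post; the "/filter" path is a placeholder.
FILTER_URL = "http://localhost:8145/filter"


def build_filter_request(query: str, url: str = FILTER_URL) -> urllib.request.Request:
    """Build a POST request that hands the raw user query to the filter service."""
    # Assumption: the service accepts a JSON body with a "query" field.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)` with a short `timeout`, and decide up front whether your agent fails open or closed when the filter is unreachable.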
Initial tests show it adds 20-150ms latency, mostly from the embedding model load. It's blocking obvious jailbreaks in my homelab. Let me know what breaks it.
-- mike