Guide: Reproducing the latest prompt injection research on OpenClaw in 30 minutes

34 Posts
32 Users
0 Reactions
10 Views
(@claw_practitioner)
Eminent Member
Joined: 1 week ago
Posts: 18

Great question on the OpenAI-compatible endpoint. Yeah, `ic-eval` will talk to your oobabooga wrapper, but the JSON schema mismatch is real. The parser expects the exact structure from the OpenClaw spec, and a general chat wrapper often adds extra fields or uses slightly different keys.
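
If you want to eyeball the mismatch before involving `ic-eval` at all, a rough sketch like this can diff the top-level keys of a raw wrapper response against whatever the spec requires. The `EXPECTED_KEYS` set here is purely a placeholder I made up for illustration, not the real OpenClaw schema, so swap in the actual keys from the spec.

```python
import json

# Hypothetical key set for illustration only; replace it with the keys
# the OpenClaw spec actually requires.
EXPECTED_KEYS = {"name", "arguments"}


def schema_diff(raw, expected=EXPECTED_KEYS):
    """Return (missing, extra) top-level keys for a raw JSON response.

    Returns None when the payload is not a JSON object at all.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    keys = set(obj)
    return sorted(expected - keys), sorted(keys - expected)


if __name__ == "__main__":
    # A wrapper that renames one key shows up as one missing and one extra key.
    print(schema_diff('{"name": "get_weather", "args": {}}'))
```

Anything that comes back with a non-empty `missing` or `extra` list is a candidate for the "malformed response" noise rather than a real injection.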

I'd absolutely run a tiny baseline first, like ten simple function calls. If you see a bunch of "malformed response" traces before any actual injections fire, you'll know the noise level.

About the sandbox: it's a good idea for a baseline, but the OOM warnings in the thread are mainly about the full parser on recursive attacks. If you're just checking compatibility with your local model, start with a small dataset and maybe use `--parser minimal` for that first run to avoid memory spikes. You can switch to the full parser later when you're hunting real injections.


Carlos


   
(@mod_tech_asia)
Eminent Member
Joined: 1 week ago
Posts: 15

Thanks for getting this guide out there, user18. It's a solid starting point for people wanting to move past vendor slides.

You're right about the value being in the trace, not the score. I'd just add that newcomers should also run a known-clean baseline dataset first. The `sem-sync-2024-04` patterns can produce a flood of violations, and it's easy to miss a systemic parser weakness if you don't have a comparison point showing what normal, safe interactions look like in your logs.
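
To make the comparison point concrete, here is a minimal sketch of how the two runs could be compared. It assumes you have already reduced each trace to one outcome label per request; the labels (`ok`, `malformed`, `violation`) and both function names are my own placeholders, not anything `ic-eval` emits.

```python
from collections import Counter


def rate_by_label(labels):
    """Fraction of requests per outcome label (empty dict for an empty run)."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(labels).items()}


def excess_over_baseline(baseline_labels, attack_labels):
    """Per-label rate in the attack run minus the rate in the clean baseline."""
    base = rate_by_label(baseline_labels)
    attack = rate_by_label(attack_labels)
    return {
        label: attack.get(label, 0.0) - base.get(label, 0.0)
        for label in set(base) | set(attack)
    }


if __name__ == "__main__":
    # Hypothetical outcome labels for a clean run and an attack run.
    baseline = ["ok"] * 8 + ["malformed"] * 2
    attack = ["ok"] * 5 + ["malformed"] * 2 + ["violation"] * 3
    for label, delta in sorted(excess_over_baseline(baseline, attack).items()):
        print(f"{label}: {delta:+.2f}")
```

A label whose rate is about the same in both runs is most likely background noise from the setup. Only the excess over the clean baseline is worth attributing to the injection patterns.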

Also, a few folks have already hit memory issues with the default parser on recursive patterns. Starting with `--parser minimal` for a quick compatibility check is fine, but as user260 pointed out, don't rely on it for your final safety assessment. The full parser exists for a reason.


- Asia (mod)


   
(@appsec_eval_junior_emily)
Active Member
Joined: 1 week ago
Posts: 12

Thanks for putting this together. I'm trying to get a pilot program going at my company, and having a reproducible benchmark is exactly what I need to justify the time.

> The key is to run it with the audit flag to capture the decision log

That's a great point about the trace being the valuable part. When you say the report shows where guardrails matched the bypasses, does the audit log flag which specific IronClaw parser rule fired? I'm trying to map these failures back to our internal risk framework, and knowing if it was a boundary check vs. a recursion limit would help a lot.


Due diligence.


   
(@enthusiast_tom_sec)
Active Member
Joined: 1 week ago
Posts: 14

Good guide, but I'd tighten the first step. Pulling the whole `sem-sync-2024-04` dataset straight away can bury you in logs if you haven't tuned your parser yet.

Better to start with a subset:
`ic-eval run --target your-endpoint --vectors encoded_imperative --audit`

Run that limited vector first. If your parser chokes on the output format from your local wrapper, you'll know immediately. The trace will be about 90% shorter and you can actually see if the violations are real model breaches or just JSON schema mismatches. Then move to the full dataset.

Also, the audit log does flag the specific rule. Look for `parser_rule_id` in the trace; it maps to the boundary checks. The recursion limits show up under `depth_violation`.
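
For mapping those back to a risk framework, something like the sketch below could tally the two kinds of hits. Big caveat: it assumes the trace is one JSON object per line and that `parser_rule_id` and `depth_violation` appear as top-level keys. I haven't confirmed the on-disk layout, so adjust the lookups to whatever your trace actually looks like. The function name and sample rule IDs are made up.

```python
import json
from collections import Counter


def summarize_trace(lines):
    """Count boundary-rule hits per rule ID and total depth violations.

    Assumes one JSON object per line (an assumption, not a documented
    format). Lines that are blank or not JSON objects are skipped.
    """
    rule_hits = Counter()
    depth_hits = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if "parser_rule_id" in entry:
            rule_hits[str(entry["parser_rule_id"])] += 1
        if entry.get("depth_violation"):
            depth_hits += 1
    return rule_hits, depth_hits


if __name__ == "__main__":
    # Made-up entries purely to show the shape of the result.
    sample = [
        '{"parser_rule_id": "B-01"}',
        '{"parser_rule_id": "B-01"}',
        '{"depth_violation": true}',
        "not json",
    ]
    rules, depth = summarize_trace(sample)
    print(dict(rules), depth)
```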


Assume breach.


   