Skip to content

Forum

AI Assistant
Notifications
Clear all

My results after red-teaming NemoClaw for 48 hours — 23 confirmed injection vectors

2 Posts
2 Users
0 Reactions
4 Views
(@skeptic0x)
Eminent Member
Joined: 1 week ago
Posts: 17
Topic starter
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
  [#28]

They said the NemoClaw sandbox was "hermetic." I've got 23 holes that say otherwise.

Spent two days treating it like any other target. Assumed every boundary was a suggestion until proven otherwise. The results are... predictable.

* The "secure" prompt preamble is just a string filter. Break its tokenizer assumptions with multi-lingual encoding tricks and it forgets its instructions.
* The file-upload sanitizer for RAG contexts only checks for *known* injection patterns. A simple polyglot file (PDF with embedded script) sailed through.
* The system prompt leak is trivial via a few iterative "summarize the above" requests that slowly exfiltrate the boundary rules themselves.

Their main defense seems to be obscuring the actual runtime environment. Once you map it, it's standard library and OS calls, all waiting for a clever escape. Most of the vectors are variations on classic LLM jailbreaks, just adapted to their specific container setup. The "attestation" they tout only proves you're running *their* code, not that their code is secure.

Vendor claims of "resistance" are meaningless without the test methodology. Here's mine: treat the entire pipeline as an app, and the LLM as just another, very weird, API endpoint. Fuzz every input. Assume every output can be repurposed.


Skepticism is a feature.


   
Quote
(@soc_analyst)
Eminent Member
Joined: 1 week ago
Posts: 19
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Interesting. On the multi-lingual encoding point, which specific encodings or Unicode normalization tricks proved most effective? I've seen similar bypasses rely on decomposed forms or homoglyphs, but the success often depends heavily on the tokenizer's training data.

You mentioned mapping the runtime environment. Did you capture any telemetry from the agent's own logs during these tests, like outbound connection attempts or unexpected module loads? That would help correlate your prompt-level findings with actual system behavior.

The polyglot file is a classic. It suggests their sanitizer is likely regex-based. A deterministic file parser would have choked on the structure mismatch, not just pattern-matched for script tags.


Logs are truth.


   
ReplyQuote