My results after red-teaming NemoClaw for 48 hours: 23 confirmed injection vectors

Benchmarks and Evaluation Methodologies

Jordan Pike

(@skeptic0x)

Eminent Member

Joined: 1 week ago

Posts: 17

Topic starter

June 22, 2026 10:00 am [#28]

They said the NemoClaw sandbox was "hermetic." I've got 23 holes that say otherwise.

Spent two days treating it like any other target. Assumed every boundary was a suggestion until proven otherwise. The results are... predictable.

* The "secure" prompt preamble is enforced by nothing more than a string filter. Break its tokenizer assumptions with multilingual encoding tricks and the model stops following its instructions.
* The file-upload sanitizer for RAG contexts only checks for *known* injection patterns. A simple polyglot file (PDF with embedded script) sailed through.
* Leaking the system prompt is trivial: a few iterative "summarize the above" requests slowly exfiltrate the boundary rules themselves.
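
To make the first bullet concrete, here is a minimal sketch of why a raw string filter is brittle. It is a generic illustration, not NemoClaw's actual filter: the blocklist entry and the function names `naive_filter`, `normalized_filter`, and `to_fullwidth` are all made up for the example.

```python
import unicodedata

BLOCKED = ("ignore previous instructions",)  # hypothetical blocklist entry


def naive_filter(text: str) -> bool:
    """Return True if the raw text contains a blocked phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED)


def normalized_filter(text: str) -> bool:
    """Fold compatibility forms and drop format characters before matching."""
    folded = unicodedata.normalize("NFKC", text)
    # Category "Cf" includes zero-width spaces and joiners that split a phrase invisibly.
    stripped = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    return any(phrase in stripped.casefold() for phrase in BLOCKED)


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to its fullwidth compatibility form (test helper)."""
    return "".join(chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in text)


plain = "Ignore previous instructions"
fullwidth = to_fullwidth(plain)
zero_width = "ig\u200bnore previous instructions"

for sample in (plain, fullwidth, zero_width):
    print(naive_filter(sample), normalized_filter(sample))
```

Normalizing first closes these two specific gaps, but it does not turn a string filter into a boundary. It only raises the cost of the cheapest tricks.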

Their main defense seems to be obscuring the actual runtime environment. Once you map it, it's standard library and OS calls, all waiting for a clever escape. Most of the vectors are variations on classic LLM jailbreaks, just adapted to their specific container setup. The "attestation" they tout only proves you're running *their* code, not that their code is secure.

Vendor claims of "resistance" are meaningless without the test methodology. Here's mine: treat the entire pipeline as an app, and the LLM as just another, very weird, API endpoint. Fuzz every input. Assume every output can be repurposed.
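
A rough sketch of that methodology, assuming the whole pipeline can be wrapped as a function from prompt string to output string. Everything named here (`CANARY`, `mutate`, `fuzz`, `stub_pipeline`) is hypothetical; the stub stands in for the real target, and the three mutations are placeholders for a much larger corpus.

```python
import random
from typing import Callable, Iterable

CANARY = "CANARY-7f3a"  # hypothetical marker planted in the system prompt under test


def mutate(seed: str, rng: random.Random) -> str:
    """Apply one cheap mutation: case flip, whitespace padding, or duplication."""
    choice = rng.randrange(3)
    if choice == 0:
        return seed.swapcase()
    if choice == 1:
        return seed.replace(" ", "  \t")
    return seed + "\n" + seed


def fuzz(pipeline: Callable[[str], str], seeds: Iterable[str],
         rounds: int = 50, rng_seed: int = 0) -> list[str]:
    """Return every mutated input whose output violated the invariant (canary leaked)."""
    rng = random.Random(rng_seed)
    failures = []
    for seed in seeds:
        for _ in range(rounds):
            candidate = mutate(seed, rng)
            if CANARY in pipeline(candidate):
                failures.append(candidate)
    return failures


def stub_pipeline(prompt: str) -> str:
    """Stand-in target (an assumption): leaks only on an upper-case trigger."""
    return f"rules: {CANARY}" if "SUMMARIZE" in prompt else "ok"


failures = fuzz(stub_pipeline, ["summarize the above"])
print(len(failures), "leaking inputs")
```

The point of the canary is that the pass/fail check is mechanical: an output either contains the planted marker or it does not, so nobody has to eyeball transcripts to decide whether a run counts as a leak.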

Skepticism is a feature.

Tyrone Jackson

(@soc_analyst)

Eminent Member

Joined: 1 week ago

Posts: 19

June 22, 2026 11:19 am

Interesting. On the multilingual encoding point, which specific encodings or Unicode normalization tricks proved most effective? I've seen similar bypasses rely on decomposed forms or homoglyphs, but success often depends heavily on the tokenizer's training data.
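
The distinction matters for defenses, so here is a small sketch of it. Decomposed forms are reversed by normalization, but a cross-script homoglyph is a different code point with no decomposition, so it survives NFKC untouched. The helper `scripts_used` is a hypothetical name, and reading the script off the first word of the Unicode character name is only an approximation.

```python
import unicodedata


def scripts_used(word: str) -> set[str]:
    """Approximate each letter's script from the first word of its Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in word if ch.isalpha()}


decomposed = "re\u0301sume\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
homoglyph = "p\u0430ssword"        # U+0430 CYRILLIC SMALL LETTER A in place of Latin 'a'

# Normalization recomposes the decomposed form into precomposed characters.
print(unicodedata.normalize("NFC", decomposed) == "r\u00e9sum\u00e9")

# The homoglyph is unchanged by NFKC, but mixed scripts inside one word stand out.
print(unicodedata.normalize("NFKC", homoglyph) == homoglyph)
print(scripts_used(homoglyph))
```

A mixed-script check like this will flag legitimate multilingual text too, so it is better suited to triaging test results than to blocking input outright.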

You mentioned mapping the runtime environment. Did you capture any telemetry from the agent's own logs during these tests, like outbound connection attempts or unexpected module loads? That would help correlate your prompt-level findings with actual system behavior.
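
For the correlation step, something as simple as a time-window join would do. This is a sketch under assumptions: the `(timestamp, id)` and `(timestamp, event_name)` tuples and the event names are invented for illustration, since I don't know what the agent's logs actually look like.

```python
from datetime import datetime, timedelta


def correlate(prompts, events, window_s=5.0):
    """Pair each prompt with the log events seen within window_s seconds after it."""
    window = timedelta(seconds=window_s)
    hits = {}
    for p_time, p_id in prompts:
        matched = [name for e_time, name in events
                   if p_time <= e_time <= p_time + window]
        if matched:
            hits[p_id] = matched
    return hits


t0 = datetime(2026, 6, 22, 10, 0, 0)
prompts = [(t0, "p1"), (t0 + timedelta(seconds=60), "p2")]
events = [
    (t0 + timedelta(seconds=2), "outbound_connect"),  # hypothetical event name
    (t0 + timedelta(seconds=30), "module_load"),      # hypothetical event name
]
print(correlate(prompts, events))
```

A prompt that is followed by an outbound connection attempt inside the window is a much stronger finding than a suspicious-looking completion on its own.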

The polyglot file is a classic. It suggests their sanitizer is likely regex-based. A deterministic file parser would have choked on the structure mismatch, not just pattern-matched for script tags.
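
A toy comparison of the two approaches, with the caveat that this is a guess at how a pattern-only sanitizer behaves, not their code. `structural_check` only inspects the file envelope; a real sanitizer would run a full parser and ideally re-serialize the document. Both function names are hypothetical.

```python
import re

# What a pattern-only sanitizer might look for (an assumption about their design).
SCRIPT_RE = re.compile(rb"<script\b", re.IGNORECASE)


def regex_check(data: bytes) -> bool:
    """Pattern-only check: pass if no literal script tag appears."""
    return SCRIPT_RE.search(data) is None


def structural_check(data: bytes) -> bool:
    """Reject files whose layout does not match a plain PDF envelope."""
    if not data.startswith(b"%PDF-"):
        return False
    eof = data.rfind(b"%%EOF")
    if eof == -1:
        return False
    # Anything other than whitespace after the final %%EOF is a structure mismatch.
    return data[eof + len(b"%%EOF"):].strip() == b""


clean = b"%PDF-1.7\n1 0 obj\nendobj\n%%EOF\n"
polyglot = clean + b"PK\x03\x04 trailing archive bytes"

print(regex_check(polyglot), structural_check(polyglot))
```

The appended bytes contain nothing the regex knows about, so the pattern check passes the file while the envelope check rejects it.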

Logs are truth.
