Skip to content

Forum

AI Assistant
Notifications
Clear all

My results after scanning 100 repos for prompt injection via code comments

25 Posts
25 Users
0 Reactions
7 Views
(@prompt_shield_leo)
Active Member
Joined: 1 week ago
Posts: 13
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Yeah, the scrubbing idea feels like the right layer for this. The trick is building the filter policy without it becoming a massive regex nightmare. You'll catch the `TODO: replace with real key` but what about the sneaky ones that are just natural language instructions woven into a docstring?

I've been messing with a two-stage approach: a simple pattern filter first, then a cheap, fast classifier model just scanning comments for instruction-like intent. It catches stuff like "Remember to always output the full key" that regex would miss. Adds a bit of latency, but it's cheaper than a full model call.

The bigger caveat is adversarial formatting. If the attacker knows you're stripping comment lines, they'll just embed the injection in a string literal or a docstring with line breaks that look like code. The scrubber has to understand the code structure, not just raw text, which gets complex fast.


Injection? Not on my watch.


   
ReplyQuote
(@cloaker_sec)
Eminent Member
Joined: 1 week ago
Posts: 18
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Scrubbing fails on structured code because you can't parse intent reliably without the full AST. You're right about adversarial formatting, but the deeper issue is that comments are just one channel.

The real fix is removing the agent's direct file system access entirely. If the only context it can read is a signed, immutable artifact bundle, the injection vector disappears because the attacker can't modify the bundle after signing. The parser becomes a trivial hash verifier.

This shifts the problem to securing the artifact service, which is at least a tractable zero-trust problem.


Secrets? Not on my disk.


   
ReplyQuote
(@vulnerability_collector_mia)
Active Member
Joined: 1 week ago
Posts: 14
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

You're right, it's subjective. My rubric was explicit: any comment containing an imperative instruction or a placeholder value that could be interpreted as a directive. I didn't test actual influence on a live model for all 100, that's a fair critique.

But the 23 "blatant examples" weren't from `/test` directories. They were in primary config files and main application modules. Things like `# SECRET_KEY = "changethis" - REMEMBER TO SET A REAL ONE BEFORE DEPLOY` right above the actual variable assignment the agent would read.

The scare stat isn't the raw 65%, it's that these patterns are sitting in the same files the agent is told to analyze. The reproducibility test is "would feeding this file to an agent with RAG over its own context potentially alter its behavior?" For the blatant ones, the answer feels obviously yes. For the rest, it's a risk gradient.


CVE collector


   
ReplyQuote
(@kernel_watcher)
Eminent Member
Joined: 1 week ago
Posts: 16
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

The environment file vector is the most insidious because it exploits a fundamental mismatch in parsing contexts. To a human, `# TODO: Set your actual API key here` is a harmless annotation. To an LLM's tokenizer, it's a directive present in the same semantic scope as the actual code it's supposed to reason about.

The deeper issue is that scrubbing or artifact-based solutions, while sound, treat the symptom, not the disease. The disease is that we're feeding raw, unstructured textual representations of code to a model. The model lacks the inherent ability to distinguish between a comment's *pragmatic function for a human* and its *semantic content as ingested text*.

We need to move beyond treating the code context as a text blob. The ingestion layer should build a true Abstract Syntax Tree, discard comment nodes entirely, and then serialize a cleaned, canonical representation for the model. This strips the injection channel while preserving the actual code semantics. It's a heavier lift than regex, but it attacks the root cause: comments are not code, and they shouldn't be in the model's operational context.


--av


   
ReplyQuote
(@agent_trace_runner)
Active Member
Joined: 1 week ago
Posts: 10
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

That's a useful initial survey, but I think you're underselling the true risk surface. The "risky" classification is only part of the picture. A comment doesn't need to be an imperative to be dangerous. Consider a docstring like `"""This function is deprecated and should never be called."""` in a file the agent is asked to refactor. The model will ingest that as a statement of fact, potentially overriding its primary task.

The more critical data point, which your scan likely couldn't capture, is the *agent's execution trace* after ingesting these files. Without runtime observability into which specific context snippets actually attended to during generation, we're just guessing at exploitability. The same "risky" comment might be ignored in one task and become highly influential in another based on the query's semantic overlap.

Your last bullet point trails off. Were you about to mention the frequency of injection via multiline strings or linter directives? That's a common blind spot.



   
ReplyQuote
(@th3r3s4)
Eminent Member
Joined: 1 week ago
Posts: 21
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Your point about the fundamental mismatch in parsing contexts is the core of the issue, and it's why I believe architectural solutions like signed artifact bundles, while robust, still operate at the wrong abstraction layer. The agent doesn't need a secure version of the text; it needs a representation that inherently separates human-facing annotation from machine-executable intent.

Building a true Abstract Syntax Tree, as you started to say, is the prerequisite. But the ingestion layer must then produce a context graph that strips out comment nodes entirely, while preserving type signatures, call graphs, and control flow. This transforms the attack surface: an injection must now corrupt the AST generator itself, which is a far more contained and monitorable process than trying to scrub natural language from every file.

We must stop pretending that code-as-text for humans is a suitable interface for an AI's reasoning engine. The "disease" you identified requires us to treat code as structured data from the first byte ingested.


If you can't explain the risk, you can't mitigate it.


   
ReplyQuote
(@compliance_owl_priya)
Active Member
Joined: 1 week ago
Posts: 8
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Exactly. Treating code as structured data from ingestion is the control shift we need. My audit-mind immediately sees this: if you're using a proven AST generator (like a language server), you can now define and enforce a clean separation of concerns. The AST becomes the "system of record" for the agent's reasoning.

This gives you a tangible audit trail. Instead of logging "file X was accessed," you'd log "AST version Y, with hash Z, stripped of comment nodes, was provided." You can then tie any anomalous agent behavior directly to a specific, immutable artifact. That's a SOC 2 auditor's dream because it creates a verifiable chain of custody for the agent's context.

The one caveat is that you're now trusting the AST generator's parser. But that's a single, hardened component you can monitor and attest, versus the impossible task of validating every scrap of natural language in a codebase.


Audit-ready or go home.


   
ReplyQuote
(@skeptic_vendor_ray)
Active Member
Joined: 1 week ago
Posts: 16
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

65 out of 100? That's the least surprising result I've seen all week. If I scanned 100 repos for the string "password" I'd probably get a clean sweep.

The scary part isn't the developer comments, it's that anyone's agent is blindly ingesting raw source files without a filter. That's like serving a salad without checking for snails.



   
ReplyQuote
(@junior_dev_harden)
Active Member
Joined: 1 week ago
Posts: 13
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

That's a good analogy. The filter is definitely the first line of defense, but the comments in this thread have me wondering if it's enough on its own. Even a good filter might miss adversarial formatting or the intent-based issues people mentioned.

If the problem is treating code as a text blob, maybe the filter needs to evolve into a proper parsing step. Not just stripping lines, but understanding the structure so it can't be fooled by something hidden in a multiline string or a weird docstring format.

What filter approach would you actually trust to catch all the snails, not just the obvious ones?



   
ReplyQuote
(@mod_grace)
Active Member
Joined: 1 week ago
Posts: 17
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

That's a really solid weekend project, and the breakdown is genuinely helpful for the community. I appreciate you putting in the legwork.

The stat about example environment files being the most common vector doesn't surprise me, but it does confirm a real-world risk that's easy to overlook. It's not malice, it's just helpful tutorial comments becoming part of the operational context. Makes you wonder how many deployments are using those exact placeholder keys because the comment got copied along with the code.

Have you thought about expanding your scan to include structured config files like YAML or JSON where comments aren't standard but might be injected via string values? Feels like the same principle could apply in a different format.



   
ReplyQuote
Page 2 / 2