TIL: OpenClaw's guardrail has a 'dry_run' mode that logs what it would block without actually blocking — great for tuning

(@devsec_curious)
Active Member
Joined: 1 week ago
Posts: 9
Topic starter

Hey folks, was digging through the OpenClaw config docs today and stumbled on something really useful.

I was trying to tune the NeMo guardrails for my agent's responses without accidentally blocking legitimate stuff. Found the `dry_run` mode. When you enable it, the guardrail layer logs potential blocks to your configured logger but lets the action proceed. Super helpful for seeing what *would* get caught during normal operation without actually breaking the flow.

You can set it in your configuration YAML like this:

```yaml
guardrails:
  dry_run: true
  topics:
    - ...
```

Now I'm wondering about the privacy side. All those "would-block" events are being logged somewhere. What's the best practice for handling that log data? Feels like a trade-off between tuning accuracy and collecting sensitive interaction data. 😅 Has anyone set up a pipeline for scrubbing PII from these guardrail logs?



   
(@claw_practitioner)
Eminent Member
Joined: 1 week ago
Posts: 19
 

Great find with the dry_run mode! It saved me a ton of headaches when I was setting up my first agent. Your privacy question is spot on.

I pipe those logs to a separate file and run a simple grep filter to strip out obvious stuff like email patterns before I review them. It's not perfect, but it helps. You could also set your logger to only capture the rule ID and timestamp, not the full content.
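
A minimal sketch of the "rule ID and timestamp only" idea, assuming the would-block events pass through Python's standard `logging` module (the thread doesn't confirm how OpenClaw emits them). The logger name and the `rule_id` field are made up for illustration. The trick is simply that the format string never references `%(message)s`:

```python
import io
import logging


def make_metadata_only_logger(stream):
    """Build a logger whose output holds only a timestamp and a rule ID."""
    logger = logging.getLogger("dry_run_metadata_demo")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    # No %(message)s in the format, so the matched content is never written.
    handler.setFormatter(logging.Formatter("%(asctime)s rule=%(rule_id)s"))
    logger.addHandler(handler)
    return logger


buf = io.StringIO()
log = make_metadata_only_logger(buf)
# The message carries user text, but only the `extra` field reaches the sink.
log.info("user text with alice@example.com",
         extra={"rule_id": "financial_advice_002"})
line = buf.getvalue()
```

Every call has to pass `rule_id` via `extra`, otherwise the formatter cannot fill in the field.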

Have you considered using the local LLM guardrails instead of NeMo? They keep everything in your home lab, so the privacy concern shifts to your own log rotation policies.


Carlos


   
(@th3r3s4)
Eminent Member
Joined: 1 week ago
Posts: 21
 

Your point about logging only the rule ID and timestamp is a sound initial step for data minimization. However, a key part of a proper threat model is considering what an adversary could infer from that metadata over time. A sequence of blocked topic IDs, correlated with timestamps and your agent's known functions, could still reconstruct sensitive workflows.

The local LLM suggestion shifts, but doesn't eliminate, the risk. You're now the data controller. This means log retention and access controls become your direct compliance burden under frameworks like GDPR. Simply having the logs on-prem doesn't make them safe; you need to classify that dry_run log file as containing personal data and treat its lifecycle accordingly.

A more thorough approach is to extend the dry_run logic itself, adding a configuration option to hash or tokenize the matched content before logging. This allows for tuning validation while pseudonymizing the log data at the source.
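
As a rough sketch of what pseudonymizing at the source could look like, here is a keyed hash (HMAC-SHA256) applied to the matched content before a record is built. This is not an existing OpenClaw option; `pseudonymize`, `dry_run_record`, and the demo key are all hypothetical, and in practice the key would come from a secret store rather than the source file.

```python
import hashlib
import hmac


def pseudonymize(matched_text: str, key: bytes) -> str:
    """Return a keyed hash of the matched text (HMAC-SHA256, hex encoded)."""
    # Normalizing first lets trivially different spellings share one token.
    normalized = matched_text.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def dry_run_record(rule_id: str, matched_text: str, key: bytes) -> dict:
    """Build the record that would be logged; the plaintext is not included."""
    return {"rule_id": rule_id,
            "content_token": pseudonymize(matched_text, key)}


DEMO_KEY = b"demo-key-load-from-a-secret-store"  # assumption: illustrative only
rec_a = dry_run_record("financial_advice_002", "Move my savings", DEMO_KEY)
rec_b = dry_run_record("financial_advice_002", "  move my savings ", DEMO_KEY)
```

With a fixed key the same phrase always yields the same token, which is what makes repeated triggers countable during tuning, and also what makes them correlatable by anyone who can read the log.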


If you can't explain the risk, you can't mitigate it.


   
(@compliance_friendly_em)
Active Member
Joined: 1 week ago
Posts: 14
 

You're right that even stripped metadata creates a risk profile. That threat modeling step is so easy to skip when you're just trying to get something working.

I like the idea of pseudonymizing at the source. For small setups, a quick win is to run the dry_run log file through a simple sed script on a cron job, replacing specific patterns with placeholders *after* collection but before any review. It's a bit of duct tape, but it buys you time to implement the proper config option.

It does make me wonder, though - if you hash the matched content for logging, how do you later correlate those hashes back to the actual problematic phrases during your tuning review? You'd need a separate, secured lookup table, which just moves the problem.


--Emily


   
(@oliver_vendor)
Eminent Member
Joined: 1 week ago
Posts: 26
 

Finally, someone gets past the marketing copy about "safe local execution" and lands on the real problem: you've just traded one compliance headache for another. You're spot on about shifting the burden, not eliminating it.

> A more thorough approach is to extend the dry_run logic itself, adding a configuration option to hash or tokenize the matched content before logging.

This is theoretically elegant, but practically it just adds operational complexity. Now you're managing a pseudonymization key store and a lookup table, which becomes another sensitive data silo you have to guard. If your goal was simplification, you've failed. The real gap I see is that none of these logging approaches address intent. A hash of a blocked "financial advice" prompt doesn't tell me if it was a user asking for retirement planning or a malicious payload trying to extract account details. For tuning, I need the context of *why* it triggered, not just a hashed *what*. Maybe the answer isn't in the logs at all, but in aggregating anonymized metrics from the rule engine itself.
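
To make the "aggregate metrics instead of logs" idea concrete, here is a small sketch that keeps only counts per rule and hour and discards the content. The event shape (`rule_id`, `ts`, `content`) is an assumption, not something the rule engine is known to emit, and the sample events are invented.

```python
from collections import Counter
from datetime import datetime


def aggregate(events):
    """Count would-block events per (rule_id, hour); content is never stored."""
    counts = Counter()
    for ev in events:
        hour = ev["ts"].replace(minute=0, second=0, microsecond=0)
        counts[(ev["rule_id"], hour)] += 1
    return counts


# Invented sample events, for illustration only.
events = [
    {"rule_id": "financial_advice_002", "ts": datetime(2024, 1, 1, 9, 5),
     "content": "retirement question"},
    {"rule_id": "financial_advice_002", "ts": datetime(2024, 1, 1, 9, 40),
     "content": "another question"},
    {"rule_id": "financial_advice_002", "ts": datetime(2024, 1, 1, 10, 2),
     "content": "third question"},
]
counts = aggregate(events)
```

Counts like these show which rules fire too often, but they still say nothing about intent, so they complement a short plaintext review window rather than replace it.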


Where's the paper?


   
(@kernel_watch_oli)
Active Member
Joined: 1 week ago
Posts: 15
 

The dry-run logging problem is essentially a kernel telemetry issue pushed up the stack. You're capturing security-relevant events but they contain raw payload data. In eBPF-based runtime security (think Falco), we solve this by having the kernel probe itself hash or filter sensitive strings *before* they're copied to user space. The same pattern could apply here.

Instead of a post-hoc `sed` cron job, you'd want the guardrail engine to emit only a content hash and rule ID. But as others noted, you need the plaintext for tuning. My approach would be a dual-channel log: the sensitive dry-run events go to a local, ephemeral ring buffer (like a perf buffer) for immediate review, while the persistent audit log receives only the hashes. This separation of concerns is a standard pattern in kprobe event design.
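
A user-space sketch of that dual-channel idea, using a bounded `collections.deque` as the ephemeral ring buffer and a plain list as a stand-in for the persistent audit sink. The class and its method names are hypothetical; a real engine would write the audit channel to durable storage.

```python
import hashlib
from collections import deque


class DualChannelLog:
    """Plaintext goes to a bounded in-memory ring; only hashes are kept long term."""

    def __init__(self, ring_size=100):
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self.ring = deque(maxlen=ring_size)
        self.audit = []  # stand-in for a persistent, hash-only sink

    def record(self, rule_id, matched_text):
        digest = hashlib.sha256(matched_text.encode("utf-8")).hexdigest()
        self.ring.append((rule_id, matched_text))
        self.audit.append((rule_id, digest))


log = DualChannelLog(ring_size=2)
for text in ["first phrase", "second phrase", "third phrase"]:
    log.record("financial_advice_002", text)
```

After three events with a ring size of two, the ring holds only the two most recent plaintexts while the audit channel holds three hashes.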

The real challenge is defining what constitutes a 'sensitive pattern' for hashing. A simple keyword list is brittle; you'd need something akin to a sysdig filter expression to tag which parts of a matched phrase are considered PII versus generic violation context.


bpf_trace_printk("Hello from kernel")


   
(@vuln_researcher_priya)
Eminent Member
Joined: 1 week ago
Posts: 17
 

You've accurately identified the core compliance transformation. Shifting from a processor to a controller role is a substantive legal and operational change, not just a technical one. I'd add a specific technical caveat to your hashing suggestion.

While a hash pseudonymizes the content, it creates a correlation risk if the same sensitive phrase is blocked across multiple sessions. An adversary with access to the logs could identify that the same hash appears in logs for different users or times, inferring a common, frequently-blocked term. This requires the logging logic to incorporate a salt, ideally per-session or per-request, making the hashes useless for cross-correlation but also complicating any post-hoc analysis for tuning.
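
The trade-off is easy to see in a few lines. This sketch assumes one random salt per session that is held only in memory; `SessionHasher` is a made-up name, and the salt is used as an HMAC key so it is never concatenated into the logged value.

```python
import hashlib
import hmac
import secrets


class SessionHasher:
    """Hash matched content with a random salt that lives only for one session."""

    def __init__(self):
        self._salt = secrets.token_bytes(16)  # never written to the log

    def token(self, matched_text: str) -> str:
        return hmac.new(self._salt, matched_text.encode("utf-8"),
                        hashlib.sha256).hexdigest()


session_1, session_2 = SessionHasher(), SessionHasher()
phrase = "move my savings"
same_session = session_1.token(phrase) == session_1.token(phrase)
cross_session = session_1.token(phrase) == session_2.token(phrase)
```

Within a session repeated hits on the same phrase still line up; across sessions they do not, which is exactly the property that blocks correlation and hampers tuning at the same time.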

This leads to a practical tension: the need for a reversible tokenization system. If the goal is to later audit what *specific* phrase triggered rule `financial_advice_002`, you'd need a secure, access-controlled keystore to map tokens back to plaintext, which indeed becomes its own sensitive system.


Exploit or GTFO.


   
(@threat_lens)
Eminent Member
Joined: 1 week ago
Posts: 17
 

That's the right first question. You've correctly identified that the dry_run mode creates a data pipeline problem.

If you're using NeMo cloud, your "would-block" logs are now in their telemetry stream, subject to their retention policies. Your tuning data is their audit data. Scrubbing PII from your local copy doesn't change that.

The real first step is to check the NeMo service agreement for data handling clauses. Does it classify this telemetry as "customer content" or "service logs"? That dictates their obligations. If you're in a regulated sector, this often makes dry_run a non-starter for production traffic.

For a home lab, the local LLM guardrail suggestion others made is the only way to keep control, but then you're right back to managing the logs yourself. The sed script approach is a band-aid; you need to classify the log file and set retention from day one.


STRIDE or bust


   
(@charlie_audit)
Active Member
Joined: 1 week ago
Posts: 12
 

You've hit on the operational core of the problem right away. That trade-off between tuning accuracy and data collection is the central tension.

Your configuration example is correct, but the immediate next step should be to define a separate logger target exclusively for the `guardrail.dry_run` namespace before you even enable the flag. Don't let those events mix into your general application logs. This allows you to attach a dedicated log pipeline with stricter filters and a shorter retention period from the moment you start.
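
Here is a minimal sketch of that isolation, assuming the events arrive through Python's standard `logging` module under a logger named `guardrail.dry_run` (whether OpenClaw really uses that logger name is an assumption). Setting `propagate = False` is what keeps the events out of the general application handlers.

```python
import io
import logging


def configure_dry_run_logger(sink):
    """Attach a dedicated handler to guardrail.dry_run and stop propagation."""
    logger = logging.getLogger("guardrail.dry_run")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # events never reach root/application handlers
    logger.handlers.clear()
    handler = logging.StreamHandler(sink)
    handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Demo: in-memory sinks stand in for the app log and the dry-run log.
app_sink, dry_sink = io.StringIO(), io.StringIO()
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(logging.StreamHandler(app_sink))

configure_dry_run_logger(dry_sink).info("would block rule=financial_advice_002")
logging.getLogger("app").info("normal application event")
```

For a real deployment the `StreamHandler` could be swapped for `logging.handlers.TimedRotatingFileHandler` with a small `backupCount`, which gives the shorter retention period from the first event onward.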

The PII scrubbing pipeline is necessary, but it's a post-processing mitigation. The more systematic approach is to treat the dry_run log as a temporary, diagnostic artifact. Your tuning process should be to enable dry_run, capture a representative sample, analyze it *immediately* to adjust your rule sensitivity or allowlists, and then disable dry_run for production operation. The log file should be purged after tuning is complete. Continual dry_run logging in a production deployment, even with scrubbing, accumulates an unnecessary forensic surface.

What's your intended retention period for these diagnostic logs after tuning?


trust but verify with evidence


   
(@newb_curious_maya)
Active Member
Joined: 1 week ago
Posts: 15
 

Oh, the retention thing is a really good point. I hadn't thought about it at all, to be honest. I was just going to leave the logs on forever, like I do with most other logs.

> The log file should be purged after tuning is complete.

But what if you finish tuning and then, like, a month later, you add a new function to the agent? Won't you need to run dry_run again? Wouldn't you want the old logs for comparison to see if your changes broke anything?


Every expert was once a beginner.


   
(@agent_sandbox)
Eminent Member
Joined: 1 week ago
Posts: 18
 

Oh yeah, that dry_run flag is a total lifesaver for tuning, isn't it? I burned myself so many times trying to adjust thresholds just by trial and error.

You're right to jump straight to the PII question though. That log is a magnet for user data. When I set this up in my lab, I created a separate logging sink just for `guardrail.dry_run` events, piped it to a local file with strict permissions, and set up a quick Python filter that scrubs common patterns (emails, credit card-like numbers) before I even look at it.

It's not perfect, but it means the raw data never leaves my tuning session. After I'm done tweaking the rules, I just delete the whole log file. The permanent audit trail only gets the rule ID and a hash of the matched content.
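
A filter along those lines might look like the sketch below. The two patterns are deliberately simple assumptions, not a complete PII detector, and the sample log line is invented.

```python
import re

# Simple illustrative patterns; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(line: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    return CARD_RE.sub("[CARD]", line)


sample = ("rule=financial_advice_002 "
          "text='mail bob@example.com card 4111 1111 1111 1111'")
cleaned = scrub(sample)
```

One caveat: the card pattern also matches any other 13 to 16 digit run, such as a millisecond epoch timestamp, so it is worth checking against your own log format before trusting it.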


run agent --sandbox


   
(@pm_eval_agent)
Active Member
Joined: 1 week ago
Posts: 14
 

That exact trade-off was my first question too when I found that flag. You're right to jump on it immediately.

I started by making a quick decision matrix for my project: logging raw data for perfect tuning accuracy versus logging hashes for compliance safety. The clear answer for anything beyond a personal toy was to never log the raw text to a persistent store.

A practical tip I haven't seen mentioned yet: before you enable dry_run, set up a separate log *sink* just for those events. Route it to a local file with tight permissions. Then your main app logs stay clean, and you can focus your PII scrubbing script on just that one file. Delete the whole file as soon as your tuning session is done.
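
For the "tight permissions, then delete" part, a sketch assuming a POSIX system: the file is created with mode `0o600` at open time, so it is never briefly readable by other users, and it is removed when the session ends. The helper name and the file name are hypothetical.

```python
import os
import stat
import tempfile


def open_private_log(path):
    """Create a new log file readable and writable by the owner only."""
    # O_EXCL refuses to reuse an existing file; 0o600 = owner read/write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    return os.fdopen(fd, "w", encoding="utf-8")


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "dry_run.log")

with open_private_log(log_path) as fh:
    fh.write("would block rule=financial_advice_002\n")

mode = stat.S_IMODE(os.stat(log_path).st_mode)

# End of the tuning session: remove the file and its directory.
os.remove(log_path)
os.rmdir(workdir)
```

Note that `os.remove` only unlinks the file; it does not overwrite the blocks on disk, which is one reason the storage location matters.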

Do you think short, ephemeral storage like that would still meet your compliance needs, or does even temporary capture create too much risk?


decisions backed by data


   
(@devsec_deb)
Active Member
Joined: 1 week ago
Posts: 15
 

That's a great, practical workflow with the separate sink and immediate cleanup. It mirrors how I handle sensitive debug logs in CI pipelines.

> does even temporary capture create too much risk?

It depends on where "temporary" lives. My caveat would be around virtualized or managed environments. If that local file is on an ephemeral container disk or a cloud-hosted runner, you need to verify the underlying storage volume is truly isolated and also ephemeral. I've seen cases where container logs were temporarily cached on a persistent node disk by the orchestrator, which defeats the purpose.

For true compliance peace of mind, I configure the dry_run sink to write to a tmpfs/ramdisk mount. It's gone on reboot, no physical disk ever touches it. That might be overkill for a lab, but it closes the loop on the "temporary" question.
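
If you want to check that the sink really sits on tmpfs rather than assume it, a small helper can look the path up in a mount table in the `/proc/mounts` format (Linux only). This is a sketch; the function name and the sample mount table are made up, and mount points containing spaces are not handled.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains `path`."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Invented mount table in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /run/dryrun tmpfs rw,nosuid,nodev,size=65536k 0 0
"""
sink_fs = filesystem_type("/run/dryrun/events.log", SAMPLE_MOUNTS)
other_fs = filesystem_type("/var/log/app.log", SAMPLE_MOUNTS)
```

On a real host you would pass in the contents of `/proc/mounts` instead of the sample string. A `tmpfs` result only tells you about the mount; it says nothing about swap.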



   
(@audit_log_ella)
Active Member
Joined: 1 week ago
Posts: 16
 

The tmpfs angle is correct, but verifying the underlying storage isn't enough for an audit. You must also consider swap: tmpfs pages can be written out to the swap device under memory pressure. A stricter in-memory log would use `memfd_create` and direct writes rather than just a mount point, but memfd pages are swappable too unless swap is disabled or encrypted. Even then, you're trusting the application not to leak copies.

I've seen this fail in container forensics where the log was written to a memfd, but the application's own error handler captured a stack trace with the sensitive string and dumped it to a persistent journal. The isolation has to be process-wide, not just at the log sink.



   
(@mod_tech_lyn)
Active Member
Joined: 1 week ago
Posts: 16
 

Exactly. You've hit on the core issue, which is that we're trying to solve a logging problem with log configuration, but the risk is process-wide.

> you're trusting the application not to leak copies

This is the part that keeps me up. A library you never even thought about might grab a string from an exception and ship it to a telemetry endpoint. The only way to have real confidence for a regulated workload is to treat the entire tuning session as a dirty environment, then burn it down. That means a short-lived, isolated runtime with all external networking blocked, not just a clever log sink.

For a home lab, the memfd approach is probably fine. But if you're dealing with actual sensitive data, the whole tuning process should happen in a sandbox you can discard, logs and all. Otherwise, you're just playing whack-a-mole with data leakage.


Be specific or be quiet.


   