
Opinion: Logging 'confidence scores' is a security anti-pattern.

(@jake_tinker)
Eminent Member
Joined: 1 week ago
Posts: 13
Topic starter

We've been talking a lot about what to log—tool calls, decisions, the raw prompts and completions. All good. But I keep seeing a suggestion pop up that makes me nervous: logging the model's "confidence scores" or "logprobs" for security auditing.

I think this is a well-intentioned mistake. Here’s why.

A confidence score from an LLM isn't like a probability from a well-calibrated classifier. A token log probability measures how likely the model found its own next token, not whether the output is correct or safe, so it doesn't reliably signal "the model is unsure, a human should check this." A harmful or illogical output can easily come with a high score. Logging it creates a false sense of security: an analyst might see `"confidence": 0.97` on a log entry and wave it through when the action taken was deeply problematic.

What we *should* log is the **evidence for the agent's decision** that we can actually verify. The confidence score is internal model state; we need external, actionable data.

For incident response, my audit logs focus on reconstructing the chain. That means:
* The exact tool/function signature called and the parameters it was called with (scrubbed of obvious PII).
* The raw text of the agent's reasoning trace (the chain-of-thought).
* The specific user request or system event that triggered the agent.
* A deterministic identifier for the policy or instruction set the agent was running under.
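To make that list concrete, here is a minimal Python sketch of one audit record. Everything in it is hypothetical: the `AuditRecord` fields, the `send_email` tool, and the single email regex standing in for real PII scrubbing. The part worth copying is the policy identifier: hashing the exact instruction text gives an ID that is deterministic, so every run under the same instructions shares it.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical, deliberately minimal scrubber: masks email-like strings only.
# Real PII scrubbing needs much more than one regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub(value):
    """Recursively mask email-like strings inside tool parameters."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def policy_id(instructions: str) -> str:
    """Deterministic ID: identical instruction text always hashes the same."""
    return "sha256:" + hashlib.sha256(instructions.encode("utf-8")).hexdigest()


@dataclass
class AuditRecord:
    timestamp: str
    trigger: str          # user request or system event that started the run
    policy: str           # policy_id() of the active instruction set
    tool: str             # exact tool/function name that was called
    params: dict          # call parameters, scrubbed of obvious PII
    reasoning_trace: str  # raw reasoning text, stored as-is in this sketch


def make_record(trigger, instructions, tool, params, reasoning_trace):
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        trigger=trigger,
        policy=policy_id(instructions),
        tool=tool,
        params=scrub(params),
        reasoning_trace=reasoning_trace,
    )


record = make_record(
    trigger="user: send the weekly report to Bob",
    instructions="You are a reporting agent. Only email approved domains.",
    tool="send_email",
    params={"to": "bob@example.com", "subject": "Weekly report"},
    reasoning_trace="User asked for the weekly report; recipient domain is approved.",
)
print(json.dumps(asdict(record)))
```

Because the record is a plain dataclass, `asdict()` plus `json.dumps()` yields one JSON line per event, which is an easy shape to ship to a log backend. Note that the reasoning trace is not scrubbed here and may need the same treatment as the parameters.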

If you really want a metric for "weirdness," log something based on measurable behavior, such as a rule-violation count or a flag for when the agent had to retry a tool call multiple times. These are objective facts about the event, not a black-box score.
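A sketch of the "measurable behavior" idea, assuming your agent runtime gives each logical tool call an ID that is reused across its retries (the `call_id` field, the `ToolAttempt` type, and the `max_retries=2` threshold are all assumptions, not part of any particular framework). The flags come from what the agent did, not from what the model reported about itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToolAttempt:
    call_id: str                  # hypothetical: same ID reused for every retry of one call
    tool: str
    rule_violation: bool = False  # set by your own policy checks, not by the model


def behavior_flags(attempts, max_retries=2):
    """Derive objective 'weirdness' signals from observed tool attempts."""
    per_call = Counter(a.call_id for a in attempts)
    # Attempts beyond the first for the same call_id count as retries.
    retries = {cid: n - 1 for cid, n in per_call.items() if n > 1}
    return {
        "rule_violations": sum(a.rule_violation for a in attempts),
        "retried_calls": retries,
        "excessive_retries": any(r > max_retries for r in retries.values()),
    }


attempts = [
    ToolAttempt("c1", "search"),
    ToolAttempt("c1", "search"),
    ToolAttempt("c1", "search"),
    ToolAttempt("c1", "search"),
    ToolAttempt("c2", "send_email", rule_violation=True),
]
print(behavior_flags(attempts))
# {'rule_violations': 1, 'retried_calls': {'c1': 3}, 'excessive_retries': True}
```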

Storing those confidence values also just adds noise. They're a liability if you're trying to keep logs clean and avoid storing extraneous data. My rule: if you can't define exactly how a field will be used in a post-incident query, don't log it.
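One way to enforce the "define the query first" rule is an allowlist applied right before events are written. This is a hypothetical sketch: the field names in `ALLOWED_FIELDS` are placeholders for whatever your own incident queries actually use. Anything not on the list, including `confidence` or `logprobs`, never reaches storage.

```python
import json

# Hypothetical allowlist: each field should map to a post-incident query you
# can actually name (e.g. "which runs called send_email under policy X?").
ALLOWED_FIELDS = {
    "timestamp", "trigger", "policy", "tool", "params",
    "reasoning_trace", "rule_violations",
}


def filter_event(event: dict) -> dict:
    """Keep only allowlisted fields; everything else is silently dropped."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


raw = {
    "timestamp": "2024-01-01T00:00:00+00:00",
    "tool": "send_email",
    "policy": "sha256:abc123",  # placeholder value for illustration
    "confidence": 0.97,
    "logprobs": [-0.01, -0.2],
}
print(json.dumps(filter_event(raw), sort_keys=True))
```

An allowlist fails closed: a new field added upstream stays out of the logs until someone decides what it is for, whereas a denylist has to anticipate every noisy field in advance.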

I’m running a modified Nemo Claw setup at home, and my audit pipeline to Grafana omits confidence scores entirely. The logs have been much easier to read since. Anyone else come to the same conclusion?

-- jake


if it compiles, ship it


   
(@newbie_with_agent)
Active Member
Joined: 1 week ago
Posts: 12
 

That's a really good point about the false sense of security. I hadn't thought of it that way.

So when you say to log the evidence for the decision, does that mean the *entire* reasoning trace from the model output? Even if it's long? I'm trying to figure out what's too much noise versus what we actually need to keep.



   
(@newb_selfhost_kat)
Eminent Member
Joined: 1 week ago
Posts: 22
 

Yeah, the false sense of security angle makes a lot of sense. It reminds me of a weird output I saw once: my agent was super "confident" while generating something completely off-topic.

So, for someone just setting up their first agent, what's a good example of external, actionable data to log instead? Like, if the agent decides to send an email, is logging the recipient (obviously scrubbed) and the email subject enough as "evidence"?
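For the email case, one hedged option for "scrubbed" is a keyed hash (HMAC) of the normalized address. The same recipient always maps to the same token, so you can still ask "which runs emailed this person?", but the logs alone don't reveal the address, and unlike a plain SHA-256 the token can't be brute-forced from a list of common addresses without the key. This sketch assumes a hypothetical `send_email` tool and a secret key kept outside the log store. Keeping the domain readable is a deliberate trade-off, and subject lines can contain PII too, so they may need the same review.

```python
import hashlib
import hmac

# Assumption: load this from a secrets manager; never hard-code or log it.
LOG_PSEUDONYM_KEY = b"replace-with-a-secret-from-your-vault"


def pseudonymize_email(address: str) -> str:
    """Replace the mailbox with a keyed hash but keep the domain readable."""
    # Lowercasing is a simplification; it treats Bob@ and bob@ as one recipient.
    normalized = address.strip().lower()
    domain = normalized.rpartition("@")[2]
    digest = hmac.new(LOG_PSEUDONYM_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{digest[:12]}@{domain}"


def email_audit_fields(to: str, subject: str) -> dict:
    """Hypothetical evidence fields for a send_email tool call."""
    return {"tool": "send_email", "recipient": pseudonymize_email(to), "subject": subject}


print(email_audit_fields("Bob@Example.com", "Weekly report"))
```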



   