Opinion: Logging 'confidence scores' is a security anti-pattern.

Agent Audit Log Design

Jake Orozco

(@jake_tinker)

Eminent Member

Joined: 1 week ago

Posts: 13

Topic starter

June 25, 2026 2:57 pm [#918]

We've been talking a lot about what to log (tool calls, decisions, the raw prompts and completions). All good. But I keep seeing a suggestion pop up that makes me nervous: logging the model's "confidence scores" or "logprobs" for security auditing.

I think this is a well-intentioned mistake. Here’s why.

A confidence score from an LLM isn't like a probability from a well-calibrated classic classifier. It is usually derived from token log-probabilities, so it tells you how likely the model found its own next tokens, not whether the output is correct or safe. It doesn't reliably signal "the model is unsure, a human should check this." In fact, high confidence on a harmful or illogical output is common. Logging it creates a false sense of security: an analyst might see `"confidence": 0.97` on a log entry and wave it through, when the action taken was deeply problematic.
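To make that concrete, here is a minimal sketch assuming one common way of collapsing per-token logprobs into a single "confidence" number: the geometric mean of token probabilities, `exp(mean(logprob))`. Providers and frameworks define this differently, and the logprob values below are made up for illustration. The point is that nothing in the calculation looks at what the text *does*, so a problematic completion can easily score higher than a benign one.

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities: exp(mean(logprob)).

    One common (assumed) definition of an LLM "confidence score". It only
    measures how likely the model found its own tokens, not correctness.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token logprobs for two completions (illustrative only,
# not measured from any real model).
benign = [-0.02, -0.05, -0.01, -0.03]
problematic = [-0.01, -0.02, -0.04, -0.01]

print(round(sequence_confidence(benign), 3))
print(round(sequence_confidence(problematic), 3))
```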

What we *should* log is the **evidence for the agent's decision** that we can actually verify. The confidence score is internal model state; we need external, actionable data.

For incident response, my audit logs focus on reconstructing the chain. That means:
* The exact tool/function signature called and the parameters it was called with (with obvious PII scrubbed).
* The raw text of the agent's reasoning trace (the chain-of-thought).
* The specific user request or system event that triggered the agent.
* A deterministic identifier for the policy or instruction set the agent was running under.
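A minimal sketch of one audit record covering those four items, assuming a Python agent that writes JSON lines. The tool name, the deliberately crude email regex, and the use of a SHA-256 hash of the instruction text as the "deterministic identifier" are all illustrative assumptions, not part of any particular agent framework. Note that there is no `confidence` field.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Very rough email pattern for illustration; real PII scrubbing needs a
# dedicated tool and review.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(value):
    """Recursively replace anything that looks like an email address."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value

def policy_id(policy_text: str) -> str:
    """Deterministic identifier: SHA-256 of the exact instruction set."""
    return "sha256:" + hashlib.sha256(policy_text.encode("utf-8")).hexdigest()

@dataclass
class AuditRecord:
    tool_name: str        # exact tool/function called
    tool_params: dict     # parameters, with obvious PII scrubbed
    reasoning_trace: str  # raw reasoning text produced by the agent
    trigger: str          # user request or system event that started the run
    policy_id: str        # which policy/instruction set was in force
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

# Hypothetical example record.
record = AuditRecord(
    tool_name="send_email",
    tool_params=scrub({"to": "alice@example.com", "subject": "Q3 report"}),
    reasoning_trace="User asked me to send the Q3 report to Alice.",
    trigger="user_request:1234",
    policy_id=policy_id("You are an email assistant. Never send attachments."),
)
print(record.to_json())
```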

If you really want a metric for "weirdness," log something based on measurable behavior. For example, a rule violation count, or a flag for when the agent had to retry a tool call multiple times. These are objective facts about the event, not a black-box score.
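Those behavior-based signals can be computed from events the agent already emits. In this sketch a "retry" is assumed to mean calling a tool again after its previous call failed, and `rule_violations` is a hypothetical count supplied by whatever policy checker you run; both definitions are assumptions to adapt to your setup.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ToolEvent:
    tool_name: str
    succeeded: bool
    rule_violations: int = 0  # hypothetical: filled in by your own policy checker

def behavior_flags(events: list[ToolEvent], retry_threshold: int = 2) -> dict:
    """Objective facts about one agent run; events must be in call order."""
    retries = Counter()   # tool name -> calls made after that tool's previous call failed
    last_failed = set()   # tools whose most recent call failed
    for e in events:
        if e.tool_name in last_failed:
            retries[e.tool_name] += 1
        if e.succeeded:
            last_failed.discard(e.tool_name)
        else:
            last_failed.add(e.tool_name)
    return {
        "rule_violation_count": sum(e.rule_violations for e in events),
        "failed_calls": sum(1 for e in events if not e.succeeded),
        "excessive_retries": sorted(
            name for name, n in retries.items() if n >= retry_threshold
        ),
    }

# Hypothetical run: two failed searches, a successful one, then an email
# that the policy checker flagged once.
run = [
    ToolEvent("web_search", succeeded=False),
    ToolEvent("web_search", succeeded=False),
    ToolEvent("web_search", succeeded=True),
    ToolEvent("send_email", succeeded=True, rule_violations=1),
]
print(behavior_flags(run))
```

Every field in the result can be checked against the raw event log, which is exactly what a confidence score can't offer.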

Storing those confidence values also just adds noise. They're a liability if you're trying to keep logs clean and avoid storing extraneous data. My rule: if you can't define exactly how a field will be used in a post-incident query, don't log it.
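One way to enforce that rule is an explicit allowlist where every logged field carries the post-incident question it answers; anything else, including a `confidence` or `logprobs` value, is dropped before the record is written. The field names and questions below are hypothetical examples.

```python
# Hypothetical allowlist: each field maps to the post-incident question it answers.
LOGGED_FIELDS = {
    "tool_name": "Which tools did the agent call during the incident window?",
    "tool_params": "What exactly did it send or change?",
    "reasoning_trace": "How did the agent justify the action?",
    "trigger": "What request or event started this run?",
    "policy_id": "Which instruction set was in force?",
    "rule_violation_count": "Which runs broke policy?",
}

def filter_event(raw: dict) -> dict:
    """Keep only fields with a documented query; drop everything else."""
    return {k: v for k, v in raw.items() if k in LOGGED_FIELDS}

print(filter_event({"tool_name": "send_email", "confidence": 0.97, "logprobs": [-0.1]}))
```

With this design, adding a new field means writing down its query first.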

I’m running a modified Nemo Claw setup at home, and my audit pipeline to Grafana omits confidence entirely. It’s been much clearer. Anyone else come to the same conclusion?

-- jake

if it compiles, ship it

Tomás G.

(@newbie_with_agent)

Active Member

Joined: 1 week ago

Posts: 12

June 25, 2026 9:06 pm

That's a really good point about the false sense of security. I hadn't thought of it that way.

So when you say to log the evidence for the decision, does that mean the *entire* reasoning trace from the model output? Even if it's long? I'm trying to figure out what's too much noise versus what we actually need to keep.

Kat Rivera

(@newb_selfhost_kat)

Eminent Member

Joined: 1 week ago

Posts: 22

June 25, 2026 10:09 pm

Yeah, the false sense of security angle makes a lot of sense. It reminds me of a weird output I saw once: my agent was super "confident" while generating something completely off-topic.

So, for someone just setting up their first agent, what's a good example of external, actionable data to log instead? Like, if the agent decides to send an email, is logging the recipient (obviously scrubbed) and the email subject enough as "evidence"?
