Has anyone tried using OpenTelemetry semantic conventions for AI agent logging?

(@iris_ciso)
Active Member
Joined: 2 weeks ago
Posts: 10
Topic starter
  [#1311]

A recurring challenge in our agent audit log discussions is the lack of a common schema. Without it, correlating events across different agent frameworks or even different teams within the same organization becomes an exercise in data wrangling. This directly hinders incident response and complicates regulatory evidence gathering.

I'm evaluating whether OpenTelemetry's semantic conventions could provide that necessary structure. The OTel tracing model, with its well-defined spans, attributes, and events, is conceptually a strong fit for logging agent activity. The question is whether the existing semantic conventions (e.g., for `gen_ai`) are sufficient, or whether we need to propose extensions for the unique aspects of autonomous agents.

Key agent audit log requirements we'd need to map include:
* **Tool/action invocation:** Target system, parameters (sanitized), duration, success/failure.
* **Model interactions:** Provider, model name, prompt/response metadata (e.g., token counts), but crucially *not* the full PII-laden content.
* **Decision rationale:** The "why" behind an agent's chosen action, which is often buried in chain-of-thought.
* **Credential or secret access:** Which identity was used, for what scope, and at what time, without logging the credential itself.
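
To make the mapping concrete, here is a minimal sketch of how three of these requirements might flatten into span attributes. It uses only the standard library and does not call the OpenTelemetry SDK. The `gen_ai.*` keys reflect my reading of the current GenAI conventions, which are still marked as in development and may change; every `agent.*` key, class, and function name is a hypothetical placeholder, not an existing convention.

```python
from dataclasses import dataclass
from typing import Dict, Union

# OTel attribute values are primitives (or arrays of them), so keep it flat.
AttrValue = Union[str, int, float, bool]


@dataclass(frozen=True)
class ModelCall:
    provider: str
    model: str
    input_tokens: int
    output_tokens: int


@dataclass(frozen=True)
class CredentialAccess:
    identity: str  # which identity was used, never the secret itself
    scope: str     # what the identity was allowed to do


def model_call_attributes(call: ModelCall) -> Dict[str, AttrValue]:
    """Metadata only: prompt and response text are deliberately absent."""
    return {
        # Key names assumed from the gen_ai conventions; verify before use.
        "gen_ai.provider.name": call.provider,
        "gen_ai.request.model": call.model,
        "gen_ai.usage.input_tokens": call.input_tokens,
        "gen_ai.usage.output_tokens": call.output_tokens,
    }


def credential_attributes(access: CredentialAccess) -> Dict[str, AttrValue]:
    """Hypothetical agent.credential.* keys; the span timestamp covers 'when'."""
    return {
        "agent.credential.identity": access.identity,
        "agent.credential.scope": access.scope,
    }


def decision_attributes(summary: str, max_len: int = 200) -> Dict[str, AttrValue]:
    """Hypothetical key holding a truncated rationale, not raw chain-of-thought."""
    return {"agent.decision.rationale_summary": summary[:max_len]}
```

Keeping each requirement in its own small function means a reviewer can audit exactly which fields can ever reach the exporter.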

OpenTelemetry could standardize the "what" we log. For example, a tool call could be a span carrying `faas.*` attributes (such as `faas.invocation_id`), augmented with custom `agent.tool.*` attributes. The critical compliance piece, the "how" we redact, must still be enforced at the instrumentation layer before data is emitted.
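
As a rough illustration of redacting at the instrumentation layer, the sketch below builds the attribute set for a tool-call span and applies an allow-list to parameters before anything is emitted. It is plain Python with no OpenTelemetry dependency; the `agent.tool.*` key names, the function names, and the allow-list approach are all assumptions for illustration, not an established convention.

```python
from typing import Any, Dict, FrozenSet, Union

AttrValue = Union[str, int, float, bool]
REDACTED = "[REDACTED]"


def sanitize_params(params: Dict[str, Any], allowed: FrozenSet[str]) -> Dict[str, str]:
    """Keep allow-listed parameter values; replace everything else."""
    return {
        key: str(value) if key in allowed else REDACTED
        for key, value in params.items()
    }


def tool_call_attributes(
    tool_name: str,
    target: str,
    params: Dict[str, Any],
    allowed: FrozenSet[str],
    duration_ms: float,
    success: bool,
) -> Dict[str, AttrValue]:
    """Build flat, already-redacted attributes for one tool-call span."""
    attrs: Dict[str, AttrValue] = {
        "agent.tool.name": tool_name,
        "agent.tool.target": target,
        "agent.tool.duration_ms": duration_ms,
        "agent.tool.success": success,
    }
    # Parameters are flattened to one attribute each, after redaction.
    for key, value in sanitize_params(params, allowed).items():
        attrs[f"agent.tool.param.{key}"] = value
    return attrs


attrs = tool_call_attributes(
    tool_name="crm.lookup",
    target="crm-prod",
    params={"customer_id": 42, "email": "user@example.com"},
    allowed=frozenset({"customer_id"}),
    duration_ms=12.5,
    success=True,
)
```

An allow-list fails closed: a parameter nobody has reviewed is redacted by default, which is usually the safer posture for audit data than a deny-list of known-sensitive keys.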

Has anyone attempted this mapping in practice? I'm particularly interested in:
* Gaps you found in the current OTel semantics for agent-specific events.
* How you handled the segregation of PII (e.g., prompts containing user data) from operational metadata within the OTel attribute model.
* Whether the resulting traces were usable for both technical debugging and compliance audits.

-- IV


risk adjusted


   