A recurring challenge in our agent audit log discussions is the lack of a common schema. Without it, correlating events across different agent frameworks or even different teams within the same organization becomes an exercise in data wrangling. This directly hinders incident response and complicates regulatory evidence gathering.
I'm evaluating whether OpenTelemetry's semantic conventions could provide that necessary structure. The OTel tracing model, with its well-defined spans, attributes, and events, is conceptually a strong fit for logging agent activity. The question is whether its existing semantic conventions (e.g., for `gen_ai`) are sufficient, or whether we need to propose extensions for the unique aspects of autonomous agents.
Key agent audit log requirements we'd need to map include:
* **Tool/action invocation:** Target system, parameters (sanitized), duration, success/failure.
* **Model interactions:** Provider, model name, prompt/response metadata (e.g., token counts), but crucially *not* the full PII-laden content.
* **Decision rationale:** The "why" behind an agent's chosen action, which is often buried in chain-of-thought.
* **Credential or secret access:** Which identity was used, for what scope, and at what time—without logging the credential itself.
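To make the four requirements above concrete, here is a rough sketch of how each might flatten into span attributes. The `gen_ai.*` keys reflect my reading of the current OTel GenAI conventions (still marked as in development, so check them against the semconv version you pin); every `agent.*` key, and all function names, are hypothetical placeholders rather than anything OTel defines.

```python
from datetime import datetime, timezone
from typing import Dict, Union

AttrValue = Union[str, int, float, bool]
Attrs = Dict[str, AttrValue]


def tool_invocation(tool: str, target: str, ok: bool) -> Attrs:
    # Duration is omitted on purpose: a span's start/end timestamps carry it.
    return {
        "gen_ai.tool.name": tool,
        "agent.tool.target_system": target,  # hypothetical key
        "agent.tool.success": ok,            # hypothetical key
    }


def model_interaction(provider: str, model: str, tokens_in: int, tokens_out: int) -> Attrs:
    # Metadata only: no prompt or response content is recorded.
    return {
        "gen_ai.provider.name": provider,  # assumption: key name per recent semconv
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": tokens_in,
        "gen_ai.usage.output_tokens": tokens_out,
    }


def decision_rationale(action: str, summary: str) -> Attrs:
    # A short, already-sanitized summary rather than raw chain-of-thought.
    return {
        "agent.decision.action": action,              # hypothetical key
        "agent.decision.rationale_summary": summary,  # hypothetical key
    }


def credential_access(identity: str, scope: str, at: datetime) -> Attrs:
    # Records who, what scope, and when; never the credential itself.
    return {
        "agent.credential.identity": identity,           # hypothetical key
        "agent.credential.scope": scope,                 # hypothetical key
        "agent.credential.accessed_at": at.isoformat(),  # hypothetical key
    }


# Example: one tool call that also used a service identity.
event = {
    **tool_invocation("ticket_search", "helpdesk", True),
    **credential_access(
        "svc-agent-7", "tickets:read", datetime(2024, 1, 1, tzinfo=timezone.utc)
    ),
}
```

Keeping each category in its own namespace is just one option; it would at least make it easy to filter, say, all credential events during an audit.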
OpenTelemetry could standardize the "what" we log. For example, a tool call could be a span carrying `gen_ai.tool.*` attributes, augmented with custom `agent.tool.*` attributes. (The `faas.*` attributes are scoped to serverless function invocations, so they look like a poor fit for agent tool calls.) The critical compliance piece, the "how" we redact, must still be enforced at the instrumentation layer before data is emitted.
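As a rough illustration of redacting at the instrumentation layer, the sketch below sanitizes tool parameters before they are turned into attributes. It is stdlib-only; the deny-list, the `agent.tool.parameters` key, and the function names are assumptions made up for this example, not part of any OTel convention.

```python
import json
from typing import Any, Dict, Mapping

# Hypothetical deny-list; a real deployment would likely derive this from policy.
SENSITIVE_KEYS = frozenset({"password", "token", "api_key", "authorization", "secret"})
REDACTED = "[REDACTED]"


def sanitize(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, Mapping):
        return {
            k: REDACTED if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [sanitize(v) for v in value]
    return value


def tool_span_attributes(tool_name: str, params: Mapping[str, Any]) -> Dict[str, str]:
    """Build flat, string-valued attributes for a tool-call span."""
    return {
        "gen_ai.tool.name": tool_name,
        # Hypothetical custom attribute; serialized only after redaction.
        "agent.tool.parameters": json.dumps(sanitize(params), sort_keys=True),
    }


attrs = tool_span_attributes(
    "crm.lookup", {"customer_id": 42, "auth": {"token": "abc123"}}
)
print(attrs["agent.tool.parameters"])
# {"auth": {"token": "[REDACTED]"}, "customer_id": 42}
```

With the OTel Python API, a dict like `attrs` could be passed as `attributes=` to `tracer.start_as_current_span(...)`. Note that a key-name deny-list only catches secrets stored under predictable keys; PII embedded in free-text values would need a different control.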
Has anyone attempted this mapping in practice? I'm particularly interested in:
* Gaps you found in the current OTel semantics for agent-specific events.
* How you handled the segregation of PII (e.g., prompts containing user data) from operational metadata within the OTel attribute model.
* Whether the resulting traces were usable for both technical debugging and compliance audits.
-- IV