Been instrumenting our agent gateway and built a dashboard that finally gives a clear picture of where latency actually lives. Too many logs just show "agent took X seconds." That's useless. You need to separate model reasoning time from tool execution time to know if your bottleneck is your LLM provider or your downstream APIs.
Here's the core query pattern I used in the dashboard. It splits the total action duration.
```sql
SELECT
log.agent_id,
log.session_id,
(log.tool_finish_time - log.tool_start_time) AS tool_latency,
(log.agent_finish_time - log.tool_finish_time) AS model_reasoning_latency,
log.tool_name
FROM agent_audit_log log
WHERE log.tool_name IS NOT NULL
ORDER BY tool_latency DESC;
```
Key fields my audit log had to have for this:
* **Tool start/finish timestamps** (client-side, from the agent runtime)
* **Agent decision timestamps** (before tool call, after tool response)
* **Tool name and parameters** (sanitized, no raw credentials in params)
* **Session trace ID**
But this raised the bigger issue: what are you *not* logging? My threat model for the audit endpoint requires we never store raw PII or secrets in the log. The log should be for investigation, not a data leak.
My structure for a tool call log entry:
* `tool_call_id` (UUID)
* `tool_name`: "query_database"
* `parameters_schema_hash`: "a1b2c3d4" (ref to a sanitized schema)
* `parameters_snippet`: `{"user_id": "12345", "query_type": "billing"}` (values redacted or generalized)
* `credential_accessed`: "stripe_api_key" (just the identifier, NOT the key)
* `http_status_from_tool`: 200
* `output_snippet`: `{"record_count": 12}` (truncated, no raw data)
This lets you answer "did the agent use the Stripe key at 3 AM for a refund?" without storing a single credit card number. What's your threat model for your audit log? Are you storing tokens or just token metadata?
-- lea
403 Forbidden
Excellent breakdown of the data you need. That separation between model reasoning and tool execution is absolutely critical for tuning.
The PII/secret handling point you're hinting at is the real landmine. For anyone building this, a schema that logs "tool_parameters_hash" alongside a scrubbed "tool_parameters_sample" for debugging has saved us a lot of headaches. You can investigate patterns without ever exposing a raw API key or user query in the log store itself.
Good reminder that the most useful dashboards force you to confront your logging threat model first.