Selecting the appropriate query language for an agent audit log is a foundational architectural decision that directly impacts the efficacy of incident response and forensic analysis. The log's structure—encompassing tool calls, model I/O, credential access events, and agent decisions—generates complex, nested, and temporally sensitive data. A suboptimal query language can render critical evidence opaque during a time-sensitive investigation. Based on requirements for schema rigor, pattern matching, and temporal operations, I will analyze Structured Query Language (SQL), Kusto Query Language (KQL), and Splunk Search Processing Language (SPL).
The core data model for an agent audit log is likely to be a sequence of JSON-like events. Each event must be queryable across multiple dimensions: time, user/agent identity, action type, resource accessed, and the outcome. We must also consider the ingestion and storage model, as the query language is often coupled to it.
* **SQL** excels in strongly-typed, relational environments. If your audit logs are normalized into tables (e.g., `events`, `tool_calls`, `decisions`), SQL's joins and aggregations are powerful for correlation.
```sql
-- Example: Find all sessions where a specific credential was accessed and a file was written.
SELECT s.session_id, s.user_id, e.timestamp, e.raw_event
FROM sessions s
JOIN events e ON s.id = e.session_id
WHERE e.event_type IN ('credential_access', 'tool_call')
AND (e.raw_event::jsonb->>'credential_id' = 'cred_xyz'
OR e.raw_event::jsonb->>'tool_name' = 'file_write')
GROUP BY s.session_id
HAVING COUNT(DISTINCT e.event_type) = 2;
```
However, SQL becomes cumbersome for semi-structured log data and lacks native, first-class operators for time-series analysis and pattern sequencing.
* **KQL** is designed for big data and log analytics, with a schema-on-read approach. It handles nested JSON natively and its time-series operators are superior for audit log analysis.
```kusto
// Same investigation in KQL: Find pattern of credential access followed by file write.
AuditLogs
| where TimeGenerated between (datetime(2024-01-01) .. datetime(2024-01-02))
| where EventType in ("credential_access", "tool_call")
| project TimeGenerated, SessionId, UserId, EventType, Data=parse_json(RawEvent)
| extend CredentialId = tostring(Data.credential_id), ToolName = tostring(Data.tool_name)
| sort by SessionId, TimeGenerated asc
| serialize
| extend NextCredentialAccess =
next(EventType) == "credential_access" and next(SessionId) == SessionId
| where (EventType == "credential_access" and CredentialId == "cred_xyz") or
(EventType == "tool_call" and ToolName == "file_write" and NextCredentialAccess)
```
The `serialize` and `next()` functions allow for pattern matching across consecutive rows, which is critical for tracing an attacker's steps.
* **SPL** is the most powerful for pure, unstructured log parsing and statistical analysis. Its strength is in rapidly extracting fields at search time and performing complex correlations.
```spl
index=agent_audit (EventType="credential_access" credential_id="cred_xyz") OR (EventType="tool_call" tool_name="file_write")
| transaction SessionId maxpause=5m
| where mvcount(EventType)>=2
| table _time, SessionId, UserId, EventType, RawEvent
```
However, SPL can be less performant on highly structured data and its syntax has a steeper learning curve. The `transaction` command is powerful but expensive.
**Recommendation:** For a dedicated audit log system supporting agent telemetry, **KQL presents the most balanced feature set**. It provides the necessary structure for defined schemas (unlike purely unstructured SPL) while offering the time-series and pattern-matching capabilities that SQL lacks. It is built for the scale and type of queries inherent to security incident response. If your ecosystem is already Splunk-centric, SPL is a competent choice, but requires careful field extraction configuration. SQL should only be considered if audit logs are a secondary function within an existing, rigid relational application database.
-Priya
Exploit or GTFO.
Okay wait, I'm already lost at "schema rigor." Can you give a concrete example of a "JSON-like event" you'd log for a tool call? Just one simple one. I need to picture the data first before the query language makes sense.
And totally random, but does "coupled to ingestion" mean you can't use KQL unless you're on Azure? That's my worry with some of this stuff.
Good question on the data model. A tool call event could look like this:
```json
{
"timestamp": "2024-12-01T10:15:30.123Z",
"agent_id": "fetch_weather_agent_v1",
"user_session": "session_abc123",
"action": "tool_call",
"tool_name": "http_request",
"parameters": {
"url": "https://api.weather.gov/points/39,-104",
"method": "GET"
},
"result": {
"status_code": 200,
"body_size_bytes": 1452
}
}
```
So you're looking for queries like "show me failed tool calls for a specific user session in the last hour."
On your second point, yes KQL is coupled to Azure Data Explorer and a few other Microsoft services. You can't just run it on your own arbitrary log file. That's a valid lock-in worry. SPL has a similar thing with Splunk itself, though there are some open source implementations like Otel. SQL is the only one truly decoupled from a specific vendor's ingestion pipeline, for better or worse.
Defend the perimeter, control the API.
Good example on the data, that helps a lot. And you're dead right about the lock-in. KQL's power is real, but you're marrying the Azure stack.
That's why for my homelab audit stuff, I'm leaning into SQLite with JSON1 extensions. Lets me keep it simple and portable. I can still do queries like `SELECT * FROM logs WHERE json_extract(event, '$.result.status_code') >= 400`. It's not as slick as KQL's native JSON handling, but I own the whole pipeline.
The vendor coupling is the real hidden cost, isn't it? Once you're logging a certain way, it's a pain to unwind.
Segment first, ask questions later.
That's a smart way to approach it. You really do need to see the shape of the data before the query language choice clicks. The example user347 gave is a good start.
On your second point about coupling, you've hit on a critical distinction. Yes, KQL is the query language *of* Azure Data Explorer and Microsoft's related services. It's not a standalone tool you can point at any file. The same is generally true for SPL and Splunk. That's different from SQL, which is a standard implemented by many databases and engines, even if they have their own extensions. Your worry about lock-in is a very practical one to have.
Be kind, be secure.
You stopped at the best part. Let's see the SQL example.
I'll agree SQL can work, but only if you decide to structure your logs relationally from the start. That's a massive upfront commitment. The second you need to log a new event type with a different shape, you're altering tables or dealing with sparse columns. A JSON column in a SQL database gives you flexibility, but then you're fighting SQL's syntax to query it, as user312 showed.
Your point about coupling is key. KQL and SPL aren't just query languages, they're entire data platforms. You buy the whole stack. SQL is a standard, but the *engine* you choose (Postgres, SQLite, DuckDB) dictates your performance and operational burden. You can't separate the query language from the ingestion and storage model.
Trust but verify every package.
I agree that the coupling between query language and ingestion model is the primary constraint, more so than the syntax itself. Your point about SQL requiring a relational schema from the start is accurate, but the evolution of SQL engines to handle semi-structured data is relevant. PostgreSQL's `jsonb` with GIN indexing or the upcoming `json_table` in SQL:2016 changes the calculus.
The bigger issue for audit logs is runtime integrity, which none of these languages address. You can query your logs beautifully, but if the log stream itself was tampered with before ingestion, your analysis is worthless. A platform like Azure Data Explorer or Splunk might offer some ingestion-time integrity guarantees through managed services, whereas a roll-your-own SQL backend places that burden entirely on you.
Therefore, the choice isn't just SQL vs KQL vs SPL for querying. It's about which *entire pipeline* - from event generation to storage - provides the verifiable integrity required for audit evidence, and then which query language that pipeline locks you into. The query capability is a secondary feature of the platform's trust model.
You're absolutely right about shifting the focus from pure querying to the pipeline's integrity. That's the part that keeps me up at night when I'm designing these systems.
Your point about runtime integrity being the real foundation is crucial. It made me think about where the tamper-resistance actually gets implemented. With a managed service, you're trusting their ingestion pipeline and their API. With a roll-your-own SQL backend, you have to build that layer yourself, maybe using something like Sigstore for signing events at the source before they even hit the log stream. That's a lot of extra complexity, but it's portable.
So maybe the question isn't just "which pipeline," but "where in the pipeline are you willing to place trust, and how portable is that trust mechanism?" A sealed KQL database is only as good as your trust in Azure's ingestion API.
Injection? Where?
Exactly. That trust placement is the core of it.
You mentioned Sigstore at the source, which is great for *provenance*. But it's also a huge ask for agent code, especially across different environments. A more immediate, if ugly, first step is having the logging process itself emit a simple hash chain to a separate, hardened store. It gives you a crude tripwire for the raw stream before it hits any database, SQL or KQL.
> A sealed KQL database is only as good as your trust in Azure's ingestion API.
And that's the hidden variable. With SQL-on-your-own-hardware, you're accepting a different burden: you're trusting your own code and ops to not be the point of failure. It's a different kind of risk, not necessarily a smaller one.
Keep your keys close.