Thoughts on using OpenTelemetry to trace and alert on suspicious MCP call chains?

5 Posts
5 Users
0 Reactions
3 Views
(@oliver_vendor)
Eminent Member
Joined: 1 week ago
Posts: 26
Topic starter
  [#671]

Having spent the last week knee-deep in the entrails of our pilot MCP server implementations, a familiar, chilling pattern is emerging. We're building a system where arbitrary, often AI-generated, code can invoke tools with significant reach into our data and operations, and our primary observability strategy seems to be... hoping the logs are grep-able later.

The promise of MCP is granular tool use. The peril is opaque, cascading execution chains. An AI agent, via a seemingly benign prompt, can sequence `query_database` -> `analyze_results_with_python` -> `post_to_slack_channel` in a single breath. Individually, each call is permitted. In aggregate, it's a data exfiltration pipeline.

Which brings me to a half-baked, possibly heretical, idea: What if we weaponize OpenTelemetry against the agents themselves?

I'm not talking about basic metrics. I'm proposing we instrument every MCP server and client to emit rich OTel spans for every `callTool` and `listTools` operation, treating the entire tool-call chain as a distributed trace. The context would be golden:
* `user_id` & `session_id` from the client
* `tool_name` and full `parameters` (sanitized, of course)
* The **causal chain**: Was this tool invoked as a direct user request, or as a nested sub-call from another tool's execution?
* Latency and success/failure status.
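To make the causal-chain idea concrete, here is a minimal sketch using only the standard library. `ToolSpan`, `call_tool`, `EXPORTED`, and the attribute names are hypothetical stand-ins, not the OTel SDK or any official semantic convention; a real deployment would create spans through an OTel tracer and let its context propagation carry the parent link.

```python
import contextvars
import time
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolSpan:
    """Hypothetical stand-in for an exported OTel span."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    attributes: dict

# Tracks the span that is currently executing, so nested calls find their parent.
_current = contextvars.ContextVar("current_span", default=None)
EXPORTED = []  # stands in for an exporter / collector

def call_tool(tool_name, params, handler, *, user_id, session_id):
    parent = _current.get()
    span = ToolSpan(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_span_id=parent.span_id if parent else None,
        name=f"tools/call {tool_name}",
        attributes={
            "user.id": user_id,
            "session.id": session_id,
            "tool.name": tool_name,
            "tool.param_keys": sorted(params),  # keys only; values are not recorded here
            "tool.invocation": "nested" if parent else "direct",
        },
    )
    token = _current.set(span)
    started = time.monotonic()
    try:
        result = handler(**params)
        span.attributes["status"] = "ok"
        return result
    except Exception:
        span.attributes["status"] = "error"
        raise
    finally:
        span.attributes["duration_s"] = time.monotonic() - started
        _current.reset(token)
        EXPORTED.append(span)

# Demo: query_database triggers a nested analysis call.
def analyze(rows):
    return len(rows)

def query_and_analyze(table):
    rows = [1, 2, 3]  # placeholder result set
    return call_tool("analyze_results_with_python", {"rows": rows}, analyze,
                     user_id="u1", session_id="s1")

call_tool("query_database", {"table": "sales"}, query_and_analyze,
          user_id="u1", session_id="s1")
```

The nested span finishes first, so it is exported first; both spans share one `trace_id`, and the inner one points at the outer via `parent_span_id`.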

The security value isn't in the tracing alone; it's in defining and alerting on suspicious patterns. We could configure detectors in our observability backend (e.g., Honeycomb, Tempo with Grafana SLOs) to look for sequences that violate our inferred workflow policies. For example:

* **Horizontal Movement Alert**: A single session accessing tools from >3 distinct, sensitive resource servers (Database, CRM, Cloud API) within a 60-second window.
* **Data Massage Detection**: A chain of `read_entitlements` -> `query_customer_pii` -> `code_interpreter` -> `create_shareable_link`. Individually fine. As a linked trace, an immediate severity-1.
* **Denial-of-Wallet**: Rapid-fire, looping calls to a tool with a per-call cloud API cost.
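As a rough sketch, the three detectors above could look like this when run over the spans of a single session. The span shape (dicts with `ts`, `server`, `tool`), the `SENSITIVE_SERVERS` labels, and the thresholds are all assumptions for illustration; a real backend would express these as queries or stream rules.

```python
from collections import deque

SENSITIVE_SERVERS = {"database", "crm", "cloud_api", "hr"}  # assumed labels

def horizontal_movement(spans, max_servers=3, window_s=60.0):
    """True if more than max_servers distinct sensitive servers are hit within window_s."""
    window = deque()
    for span in sorted(spans, key=lambda s: s["ts"]):
        if span["server"] not in SENSITIVE_SERVERS:
            continue
        window.append(span)
        # Drop spans that fell out of the sliding window.
        while span["ts"] - window[0]["ts"] > window_s:
            window.popleft()
        if len({s["server"] for s in window}) > max_servers:
            return True
    return False

def contains_chain(spans, chain):
    """True if the tool names in chain occur in order (not necessarily adjacent)."""
    names = iter(s["tool"] for s in sorted(spans, key=lambda s: s["ts"]))
    # `step in names` consumes the iterator, which enforces ordering.
    return all(step in names for step in chain)

def denial_of_wallet(spans, tool, max_calls=20, window_s=10.0):
    """True if `tool` is called more than max_calls times within window_s."""
    times = sorted(s["ts"] for s in spans if s["tool"] == tool)
    return any(times[i + max_calls] - times[i] <= window_s
               for i in range(len(times) - max_calls))

session = [
    {"ts": 0, "server": "database", "tool": "read_entitlements"},
    {"ts": 5, "server": "crm", "tool": "query_customer_pii"},
    {"ts": 12, "server": "sandbox", "tool": "code_interpreter"},
    {"ts": 20, "server": "cloud_api", "tool": "create_shareable_link"},
    {"ts": 30, "server": "hr", "tool": "lookup_employee"},
]
EXFIL_CHAIN = ["read_entitlements", "query_customer_pii",
               "code_interpreter", "create_shareable_link"]
```

Matching the chain as an ordered subsequence rather than an exact run means an agent can't dodge the rule by slipping a harmless call in between.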

The objections are obvious, and I've already thought of them:
* **Performance Overhead**: OTel is lean. The security tax of not understanding these chains is far higher.
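* **Tool Spam to Obfuscate**: An agent could flood with noise. Solution: sampling doesn't have to be uniform. Head-based sampling decides before a trace's content is known, so use tail-based sampling to keep complete traces that match a suspicious pattern, and mark a flagged session for 100% sampling from then on.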
* **It's a Reactive Control**: True. But it's a vastly faster reaction than sifting through JSON logs after the fact. We could even build lightweight, real-time policy engines that consume the OTel stream.
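For the sampling point, a minimal sketch of a per-session override might look like the following. `FLAGGED_SESSIONS`, `should_sample`, and the 5% base ratio are hypothetical; how a session gets flagged in the first place (a detector, an analyst, a tail-sampling rule) is outside this sketch.

```python
import hashlib

FLAGGED_SESSIONS = set()  # hypothetical; populated by whatever marks a session suspicious

def should_sample(trace_id, session_id, base_ratio=0.05):
    """Keep every trace from a flagged session; otherwise sample by trace-ID hash."""
    if session_id in FLAGGED_SESSIONS:
        return True
    # Map the trace ID to a stable value in [0, 1) so every service that sees
    # this trace reaches the same decision.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest()[:8], 16) / 0x100000000
    return bucket < base_ratio

FLAGGED_SESSIONS.add("sess-42")
```

Hashing the trace ID instead of rolling a random number keeps the decision consistent across services, so a trace is either kept whole or dropped whole.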

We're essentially building a new RPC layer for semi-autonomous actors. The old paradigms of perimeter logging are insufficient. We need to visualize and police the *graph of execution*.

So, is this completely mad? Has anyone tried instrumenting MCP clients/servers with OTel yet? Or are we all just waiting for the first major incident to force our hand?


Where's the paper?


   
(@threat_wizard_oli)
Eminent Member
Joined: 1 week ago
Posts: 12
 

This is precisely the observation that underpins modern runtime agent security. You've identified the critical gap: the threat isn't in the single node, it's in the permissible path. Treating the tool-call chain as a distributed trace is not heretical, it's foundational.

The OpenTelemetry approach gives you a formal structure to model these paths. You can encode your policy as a state machine within the trace analysis. For instance, a span attribute `tool.category="data_access"` followed by another span with `tool.category="external_communication"` within the same session trace triggers an alert. The challenge isn't the tracing, it's defining the stateful policy that distinguishes a "suspicious chain" from a legitimate one.
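A minimal sketch of that state machine, assuming the analysis sees a session's `tool.category` values in time order. The `State` names and the `TRANSITIONS` table are made up for illustration; a real policy would have more states and would likely need resets or timeouts.

```python
from enum import Enum, auto

class State(Enum):
    CLEAN = auto()
    DATA_TOUCHED = auto()
    ALERT = auto()

# (current state, observed tool.category) -> next state.
# Anything not listed leaves the state unchanged.
TRANSITIONS = {
    (State.CLEAN, "data_access"): State.DATA_TOUCHED,
    (State.DATA_TOUCHED, "external_communication"): State.ALERT,
}

def evaluate(categories):
    """Run one session's ordered tool categories through the policy."""
    state = State.CLEAN
    for category in categories:
        state = TRANSITIONS.get((state, category), state)
        if state is State.ALERT:
            break
    return state
```

For example, `evaluate(["data_access", "compute", "external_communication"])` ends in `State.ALERT`, while the same categories with the outbound call first only reach `State.DATA_TOUCHED`.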

Your mention of parameters is key, but sanitization strips exactly the content a policy engine needs to judge a call. You might need a separate, secured pipeline for auditing the full parameters, linked back to the trace via span ID. Otherwise your policy engine can't evaluate the content of a `query_database` call to see if it fetched PII.
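One way to sketch the split pipeline: the span carries only parameter keys and a digest, while the full parameters go to a separate store keyed by span ID. `AUDIT_STORE`, `record_call`, `fetched_pii`, the attribute names, and the naive column match are all hypothetical; the store stands in for something access-controlled.

```python
import hashlib
import json

AUDIT_STORE = {}  # hypothetical; stands in for a secured, access-controlled store

def record_call(span_id, tool_name, params):
    """Store full params out of band; return sanitized attributes for the span."""
    AUDIT_STORE[span_id] = {"tool": tool_name, "params": params}
    canonical = json.dumps(params, sort_keys=True)
    return {
        "tool.name": tool_name,
        "tool.param_keys": sorted(params),
        "tool.params.sha256": hashlib.sha256(canonical.encode()).hexdigest(),
        "audit.ref": span_id,  # link from the trace back to the audit record
    }

def fetched_pii(span_attrs, pii_columns=("ssn", "email")):
    """Policy check that runs inside the secured pipeline, with full params."""
    full = AUDIT_STORE[span_attrs["audit.ref"]]["params"]
    query = str(full.get("query", "")).lower()
    return any(col in query for col in pii_columns)

attrs = record_call("a1b2c3d4", "query_database",
                    {"query": "SELECT email FROM customers"})
```

The digest lets the trace side notice that two calls used identical parameters without ever seeing them.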


~Oli


   
(@agent_security_audit_zoe)
Active Member
Joined: 1 week ago
Posts: 14
 

You're right about the aggregate threat, but your OTel plan has the same blind spot as the logs you're trying to replace. The critical context is already missing before the first span is emitted.

>treating the entire tool-call chain as a distributed trace

This assumes the agent is the root span. It isn't. The real root is the user prompt that *caused* the agent to think `query_database` was the right first step. You're tracing the symptom, not the intent.

If you don't capture and tag the original prompt (or at least its embedding/cluster) into that trace, you'll never see that `prompt:"format last quarter's sales as a CSV"` consistently leads to the exfiltration chain ten minutes later. Your spans will show a clean, permissible `db -> python -> slack` path with no link back to the malicious instruction.
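A rough sketch of tagging the prompt onto the root span and then correlating it with the resulting chains. `prompt_fingerprint` and `chains_by_prompt` are hypothetical helpers, and a hash is only a stand-in here: it matches near-identical prompts, whereas catching paraphrases would need the embedding/cluster approach.

```python
import hashlib
import re
from collections import Counter

def prompt_fingerprint(prompt):
    """Stable ID for a prompt after trivial normalization (case, whitespace)."""
    normalized = re.sub(r"\s+", " ", prompt.strip().lower())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def chains_by_prompt(traces):
    """Count how often each (prompt fingerprint, tool chain) pair occurs.

    traces: iterable of (prompt, [tool names in call order]).
    """
    counts = Counter()
    for prompt, tools in traces:
        counts[(prompt_fingerprint(prompt), tuple(tools))] += 1
    return counts

traces = [
    ("Format last quarter's sales as a CSV", ["db", "python", "slack"]),
    ("format last quarter's   sales as a CSV ", ["db", "python", "slack"]),
    ("Summarize open tickets", ["crm"]),
]
counts = chains_by_prompt(traces)
```

Storing a fingerprint rather than the raw prompt also keeps user text out of the trace backend.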

You also need to instrument the *decision* layer, not just the execution layer. Good luck getting that from a black-box agent.


audit your config


   
(@api_sec_lin)
Eminent Member
Joined: 1 week ago
Posts: 24
 

You're missing the authentication boundary. The `user_id` and `session_id` you want to tag are meaningless if you don't cryptographically bind them to the trace at the source.

An agent can spoof those. You need a signed client assertion in the MCP handshake, and you need to propagate that verifiable identity into the span context. Otherwise your trace is just a pretty lie.
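To illustrate the binding, here is a minimal sketch that ties `user_id` and `session_id` to a specific trace with an HMAC. This is an assumption-heavy simplification: `sign_identity`, `verify_span`, and the `identity.sig` attribute are hypothetical, the MCP handshake is not shown, and a shared-secret HMAC stands in for what would more likely be an asymmetric signature.

```python
import hashlib
import hmac

def sign_identity(key, user_id, session_id, trace_id):
    """MAC over the identity fields plus the trace ID they belong to."""
    # Unit-separator join avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    msg = "\x1f".join((user_id, session_id, trace_id)).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_span(key, attrs):
    """True only if the span's identity attributes match its signature."""
    expected = sign_identity(key, attrs["user.id"], attrs["session.id"], attrs["trace_id"])
    return hmac.compare_digest(expected, attrs.get("identity.sig", ""))

key = b"demo-only-secret"  # placeholder; never hard-code a real key
attrs = {"user.id": "u1", "session.id": "s1", "trace_id": "4bf92f3577b34da6"}
attrs["identity.sig"] = sign_identity(key, "u1", "s1", attrs["trace_id"])
```

Including the trace ID in the signed message stops a valid signature from being replayed onto a different trace.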


--lin


   
(@mod_grace)
Active Member
Joined: 1 week ago
Posts: 17
 

Totally valid point on the binding. But if you're requiring a signed client assertion for every trace, you've just mandated that every MCP client, including every browser-based agent dev tool and CLI, must have a managed key to sign with. That's a huge adoption hurdle.

Maybe there's a middle ground for internal deployments? You could have the orchestrating service (the thing that spawned the agent session) inject a verifiable span attribute, and then treat any trace *without* that as untrusted/low priority.



   