Walkthrough: Instrumenting Goose with OpenTelemetry for anomaly detection. – Page 2 – Goose (Block) Security

Eli J. · 2026-06-24T06:19:50Z

Instrumenting an agent like Goose, which executes potentially untrusted third-party extensions within a local context, is one of the more direct routes to runtime security observability. While Goose's own logging is functional, integrating OpenTelemetry allows us to transform opaque execution into structured, queryable telemetry. This is particularly valuable for establishing behavioral baselines and detecting anomalies in extension activity, such as unexpected filesystem access patterns or anomalous network call volumes.

The core of the instrumentation involves wrapping Goose's extension execution engine. Since Goose extensions are written in JavaScript/TypeScript and executed via `isolated-vm` or similar, we can inject OpenTelemetry SDK calls at the host level, tracing the lifecycle of extension invocations. The goal is to capture:

- Extension load and initialization spans.
- Span trees for each executed block (e.g., `SqlQueryBlock`, `HttpRequestBlock`).
- Key attributes within those spans: target hosts for HTTP calls, table names for SQL queries, file paths accessed.
- Metrics such as execution duration, error counts, and rate of specific operation types.

A minimal implementation would start by adding the OpenTelemetry Node.js SDK to the Goose host application. The following configuration sets up a console exporter and a basic trace provider, focusing on the extension runner module.

```javascript
// instrumentation.js
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor, ConsoleSpanExporter } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');

const provider = new NodeTracerProvider();
// Note: addSpanProcessor() is the SDK 1.x API. SDK 2.x takes processors through
// the `spanProcessors` constructor option instead.
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();

registerInstrumentations({
  instrumentations: [new HttpInstrumentation()],
});

// Now, within the extension execution wrapper:
const otel = require('@opentelemetry/api');
const tracer = otel.trace.getTracer('goose-extension-runner');

async function executeExtension(extensionId, block, input) {
  return tracer.startActiveSpan(`extension.${extensionId}.${block.type}`, async (span) => {
    span.setAttribute('extension.id', extensionId);
    span.setAttribute('block.type', block.type);
    // Add block-specific attributes here
    if (block.config?.url) {
      span.setAttribute('http.url', block.config.url);
    }
    try {
      const result = await executeBlock(block, input); // Original execution call
      span.setStatus({ code: otel.SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({ code: otel.SpanStatusCode.ERROR, message: error.message });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}
```

From a security perspective, this telemetry data becomes the foundation for an anomaly detection system. By piping spans to a collector (e.g., OpenTelemetry Collector) and then to a trace backend such as Jaeger or Tempo (with Prometheus/Loki alongside for derived metrics and logs) or a security information and event management (SIEM) system, we can define and alert on deviations. For instance:

- **Baseline Violations:** An extension that normally performs 2-3 SQL `SELECT` queries per execution suddenly issues 50+.
- **Data Exfiltration Patterns:** HTTP calls to previously unseen external domains, especially following a file read operation.
- **Resource Abuse:** Unusually long execution spans, indicating potential CPU-bound loops or blocking operations.

Crucially, because Goose operates locally, this telemetry must be collected and analyzed on the host. This aligns with the "agent-isolation" paradigm, where the agent's own behavior is monitored as a first-class security object. The open-source nature of Goose allows for this deep integration, but it also means the instrumentation itself becomes part of the supply chain. Any instrumentation library must be rigorously pinned and audited, as a compromised OpenTelemetry SDK dependency could lead to telemetry falsification or data leakage.

Ultimately, this transforms Goose from a somewhat opaque execution engine into a fully observable system, where extension behavior is continuously audited against learned or configured security profiles.

~Eli
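
The "Baseline Violations" case above can be sketched as a simple count comparison. This is a minimal illustration, not part of Goose or OpenTelemetry: the span shape (`extensionId`, `blockType`), the `baseline` object, and the `tolerance` factor are all assumptions made for the example.

```javascript
// Hypothetical span shape: { extensionId, blockType }. Spans exported by the
// wrapper above would carry these as the 'extension.id' and 'block.type' attributes.
function countOperations(spans) {
  const counts = new Map();
  for (const span of spans) {
    const key = `${span.extensionId}:${span.blockType}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// `baseline` maps "extensionId:blockType" to the highest count seen in normal runs.
// Operations missing from the baseline are treated as never seen (max 0).
function findBaselineViolations(spans, baseline, tolerance = 2) {
  const violations = [];
  for (const [key, observed] of countOperations(spans)) {
    const expectedMax = baseline[key] ?? 0;
    if (observed > expectedMax * tolerance) {
      violations.push({ key, observed, expectedMax });
    }
  }
  return violations;
}

// Usage: an extension that normally issues up to 3 SQL queries suddenly issues 50.
const baseline = { 'report-gen:SqlQueryBlock': 3 };
const spans = Array.from({ length: 50 }, () => ({
  extensionId: 'report-gen',
  blockType: 'SqlQueryBlock',
}));
console.log(findBaselineViolations(spans, baseline));
```

A fixed multiplier is the crudest possible threshold. A learned profile would likely use per-extension distributions instead, but the comparison step stays the same.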

David Stone

(@ciso_observer)

June 25, 2026 9:16 am

That regex approach is a stopgap, not a governance solution. It's reactive, and you'll always miss something.

The real issue is that you've moved sensitive data into your observability pipeline, which is a policy violation waiting for an audit. Filtering after the fact doesn't change that the data was collected.

You need to define which attributes count as PII at the instrumentation point, before the span is created. The host wrapper should hash or redact anything outside a configurable allow-list before the attribute is ever set. If your wrapper doesn't have that control, your instrumentation design is flawed for security use.
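
A minimal sketch of that control, assuming a per-key policy of `allow`, `hash`, or drop-by-default. The policy table and function names are hypothetical; the only thing assumed about the span is that it exposes `setAttribute(key, value)`, as an OTel span does.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical policy: attribute keys mapped to how the wrapper may record them.
// Anything not listed is dropped before it reaches the span.
const ATTRIBUTE_POLICY = {
  'extension.id': 'allow',
  'block.type': 'allow',
  'http.url': 'hash',
};

function sha256Hex(value) {
  return createHash('sha256').update(String(value)).digest('hex');
}

// Applies the policy and returns only the attributes that are safe to set.
function sanitizeAttributes(attributes, policy = ATTRIBUTE_POLICY) {
  const safe = {};
  for (const [key, value] of Object.entries(attributes)) {
    const rule = policy[key];
    if (rule === 'allow') safe[key] = value;
    else if (rule === 'hash') safe[key] = sha256Hex(value);
    // No matching rule: the attribute is dropped.
  }
  return safe;
}

// `span` is anything with setAttribute(key, value), such as an OTel Span.
function setSafeAttributes(span, attributes, policy = ATTRIBUTE_POLICY) {
  for (const [key, value] of Object.entries(sanitizeAttributes(attributes, policy))) {
    span.setAttribute(key, value);
  }
}
```

An unsalted hash of a low-entropy value can still be guessed by brute force, so a keyed HMAC may be the safer choice for anything that really is PII.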

DS

Rae Chen

(@kernel_auditor_rae)

June 25, 2026 1:42 pm

Absolutely, the manual context injection you described is the cost of strong isolation. The alternative - letting the sandbox code directly call the OTel SDK - breaks the security model by giving untrusted code a channel to your internal systems.

Your point about the isolation runtime becoming part of the tracing infrastructure isn't wrong, but I think that's inevitable. The correct view is to treat the context-passing mechanism as a defined, minimal API surface of the sandbox, like a syscall ABI. You audit and secure that one pathway.

The real failure mode I've seen isn't mis-parented spans, but timing side-channels. If the context carrier is large or serialization is expensive, a malicious extension can infer host activity by measuring the latency of context injection (`propagation.inject()` in OTel JS terms) on its side of the boundary. We had to move to a fixed-size, pre-allocated context buffer to avoid that.
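
One way a fixed-size carrier could look, as a sketch only. It relies on the fact that a version-00 W3C `traceparent` value is always 55 ASCII characters; the buffer handling and function names are assumptions, not our production code.

```javascript
// W3C traceparent (version 00) is always 55 ASCII characters:
// "00-" + 32 hex trace-id + "-" + 16 hex parent-id + "-" + 2 hex flags.
const TRACEPARENT_LENGTH = 55;
const TRACEPARENT_PATTERN = /^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$/;

// Allocated once and reused, so no per-call allocation depends on the input.
const contextBuffer = Buffer.alloc(TRACEPARENT_LENGTH);

// Writes the context into the pre-allocated buffer. Variable-length fields
// (tracestate, baggage) are deliberately not carried across the boundary.
function writeContext(traceparent, target = contextBuffer) {
  if (!TRACEPARENT_PATTERN.test(traceparent)) {
    throw new Error('traceparent must be a 55-character version-00 value');
  }
  target.write(traceparent, 0, TRACEPARENT_LENGTH, 'ascii');
  return target;
}

function readContext(source = contextBuffer) {
  return source.toString('ascii', 0, TRACEPARENT_LENGTH);
}
```

A constant payload size removes the size-dependent part of the latency. It does not by itself make the whole hand-off constant-time.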

Audit everything, trust no syscall.

Clara Risk

(@compliance_clara)

June 25, 2026 1:54 pm

You're right about the need for an immutable low-level source for correlation. But you're describing a detection mechanism, not a prevention one. That eBPF trace showing an openat for `/etc/shadow` means the sandbox policy has already failed to contain the extension.

For a truly hardened model, the OTel baseline shouldn't just correlate, it should *feed* the enforcement layer. If your baseline establishes that a legitimate plugin only ever opens files under `/app/data`, your path-based policy (AppArmor or Landlock, since a seccomp filter cannot inspect path arguments) can be dynamically tightened to allow only those paths, making the escape you describe impossible, not just detectable. The anomaly becomes a policy violation that is blocked, not just logged.

I've seen this done by using the OTel-derived baseline to generate seccomp profiles or AppArmor rules as part of a continuous compliance pipeline, turning observability into active control.
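
As a rough sketch of the generation step, assuming the baseline is just a list of file paths pulled from span attributes. The function names are hypothetical, and the output is AppArmor file-rule lines meant to be reviewed before they go anywhere near a loaded profile.

```javascript
const path = require('node:path');

// Collapses observed file paths to their parent directories.
function baselineDirectories(observedPaths) {
  const dirs = new Set();
  for (const p of observedPaths) {
    dirs.add(path.posix.dirname(path.posix.normalize(p)));
  }
  return [...dirs].sort();
}

// Emits read-only AppArmor file rules for each baseline directory.
// Relative paths and the root directory are skipped: a rule for "/" would
// allow everything, which defeats the purpose.
function appArmorRules(observedPaths) {
  return baselineDirectories(observedPaths)
    .filter((dir) => dir.startsWith('/') && dir !== '/')
    .map((dir) => `  ${dir}/** r,`);
}

// Usage with paths as they might appear in span attributes.
console.log(appArmorRules(['/app/data/a.csv', '/app/data/b.csv', '/app/data/cache/x.bin']));
```

The nested `/app/data/cache/**` rule is redundant next to `/app/data/**`. A real generator would probably prune such overlaps and distinguish read from write access.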

Control #42 requires evidence

Ava Carter

(@agent_network_architect)

June 25, 2026 3:03 pm

Your concern about context propagation is valid, but the linkage can be maintained from the host. The host wrapper must generate a unique trace context for each *session* (the initial block execution) and pass an immutable, serialized version of it into the sandbox as a required parameter for any subordinate call.

The sandbox runtime, which you do trust, is then responsible for ensuring this token is passed along and returned with any result. The host receives the token back and can create child spans that explicitly link to the parent span created for the initial block. The untrusted plugin code only ever handles an opaque string; it has no API to create or modify spans itself.

So you get the full "story" because the host reconstructs it, using the returned tokens to understand the causal chain: initial HTTP call -> retry -> DB write. The sandbox's internal runtime is the orchestration layer for the context, not the instrumentation layer.
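
A minimal sketch of such a token, assuming the host signs it with an HMAC so that a returned token can be checked before it is used for parenting. The key handling and function names are hypothetical.

```javascript
const { createHmac, randomBytes, timingSafeEqual } = require('node:crypto');

// Host-only secret. Hypothetical: generated per host process in this sketch.
const TOKEN_KEY = randomBytes(32);

function sign(payload) {
  return createHmac('sha256', TOKEN_KEY).update(payload).digest('base64url');
}

// Serializes the parent span's identifiers into a token the sandbox only carries.
function issueToken({ traceId, spanId }) {
  const payload = Buffer.from(`${traceId}:${spanId}`).toString('base64url');
  return `${payload}.${sign(payload)}`;
}

// Returns the parent identifiers, or null if the token is malformed or was altered.
function verifyToken(token) {
  const [payload, signature] = String(token).split('.');
  if (!payload || !signature) return null;
  const expected = Buffer.from(sign(payload));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const [traceId, spanId] = Buffer.from(payload, 'base64url').toString().split(':');
  return { traceId, spanId };
}
```

On the host, the verified IDs can then be turned into a parent context before the child span is started; in OTel JS, `trace.setSpanContext()` from `@opentelemetry/api` is one way to do that.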

segment first

Ella Morozov

(@agent_tinker_ella)

June 25, 2026 7:15 pm

Exactly, that opaque token approach is what we landed on when we hooked up IronClaw. The critical piece we found is that the sandbox runtime itself must treat the trace context as a *system property*, not user data.

If you just pass it as a regular string parameter, a malicious plugin can drop it, corrupt it, or flood you with fake tokens. In our impl, the runtime attaches it to the internal call object at the VM level, before the plugin code ever runs, and strips it out on the return path. The plugin literally can't see or touch it, it's just part of the frame. That way the linkage is guaranteed, not just hopeful.
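
Roughly the shape of it, as a same-process sketch rather than our actual VM-level code. The `WeakMap` registry and `invokePlugin` are hypothetical names; in a real `isolated-vm` setup the registry would live on the host side of the isolate boundary.

```javascript
// Hypothetical runtime-side registry: call objects map to their trace token.
// The token never sits on the call object itself, so plugin code that receives
// the call cannot enumerate, read, or overwrite it.
const callContext = new WeakMap();

// Invoked by the trusted runtime. `plugin` is the untrusted function.
async function invokePlugin(plugin, input, traceToken) {
  const call = Object.freeze({ input });
  callContext.set(call, traceToken);
  try {
    const result = await plugin(call);
    // The runtime re-attaches the token on the return path, never the plugin.
    return { result, traceToken: callContext.get(call) };
  } finally {
    callContext.delete(call);
  }
}
```

Because the plugin only ever sees `{ input }`, dropping or forging the token is not something it has an API for.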

The reconstruction phase on the host side gets a bit gnarly, though, if you have deeply nested or parallel internal calls. How do you handle ordering when you get multiple tokens back for a single logical operation?

~Ella

Samir Mehta

(@devops_hardener_sam)

June 25, 2026 9:01 pm

Great point about treating it as a system property. That's the only way to guarantee integrity.

>How do you handle ordering when you get multiple tokens back

You need a causal sequence ID from the host, embedded in the initial token. When the host spawns parallel internal calls, it increments a local counter for each one. That counter gets bundled into the context you pass in. On reconstruction, you sort by the sequence ID to re-establish the order of events the host intended.

We bake it into the token's payload, something like `base64(span_id + ":" + seq_num)`. The sandbox runtime just carries the whole string.
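
A sketch of that encoding and the host-side sort. The function names are made up for the example, and it assumes the span ID itself contains no `:` (hex span IDs do not).

```javascript
// Packs a span ID and a host-assigned sequence number as base64("spanId:seq").
function encodeToken(spanId, seqNum) {
  return Buffer.from(`${spanId}:${seqNum}`).toString('base64');
}

function decodeToken(token) {
  const [spanId, seq] = Buffer.from(token, 'base64').toString().split(':');
  const seqNum = Number.parseInt(seq, 10);
  if (!spanId || Number.isNaN(seqNum)) throw new Error('malformed token');
  return { spanId, seqNum };
}

// Host-side counter, incremented once per spawned internal call.
function createSequencer(spanId) {
  let next = 0;
  return () => encodeToken(spanId, next++);
}

// Restores the order in which the host issued the tokens.
function orderReturnedTokens(tokens) {
  return tokens.map(decodeToken).sort((a, b) => a.seqNum - b.seqNum);
}
```

Usage is just `const nextToken = createSequencer(spanId)` when the parent span starts, then one `nextToken()` per internal call.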

trivy image --severity HIGH,CRITICAL

Grace Mod

(@mod_grace)

June 25, 2026 10:42 pm

That sequence ID approach is smart for ordering, but it introduces a subtle coupling point. If the host crashes and restarts mid-session, that local counter resets. You could get duplicate sequence IDs for entirely different logical operations, which scrambles your reconstruction.

We pair the sequence ID with a host instance UUID, also baked into that token payload. It's a few more bytes, but it prevents that collision scenario on host failures.
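
In sketch form, assuming the UUID is generated once at host start and prepended to the payload. The names and the exact payload layout are illustrative, not a fixed format.

```javascript
const { randomUUID } = require('node:crypto');

// Generated once per host process start, so a restart yields a new value.
const HOST_INSTANCE_ID = randomUUID();

function encodeHostToken(spanId, seqNum, hostId = HOST_INSTANCE_ID) {
  return Buffer.from(`${hostId}:${spanId}:${seqNum}`).toString('base64');
}

function decodeHostToken(token) {
  const [hostId, spanId, seq] = Buffer.from(token, 'base64').toString().split(':');
  return { hostId, spanId, seqNum: Number.parseInt(seq, 10) };
}

// Groups returned tokens by host instance, then orders each group by sequence
// number, so equal sequence numbers from different instances never interleave.
function groupByHostInstance(tokens) {
  const groups = new Map();
  for (const decoded of tokens.map(decodeHostToken)) {
    if (!groups.has(decoded.hostId)) groups.set(decoded.hostId, []);
    groups.get(decoded.hostId).push(decoded);
  }
  for (const group of groups.values()) group.sort((a, b) => a.seqNum - b.seqNum);
  return groups;
}
```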

Ed Morrison

(@compliance_observer_ed)

June 26, 2026 3:34 pm

That host UUID idea is good for preventing collisions after a restart. But doesn't that push the problem upstream? Now you're trusting the UUID generator's uniqueness and persistence across a potential crash too.

If the host crashes and comes up with a new UUID, the old session's tokens become orphans. Your trace is still broken, just in a different way. Is the goal just to prevent scrambling, even if it means a clean break?

Nina Osei

(@supply_chain_scout)

June 27, 2026 10:34 am

You've outlined the core telemetry goals well, but there's a critical prerequisite you haven't addressed: the software bill of materials for the instrumentation layer itself.

Before you inject OpenTelemetry SDK calls into the host, you must pin the exact versions of every dependency involved - the OTel SDK, the collector exporter, and any instrumentation libraries. The host wrapper becomes part of your trusted computing base for observation. If that stack is compromised via a transitive dependency, your anomaly detection is blind or, worse, fed poisoned data.

Specifically, what are the pinned versions of `@opentelemetry/sdk-trace-base` and `@opentelemetry/exporter-trace-otlp-http` you're using? Have you validated the artifact integrity against the Sigstore transparency log for those packages? Without that, you're building a security control on an unverified foundation.
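
A small check along those lines, as a sketch: it flags any `@opentelemetry/*` dependency in a parsed `package.json` whose specifier is a range rather than an exact version. The function name is hypothetical and the versions in the usage example are placeholders, not recommendations.

```javascript
// An exact pin is a plain semver version with no range operator, tag, or wildcard.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// Returns the OpenTelemetry dependencies whose version specifier is a range
// (^, ~, >=, *, "latest", ...) rather than an exact pin.
function findUnpinnedOtelDeps(pkg) {
  const all = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.entries(all)
    .filter(([name]) => name.startsWith('@opentelemetry/'))
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name, spec]) => ({ name, spec }));
}

// Usage with placeholder versions.
const examplePkg = {
  dependencies: {
    '@opentelemetry/sdk-trace-base': '^0.0.0',
    '@opentelemetry/exporter-trace-otlp-http': '0.0.0',
  },
};
console.log(findUnpinnedOtelDeps(examplePkg));
```

Exact specifiers in `package.json` do not pin transitive dependencies; that takes a committed lockfile installed with `npm ci`. For registry signatures and provenance attestations, `npm audit signatures` is one available check.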

sbom verify --attestation

J. Reeves

(@vuln_hunter_jay)

June 27, 2026 11:34 am

Yep, the context passing part seems super messy. I've only done basic spans inside a single app before, so seeing how you do it across an isolation boundary is really helpful.

When you had to do the manual work, did you have to write a bunch of custom code to pack/unpack the context, or was there something in the agent framework you could hook into? Just trying to picture the actual lines of code I'd need to write.

Also, does adding this tracing layer noticeably slow down the plugin execution? That's a concern I'd have for a production system.

Asia Kwon

(@mod_tech_asia)

June 27, 2026 7:01 pm

The manual context work is the messy part, yes. You're building a small bridge between the host and sandbox runtimes. There isn't a pre-built agent hook for this isolation pattern; you write the code to serialize and attach the token yourself. It's often just a few dozen lines to pack/unpack the string and have the runtime handle it as a system property.

>does adding this tracing layer noticeably slow down the plugin execution?

It depends on your sampling rate. For full tracing on every execution, there's overhead from the host-side span creation and network export. For anomaly detection, you can sample at a lower rate (like 1-2%) or use on-demand tracing triggered by other signals, which keeps performance impact minimal in production. The bigger cost is the engineering time to get the context propagation right.
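
To make the sampling idea concrete, here is a stdlib-only sketch of a ratio decision keyed on the trace ID, plus an on-demand override. It illustrates the idea only and is not the exact algorithm of the SDK's own `TraceIdRatioBasedSampler` (shipped in `@opentelemetry/sdk-trace-base`), which is what you would normally configure. The `flagged` set is a hypothetical stand-in for "other signals".

```javascript
// Deterministic ratio sampling: the decision depends only on the trace ID, so
// every span of a trace gets the same answer. The first 8 hex characters of the
// trace ID are read as a 32-bit number and compared to ratio * 2^32.
function shouldSample(traceId, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  const prefix = Number.parseInt(traceId.slice(0, 8), 16);
  return prefix < Math.floor(ratio * 0x100000000);
}

// On-demand override: extensions in the hypothetical `flagged` set (for example,
// ones named in a recent alert) are always traced, regardless of the ratio.
function decide(traceId, extensionId, { ratio = 0.02, flagged = new Set() } = {}) {
  return flagged.has(extensionId) || shouldSample(traceId, ratio);
}
```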

- Asia (mod)
