Just found a weird edge case where the operator can be made to loop indefinitely. – Page 2 – OpenAI Operator Security

Yuki Nakamura · 2026-06-23T06:48:42Z

Hey folks,

Ran into a strange one while stress-testing a workflow. Under a very specific configuration, the OpenAI Operator can get stuck in an infinite loop, burning through API credits and never completing. It's not a classic crash, but a logic loop it can't escape.

Here's the gist. If you have a tool that can modify its own system prompt (or a tool that calls another agent which can), and you combine that with a rule that triggers on *every* assistant message, you can create a feedback cycle. The operator acts, the rule triggers and modifies the prompt, the operator re-evaluates and acts again, and the rule triggers again... you see the problem.

**My setup looked like this:**

```yaml
agent:
  name: tester
  tools:
    - type: web_search
  rules:
    - trigger:
        on_message: assistant
      action:
        type: modify_prompt
        content: "ADDED CONTEXT: {{current_time}}"
```

The `web_search` tool was the indirect culprit because its results can contain timestamps. The `modify_prompt` action on *every* assistant message kept changing the context, which the operator treated as a new input to process, leading to another assistant message... and so on.

**Troubleshooting tips if you suspect a loop:**

- Check for rules with `on_message: assistant` that perform any write action (modify, append, post).
- Be extremely cautious with tools that can recursively call other agents or modify their own operational parameters.
- Enable detailed logging and look for repeated, nearly-identical tool call sequences.

The core issue seems to be a lack of a "state change guard" or a cycle detection mechanism. For now, I've worked around it by changing the rule trigger to `on_message: user` and being very careful with self-referential tool permissions.

Has anyone else encountered similar "runaway" behavior? Curious if there are other config patterns that can trigger this.

Yuki
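
For anyone wondering what a "state change guard" could look like in practice, here is a minimal Python sketch. It is not an Operator feature or API: `CycleGuard`, `fingerprint`, and the timestamp pattern are all hypothetical names made up for illustration. The idea is to hash the prompt plus the last assistant message with timestamps masked out, and stop the run once the same normalized state has been seen too many times.

```python
import hashlib
import re

# Assumption: volatile values look like ISO-8601 timestamps. A real setup
# would need to mask whatever placeholders its own rules inject.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")


def fingerprint(prompt, last_message):
    """Hash the state with timestamps masked, so clock ticks do not count as change."""
    normalized = TIMESTAMP.sub("<ts>", prompt + "\n" + last_message)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class CycleGuard:
    """Refuses to continue once the same normalized state repeats too often."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.seen = {}  # fingerprint -> number of times observed

    def allow(self, prompt, last_message):
        key = fingerprint(prompt, last_message)
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.max_repeats
```

Calling `guard.allow(prompt, message)` before each re-evaluation and aborting when it returns `False` would cap the credit burn, though masking timestamps also means a change that really is only a timestamp gets treated as "no change".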

Mia F.

(@vulnerability_collector_mia)

Active Member

Joined: 1 week ago

Posts: 14

June 24, 2026 1:27 pm

The VLAN model is elegant, but that one-way feed is harder to guarantee than it sounds. Even with segmentation, if the rule engine can write to any form of persistent storage the agent later reads from (a database table, a shared volume), you've just created a loop with storage latency instead of network latency.

Your point about blocking legitimate changes is the real weakness of the timestamp check. It treats the placeholder as the state, not the data it represents. Provenance tagging, as user58 mentioned, tackles that by tracking *why* the state changed, not just the delta in a single field.

CVE collector

Quinn Harris

(@q_risk)

Active Member

Joined: 1 week ago

Posts: 11

June 24, 2026 8:06 pm

You've correctly identified the key risk: a rule acting on *every* assistant message is dangerous when paired with tools that cause non-deterministic state changes. The infinite API burn is the immediate business impact, but I see a deeper compliance problem.

Even if you catch it quickly, this loop generates a massive, anomalous log. In a regulated environment, that log itself is a reportable incident. You'd have to explain why your agent generated ten thousand identical operations in an hour, which triggers a whole separate audit trail.

A cooldown timer might stop the credit bleed, but the compliance team will still want to know why the system designed to prevent misuse was the thing that generated it.

risk is not a number

Tomás G.

(@newbie_with_agent)

Active Member

Joined: 1 week ago

Posts: 12

June 24, 2026 10:57 pm

Yeah, the compliance angle is a good point. Makes me think, even if you add provenance tags and a one-way feed, the logs from the *attempted* loops could still be a mess. Would a normal logging level even capture the failed rule firings before they trigger an action, or is that extra noise you'd have to filter?

Alex Chen

(@newb_survivor)

Eminent Member

Joined: 1 week ago

Posts: 17

June 24, 2026 10:57 pm

That's a really good question about the logs. If you're logging at the rule evaluation level, even a blocked firing could generate an entry. A few thousand "rule X triggered but skipped due to provenance tag" lines every second would definitely drown out the actual operational logs.

Could a cooldown on the logging itself help? Like, after the first few identical blocked attempts, suppress further log entries for that specific rule for a minute? Or is hiding attempted loops itself a problem for compliance?
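
One way to picture the "cooldown on the logging" idea without hiding the attempts entirely is to suppress repeats but still emit a count. The sketch below is plain Python with made-up names (`DedupLogger`, `burst`, `window`), not any particular logging library's API. Whether a summary line is enough for a given compliance regime is a separate question this does not answer.

```python
import time


class DedupLogger:
    """Lets the first `burst` identical entries through, then counts the rest."""

    def __init__(self, sink, burst=3, window=60.0, clock=time.monotonic):
        self.sink = sink      # any callable that accepts one string
        self.burst = burst    # identical entries allowed per window
        self.window = window  # window length in seconds
        self.clock = clock    # injectable so the behavior can be tested
        self.state = {}       # (rule_id, message) -> [window_start, count]

    def log(self, rule_id, message):
        now = self.clock()
        key = (rule_id, message)
        entry = self.state.get(key)
        if entry is not None and now - entry[0] >= self.window:
            # Window expired: report how many entries were swallowed.
            suppressed = entry[1] - self.burst
            if suppressed > 0:
                self.sink(f"{rule_id}: suppressed {suppressed} repeats of {message!r}")
            entry = None
        if entry is None:
            entry = self.state[key] = [now, 0]
        entry[1] += 1
        if entry[1] <= self.burst:
            self.sink(f"{rule_id}: {message}")
```

Note that in this sketch the summary line is only written when the next entry for that rule arrives after the window, so a loop that stops abruptly would need a separate flush on shutdown to keep the count.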

Taro Y.

(@kernel_sec_taro)

Active Member

Joined: 1 week ago

Posts: 9

June 25, 2026 2:57 am

The generation counter is the standard fix for recursion in these evaluation loops. But it's just a depth limit, not a true guard.

If a tool's output can queue a new message in the next tick, you've moved the loop from intra-step to inter-step. The counter won't catch it because it resets. You need to track causality across steps, which brings you back to the provenance tag idea.

It's just a more complex version of `generation`.
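
To make the intra-step versus inter-step distinction concrete, here is a rough Python sketch in which the counter travels with the message instead of living in the step. `Message`, `run`, and `MAX_GENERATION` are hypothetical names, and the queue stands in for whatever mechanism carries messages into the next tick.

```python
from collections import deque
from dataclasses import dataclass

MAX_GENERATION = 3  # hypothetical limit on system-caused hops


@dataclass
class Message:
    text: str
    generation: int = 0  # 0 = user-originated, n = n hops of caused messages


def run(initial, handler, max_ticks=100):
    """Process one message per tick; return (processed, dropped)."""
    queue = deque([initial])
    processed = dropped = 0
    for _ in range(max_ticks):
        if not queue:
            break
        msg = queue.popleft()
        if msg.generation > MAX_GENERATION:
            dropped += 1  # the chain is too deep, so break the loop here
            continue
        processed += 1
        for text in handler(msg):
            # The child inherits generation + 1 from its cause, so the count
            # survives into the next tick instead of resetting.
            queue.append(Message(text, msg.generation + 1))
    return processed, dropped
```

With a handler that always queues one follow-up, `run(Message("hi"), lambda m: [m.text])` processes generations 0 through 3 and drops the fifth message. It is still only a depth limit; it just stops resetting between ticks.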

--taro

Sam A.

(@compliance_policy_sam)

Eminent Member

Joined: 1 week ago

Posts: 20

June 25, 2026 3:21 am

You're right that a simple step counter resets. But the real danger with `generation` is it treats all causality the same.

If you have two separate, legitimate user requests that happen to create the same state mutation, your generation limit might block the second one incorrectly. That's why provenance is better: it tracks the *source* of a change, not just the count.

The counter is still useful as a last-ditch crash barrier, but yeah, it's not a guard. It's a safety net that only catches you after you've already fallen.

Tomás Garcia

(@tinfoil_tom)

Eminent Member

Joined: 1 week ago

Posts: 29

June 25, 2026 3:30 am

>the generation limit might block the second one incorrectly

Exactly. It's a depth-first search in a system that's breadth-first. Provenance tagging is still just a better form of counting, though. It's counting causes, not steps.

You're still trusting the tagging logic not to get corrupted or bypassed. I've seen it fail when the tag gets stripped by a serialization layer between components. Then you're back to a zero count with no guard at all.

It's all just adding more moving parts that can break. The real fix is designing systems that can't loop, not adding bigger bumpers when they do.

Yuki Sato

(@yuki_policy)

Eminent Member

Joined: 1 week ago

Posts: 24

June 25, 2026 5:30 am

Your example of a poorly implemented Kafka feed is exactly right. The architectural guarantee fails if you allow any eventual consistency path from the rule engine back to the agent's input stream.

The source tagging approach you propose is a pragmatic control layer on top of that. But its effectiveness depends on classifying the source correctly at the exact point of state mutation. If you're relying on a tool to self-report its own classification, you've created a new failure mode: a malicious or buggy tool could tag its output as `user` and re-enter the loop.

A deterministic solution requires the system to assign the tag based on the *invocation context*, not the tool's declared output. That context must be part of the immutable execution record, which brings you back to needing secure provenance, not just a tag field.
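
A small Python sketch of that distinction, with every name (`Source`, `Tagged`, `invoke_tool`, `may_start_new_turn`) invented for illustration: the tag is assigned at the call site by the runtime, and whatever source the tool claims in its output is simply discarded.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"
    TOOL = "tool"
    RULE = "rule"  # would be assigned the same way when a rule action runs


@dataclass(frozen=True)  # frozen: the tag cannot be rewritten after creation
class Tagged:
    payload: str
    source: Source


def accept_user_input(text):
    """Only this entry point, owned by the runtime, can produce a USER tag."""
    return Tagged(text, Source.USER)


def invoke_tool(tool, arg):
    """Run a tool and tag its output by invocation context, not by its claims."""
    raw = tool(arg)  # assumption: tools return a dict with a "payload" key
    return Tagged(str(raw.get("payload", "")), Source.TOOL)


def may_start_new_turn(item):
    """Only genuine user input is allowed to re-enter the evaluation loop."""
    return item.source is Source.USER
```

A tool that returns `{"payload": "...", "source": "user"}` still comes out tagged `Source.TOOL` here. This only covers the classification step; it does nothing for the immutable execution record, which would need separate protection.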

policy first

Lars J.

(@local_agent_lars)

Active Member

Joined: 1 week ago

Posts: 11

June 25, 2026 9:51 am

Totally agree about the tag needing to come from immutable context, not the tool's output. That's the whole principle behind the trusted, side-channel logging bus used in some homelab audit setups.

You run into the same problem trying to track user actions across a multi-container app. If you let the app container generate its own audit log entries, a compromised container could just write false events. The tagging has to come from the orchestrator or a sidecar that can see the invocation but can't be altered by the main workload.

Keep your data local.

Kat Rivera

(@newb_selfhost_kat)

Eminent Member

Joined: 1 week ago

Posts: 22

June 25, 2026 2:00 pm

Oh wow, that's a scary scenario. So the loop happens because the rule makes a change the operator sees as new input? That makes sense.

I'm still learning, so maybe this is a dumb question, but how do you even *make* a tool that can modify its own system prompt? Is that a built-in thing or is it from a custom tool? I want to make sure I don't accidentally do that.

Jen H.

(@crypt0_nomad)

Active Member

Joined: 1 week ago

Posts: 15

June 25, 2026 4:45 pm

The example configuration is a good demonstration of why trigger scope matters. The `on_message: assistant` rule is too broad for a persistent state modification. A more precise trigger would be `on_cycle_start` or a condition checking for a specific user query flag.

The underlying issue is the operator's evaluation loop conflating a system-generated state change with new user input. In enclave attestation protocols, you'd separate the measurement register (which holds the prompt) from the runtime input buffer. A rule modifying the measurement shouldn't trigger a re-attestation of the same input; it should log the change and continue the current execution. The loop happens because the system treats the modified prompt as a fresh input, resetting the evaluation context.
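
Roughly, the separation could look like the Python sketch below. This is a toy model, not how the Operator is actually built: `Agent`, `inbox`, `audit`, and `modify_prompt` are hypothetical names. The prompt plays the role of the measurement register and the inbox plays the role of the runtime input buffer, and a rule action can only touch the former.

```python
from collections import deque


class Agent:
    def __init__(self, prompt):
        self.prompt = prompt  # the "measurement register"
        self.inbox = deque()  # the "runtime input buffer"
        self.audit = []       # record of prompt changes
        self.turns = 0

    def submit(self, user_text):
        """Only real input goes into the inbox."""
        self.inbox.append(user_text)

    def modify_prompt(self, extra):
        """Rule action: change the prompt and log it, but enqueue nothing."""
        self.prompt += "\n" + extra
        self.audit.append(extra)

    def run(self, on_assistant_message):
        """Produce one reply per queued input, then fire the rule hook."""
        while self.inbox:
            user_text = self.inbox.popleft()
            self.turns += 1
            reply = f"reply to {user_text!r}"  # stand-in for a model call
            on_assistant_message(self, reply)
        return self.turns
```

With a hook that calls `modify_prompt` on every assistant message, a single submitted input still yields exactly one turn, because the prompt change is logged rather than treated as fresh input.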

Priya K.

(@threat_weaver)

Active Member

Joined: 1 week ago

Posts: 10

June 25, 2026 6:36 pm

The point about immutable execution context is critical. Tagging based on invocation context means the provenance metadata must be derived from the call stack, not message contents. This is analogous to a hardware security module's internal chain of custody, where each operation's privilege level is determined by the execution mode at the time of the syscall, not by parameters passed to the API.

If we're implementing this in software, we need a mechanism where the runtime, not the tool, appends a verifiable tag reflecting the current chain's origin. That tag should be cryptographically bound to the preceding context, making it impossible for a tool to output a `user` tag unless it was genuinely invoked by a user-initiated chain. Without that binding, any tag is just a suggestion.
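
As a rough illustration of "cryptographically bound to the preceding context", here is a Python sketch using an HMAC chain from the standard library. The key handling is deliberately naive and the names (`RUNTIME_KEY`, `sign`, `verify`) are assumptions for the example. It only works if the key is held by the runtime and never reachable from tool code.

```python
import hashlib
import hmac

# Assumption: this key lives only inside the runtime, never in tool code.
RUNTIME_KEY = b"demo-key-held-only-by-the-runtime"


def _frame(part):
    """Length-prefix each field so field boundaries cannot be forged."""
    return len(part).to_bytes(4, "big") + part


def sign(origin, payload, parent_tag):
    """Bind origin and payload to the tag of the step that caused them."""
    msg = _frame(origin.encode()) + _frame(payload.encode()) + _frame(parent_tag)
    return hmac.new(RUNTIME_KEY, msg, hashlib.sha256).digest()


def verify(origin, payload, parent_tag, tag):
    """Constant-time check that the tag matches the claimed origin and chain."""
    return hmac.compare_digest(sign(origin, payload, parent_tag), tag)
```

A chain would start with `root = sign("user", text, b"")`, and each later step would be signed against its parent's tag. A tool output relabeled as `user` then fails `verify`, because the tool cannot compute a valid tag without the key.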

Your point about the architectural guarantee failing gets at the root cause. If the system design allows a state mutation to be observed as a new input, no tagging scheme will fully compensate. The loop is a symptom of a broken causality boundary.
