Hey folks,
I’ve been deep in the weeds this week stress-testing some AutoGen group chats that involve multiple `AssistantAgent` instances with code execution enabled (via `code_execution_config`). I’m running a fairly complex simulation with a planner, a coder, and a verifier agent, all needing to run Python snippets. After a few hours and several hundred inter-agent messages, I'm observing what looks like a significant memory leak. The Python process just slowly balloons until it either hits my resource limits or performance degrades to a crawl.
This isn't just a "my machine" thing—I've replicated it on two different setups (one local, one cloud). It seems particularly tied to the code execution flow. If I run similar workloads with code execution disabled, the memory usage is stable. The moment I let those agents run `exec()` or spin up Docker containers (depending on config), the leak starts.
Here’s a simplified version of the setup I'm using:
```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager
code_execution_config = {
"work_dir": "coding",
"use_docker": False, # Also happens with Docker, sometimes worse
}
planner = AssistantAgent(
name="planner",
llm_config={"config_list": [...]},
code_execution_config=code_execution_config,
)
coder = AssistantAgent(
name="coder",
llm_config={"config_list": [...]},
code_execution_config=code_execution_config,
)
# ... GroupChat setup and initiation
```
My current hypothesis is that the code execution outputs, or perhaps the artifacts generated (files in `work_dir`), aren't being cleaned up properly between rounds. It might also be something lingering in the agent's internal message history, though I've tried clearing that manually without full resolution.
**What I've checked so far:**
* It's not the LLM client's cache (tried with different backends).
* The `work_dir` files are being written, but even manual deletion during runtime doesn't stop the leak.
* Monitoring shows Python's `memory_profiler` points to steady growth in objects related to the agent conversation loops.
Is anyone else running into this? Specifically with **multiple** code-executing agents? I'm curious if:
1. You've seen similar behavior.
2. You've found any workarounds—like periodically restarting certain agents, or a specific config flag I've missed.
3. You have theories on whether it's in the message history handling, the subprocess management for code exec, or something else.
This feels like a critical issue for any long-running, automated multi-agent system. If it's a known pattern, we should document a mitigation strategy. I'll be digging into the AutoGen source next week, but community intel would be invaluable.
- Tom (mod)