If you're running AutoGen agents with code execution, you're already in a high-risk environment. The default logging is insufficient for forensics. You need a complete, immutable record of every action: who did what, when, and with what result. This isn't just for debugging; it's for security audits and non-repudiation.
Here's a step-by-step setup using callback handlers and structured logging. This assumes you're using `GroupChat` and `AssistantAgent`.
**Core Principle:** Intercept all agent interactions and code executions, then write them to a secure, append-only log (such as the system journal, a remote syslog or SIEM collector, or write-once object storage).
**Step 1: Implement a custom callback handler.**
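Note that AutoGen agents do not accept LangChain-style callback handlers. There is no `callbacks=` constructor argument, and there are no `on_chat_init`, `on_chain_start`, `on_chain_end`, or `on_code_execution` events. In the `autogen` 0.2-style API imported here, the supported interception point is `ConversableAgent.register_hook`, which `AssistantAgent` inherits. Its `process_message_before_send` hook runs on every message an agent sends. Code-execution results come back into the conversation as ordinary messages (`exitcode: ... Code output: ...`), so logging outgoing messages captures both the code an agent proposes and the result of running it. Create a handler whose hook method matches that signature:
```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, Union

from autogen import Agent, AssistantAgent, GroupChat, GroupChatManager


class ForensicCallbackHandler:
    def __init__(self, agent_name: str):
        self.agent_name = agent_name
        # Configure a dedicated logger. In production, hook this to syslog or a remote sink.
        self.logger = logging.getLogger(f"forensics.{agent_name}")
        self.logger.setLevel(logging.INFO)
        self.logger.propagate = False  # keep audit records out of the root logger
        if not self.logger.handlers:  # avoid duplicate handlers if instantiated twice
            handler = logging.FileHandler(f"/var/log/autogen_forensics_{agent_name}.log")
            handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(message)s"))
            self.logger.addHandler(handler)

    def log_event(self, event_type: str, data: Dict[str, Any]) -> None:
        """Structured log entry."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": self.agent_name,
            "event": event_type,
            "data": data,
        }
        # default=str stops a non-JSON-serializable value from crashing the logger
        self.logger.info(json.dumps(entry, default=str))

    def on_message_sent(
        self, sender: Agent, message: Union[Dict, str], recipient: Agent, silent: bool
    ) -> Union[Dict, str]:
        """Hook for `process_message_before_send`. This is the most critical hook.

        It sees both the code an assistant proposes and the execution results sent
        back by the agent that ran it. Be careful with secrets in code. Consider hashing.
        """
        self.log_event("message_sent", {
            "sender": sender.name,
            "recipient": recipient.name,
            "message": message,
        })
        return message  # the hook must return the message it was given
```
**Step 2: Attach handlers to your agents.**
Instantiate one handler per agent and register its hook after the agent is constructed. Register the same kind of hook on every other agent that sends messages (for example a `UserProxyAgent` that submits the initial request) so the whole conversation is covered.
```python
# Create handlers
coder_forensics = ForensicCallbackHandler(agent_name="SeniorCoder")
critic_forensics = ForensicCallbackHandler(agent_name="CodeCritic")

# Configure agents
coder = AssistantAgent(
    name="SeniorCoder",
    llm_config={...},
    system_message="You are a coder.",
    code_execution_config={"use_docker": False},  # Or True with proper isolation
)
critic = AssistantAgent(
    name="CodeCritic",
    llm_config={...},
    system_message="You review code.",
)

# Attach the forensic loggers: every message each agent sends is recorded
coder.register_hook("process_message_before_send", coder_forensics.on_message_sent)
critic.register_hook("process_message_before_send", critic_forensics.on_message_sent)
```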
**Step 3: Enforce and verify log integrity.**
The file log is a start, but it's mutable. For real forensics:
* **Stream logs to a secure system:** Use `logging.handlers.SysLogHandler` or a log-shipping agent to forward entries to a centralized SIEM or another append-only store that lives off the agent host.
* **Include context:** Ensure each log entry includes the session ID, user principal (from your OIDC integration), and a hash of the previous log entry for chain-of-custody.
* **Log ALL inputs:** This includes the initial user request and any retrieved context from a RAG system.
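The chain-of-custody bullet can be sketched as a hash chain: each entry stores the SHA-256 of its predecessor, so editing or deleting any earlier entry breaks verification from that point on. This is a minimal in-memory sketch under stated assumptions: the class name `HashChainedLog`, the field names, and the all-zeros genesis value are hypothetical, and in practice each entry would be written as a JSON line to the append-only sink rather than kept in a list.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

GENESIS = "0" * 64  # hypothetical "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators) so it is reproducible."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class HashChainedLog:
    """Hash-chained audit entries, kept in memory for this sketch."""

    def __init__(self, session_id: str, principal: str):
        self.session_id = session_id
        self.principal = principal  # e.g. the subject claim from your OIDC token
        self.entries: List[Dict[str, Any]] = []

    def append(self, event: str, data: Dict[str, Any]) -> Dict[str, Any]:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "seq": len(self.entries),
            "session_id": self.session_id,
            "principal": self.principal,
            "event": event,
            "data": data,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry


def verify_chain(entries: List[Dict[str, Any]]) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None if intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(entries):
        body = {key: value for key, value in entry.items() if key != "hash"}
        if body.get("prev_hash") != prev_hash or _digest(body) != entry.get("hash"):
            return index
        prev_hash = entry["hash"]
    return None
```

A chain alone does not reveal truncation of the newest entries. Periodically copying the latest `hash` to a system the agent host cannot write to closes that gap.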
**Key considerations:**
* This adds overhead. Test performance impact.
* The log will contain sensitive data. It must be encrypted at rest and access-controlled using a zero-trust model.
* This logs the *action* and *result*, but not the agent's internal reasoning. For that, you'd need to hook deeper into the LLM prompts/responses, which is more complex.
* Without a verifiable chain of custody, logs are just files, not evidence. Integrate with a proper audit trail system.
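On the overhead point, one standard-library way to keep audit writes off the agents' hot path is `logging.handlers.QueueHandler` plus `QueueListener`: the agent thread only enqueues a record, and a background thread performs the file or network I/O. The helper name `build_async_forensic_logger` is hypothetical; `sink` would be the `FileHandler` or `SysLogHandler` you already use.

```python
import logging
import logging.handlers
import queue
from typing import Tuple


def build_async_forensic_logger(
    name: str, sink: logging.Handler
) -> Tuple[logging.Logger, logging.handlers.QueueListener]:
    """Callers only enqueue records; a background thread writes them to `sink`.

    Call listener.stop() at shutdown: it processes everything still queued.
    """
    log_queue: queue.Queue = queue.Queue(-1)  # unbounded, so put() never blocks
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the root logger
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, sink, respect_handler_level=True)
    listener.start()
    return logger, listener
```

The trade-off matters for forensics: records still sitting in the queue are lost if the process crashes, so stop the listener on every clean shutdown and weigh that window against the latency you save.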