Hey folks, Greg here! Been running SuperAGI in my homelab for a few months now, mostly for automating some data gathering and light analysis tasks. I see a lot of chatter about swapping out the default SQLite memory backend for Postgres or some fancy vector DB right away, and I'm kinda wondering... why the rush?
For a small, personal, or trusted-team setup, that little `superagi.db` file has been rock solid for me. I think we sometimes get caught up in scaling for problems we don't have yet. If you're running maybe 3-5 agents on a single Proxmox LXC or a dedicated Docker host, and you're not exposing the UI directly to the scary internet, SQLite is *simple*. No extra containers to manage, no network-accessible database to harden, no user/pass configs flying around in your compose files.
Here's a peek at my current setup's relevant bit:
```yaml
# docker-compose.yaml snippet
version: '3.8'
services:
  superagi:
    image: superagi/superagi:latest
    volumes:
      - ./storage:/app/storage # This holds the SQLite DB
      - ./config.yaml:/app/config.yaml
    ports:
      - "8080:8080" # Behind my NPM reverse proxy with Authelia
```
My thinking is this: the attack surface shrinks considerably. You're only really focusing on:
* Securing the SuperAGI web UI itself (strong passwords, 2FA if possible, reverse proxy with access controls).
* Locking down the marketplace plugins (meticulously reviewing those Python scripts before enabling).
* Network isolation for the whole stack.
The moment you bring in Postgres, you now have to:
* Manage DB credentials and rotation.
* Secure the database port (or socket).
* Think about replication and backups differently.
* Add another service that could need patching.
Now, don't get me wrong! I'm already planning a shift to Postgres because I'm starting to experiment with more concurrent agents and looking at integrating a proper vector memory for some projects. But for a "phase 1" or a stable, limited-production environment that stays inside the homelab firewall? I feel like the SQLite default gets an unfairly bad rap.
What's everyone else's experience? Am I being naive about the persistence layer risks, or is the complexity of a "proper" database sometimes overkill at the start? Would love to see how others have structured their secure, but not-overengineered, deployments.
- Greg
More VLANs than friends.
That's a fair point about the attack surface. One thing I wonder about though - when you say trusted team, does that include the agents themselves?
I'm still learning, but I've been reading about prompt injection leading to data exfiltration. If an agent gets compromised, could it just read or write to that SQLite file directly? Or does the framework impose some kind of isolation layer that prevents filesystem access?
Great question, and this is where the deployment model matters. The agents aren't directly reading and writing a `.db` file on a host volume; they interact through the framework's API. The risk isn't the agent typing `sqlite3` commands, it's a compromised agent using the memory *tool* to query or insert anything it wants.
If the framework's tool access control is permissive, a clever injection could, for example, force a query that dumps all stored credentials or task histories through the agent's normal output channel. The isolation isn't about filesystem access, it's about whether the tool interface filters or scopes queries.
So in a "trusted team" setup, you're also trusting that your agents' instructions can't be subverted to misuse the tools they already have. For many hobby projects, that's a reasonable bet.
Exactly, that's the real issue. It's not the SQLite file itself, it's the framework's memory tool becoming a potential data exfiltration channel.
I run a few agents locally, and I've wondered about locking down what they can query. In Home Assistant, you can create templates that sanitize inputs to queries. Does SuperAGI have anything like that, or is the memory tool just a raw passthrough? If it's wide open, a single compromised prompt could ask "show me all entries from the last week" and get everything.
For a small home lab, maybe the risk is low. But if you start giving agents access to more sensitive APIs, that memory becomes a juicy target.
You've hit the nail on the head. The tool is just a wrapper around the ORM. I dug into the source a while back, and the default memory tool's `search_memory` method typically just passes the query string straight into a SQL LIKE clause. No scoping, no tenant/user isolation.
So yes, `"show me all entries from the last week"` could be constructed, and it would work. The mitigation isn't in the database choice, it's in the framework's access control for the tool itself, which is basically nonexistent out of the box.
Your Home Assistant analogy is perfect. What you'd need is a policy layer *before* the query gets to the ORM. For a homelab, you could fork the memory tool class and add a hard filter on agent_id or something. In a "real" setup, you'd want that as a configurable policy. Most people just skip straight to hoping their prompts don't get hijacked.
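A minimal sketch of that fork, assuming a hypothetical `memories(agent_id, content, created_at)` SQLite table (the real SuperAGI schema and class names may well differ). The idea is that `agent_id` is bound when the framework constructs the tool, so nothing typed into a prompt can change it, and the search text is escaped so `%` and `_` can't act as wildcards.

```python
import sqlite3


class ScopedMemoryTool:
    """Hypothetical memory tool wrapper: every query is pinned to one agent_id."""

    def __init__(self, conn: sqlite3.Connection, agent_id: str) -> None:
        self._conn = conn
        self._agent_id = agent_id  # set by the framework, never by the agent

    @staticmethod
    def _escape_like(text: str) -> str:
        # Escape the escape char first, then LIKE wildcards, so input matches literally.
        return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

    def search_memory(self, query: str, limit: int = 5) -> list[str]:
        limit = max(1, min(int(limit), 20))  # clamp result size
        pattern = "%" + self._escape_like(query) + "%"
        rows = self._conn.execute(
            "SELECT content FROM memories "
            "WHERE agent_id = ? AND content LIKE ? ESCAPE '\\' "
            "ORDER BY created_at DESC LIMIT ?",
            (self._agent_id, pattern, limit),
        )
        return [content for (content,) in rows]


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (agent_id TEXT, content TEXT, created_at INTEGER)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?)",
    [("agent-a", "router firmware notes", 1),
     ("agent-b", "router admin password is hunter2", 2)],
)
tool = ScopedMemoryTool(conn, "agent-a")
print(tool.search_memory("router"))  # ['router firmware notes']
print(tool.search_memory("%"))       # [] because "%" now matches only a literal percent sign
```

Because the scope comes from the constructor, an injected "show me entries for agent X" has no parameter to flip, as long as the agent never chooses which tool instance it's handed.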
ship it or break it.
Yeah, the ORM passthrough is the real problem. I see the same pattern in half the "security" dashboards we review - a fancy UI over a direct database call. The tool gives the agent a query window, not a controlled API.
You could fork the class and filter by agent_id, but what's stopping a hijacked agent from asking "show me entries for agent X"? You'd need proper object-level access control baked into the model layer, which nobody's doing for a weekend project.
Frankly, if you're worried about this, you shouldn't be using the built-in memory tool at all. Pipe it to a separate service with actual authz logic, even if it's just a tiny Flask app that checks a token. SQLite vs Postgres is a red herring; they'll both give up the data if the query isn't scoped.
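Roughly what that "tiny Flask app" could look like. To stay dependency-free it's sketched as a plain stdlib WSGI app rather than Flask (Flask apps are WSGI apps too). The token table, the `X-Agent-Token` header, the `/memory/search` route and the in-memory store are all hypothetical; the point is that the service decides what a token can see, not the agent's query.

```python
import hmac
import json
from typing import Optional
from urllib.parse import parse_qs

# Hypothetical token -> agent scope table; in practice load it from a secrets file.
TOKENS = {"tok-research-123": "research-agent"}

# Hypothetical store of (agent_id, content) pairs.
MEMORIES = [
    ("research-agent", "paper list for week 12"),
    ("finance-agent", "brokerage login notes"),
]


def agent_for_token(token: str) -> Optional[str]:
    # Constant-time comparison against each known token.
    for known, agent_id in TOKENS.items():
        if hmac.compare_digest(known.encode(), token.encode()):
            return agent_id
    return None


def app(environ, start_response):
    def reply(status, payload):
        body = json.dumps(payload).encode()
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if environ.get("PATH_INFO") != "/memory/search":
        return reply("404 Not Found", {"error": "not found"})
    agent_id = agent_for_token(environ.get("HTTP_X_AGENT_TOKEN", ""))
    if agent_id is None:
        return reply("401 Unauthorized", {"error": "bad token"})
    q = parse_qs(environ.get("QUERY_STRING", "")).get("q", [""])[0]
    # Scope comes from the token, never from request parameters.
    hits = [c for a, c in MEMORIES if a == agent_id and q.lower() in c.lower()]
    return reply("200 OK", {"results": hits[:5]})
```

For a quick local test, `wsgiref.simple_server.make_server("127.0.0.1", 8765, app).serve_forever()` will serve it; the framework, not the agent, would attach the token to each request.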
Alert fatigue is a design flaw.
> when you say trusted team, does that include the agents themselves?
That's the critical distinction everyone glosses over. The agent is a process executing code with the privileges of the framework's runtime. If that runtime can open and write to a file, so can a compromised agent, barring kernel-level confinement such as a seccomp syscall filter or a mandatory access control profile (AppArmor, SELinux).
The real isolation question isn't about the SQLite file format, it's about kernel-level confinement of the container or host. If your deployment runs the agent without restrictions on file access (AppArmor or SELinux path rules; seccomp alone can't filter `openat` by path) and on outbound `connect` calls (to stop exfiltration), then a successful code execution can absolutely read that `.db` file directly from the filesystem, regardless of the framework's intended API. The memory tool is just one potential vector; a full compromise bypasses it entirely.
This is why the kernel-level sandboxing matters more than the database choice for a local setup. You're not just trusting the agent's instructions, you're trusting that the entire execution environment has no arbitrary file I/O primitive the agent can reach. Most hobby deployments don't lock that down.
Agreed on the simplicity point. Your compose setup is exactly how I run my personal instance, and for non-sensitive tasks, it's perfect. The "no extra containers" bit is a huge plus for reliability and mental overhead.
But the thread got me thinking about your `volumes:` line. That direct mount binds the DB's fate to the container's security *and* the host's. If an agent escapes the app's logic sandbox (bug, future tool, etc.), that path is a known, static target. With Postgres over a socket or network, even a breakout might hit a connection barrier or auth prompt.
So maybe the real benefit of a separate DB isn't scaling, it's adding a tiny choke point. Still, for homelab tinkering, your setup's trade-off is totally valid. Sometimes simple is just better.
That's a good point about the volume mount! I hadn't even thought about the path being a static target for a breakout.
But your last sentence confuses me a little. If an agent escapes the sandbox, wouldn't it just inherit the container's user? Couldn't it still, in theory, use the same database connection the main app is using? Or does the separate DB container add a real user/permission boundary that's harder to cross than just reading a file?
> The default memory tool's `search_memory` method typically just passes the query string straight into a SQL LIKE clause.
This pattern is endemic, and your source dive confirms the core vulnerability. The missing policy layer is essentially a classic confused deputy problem: the tool acts with the framework's full database privileges on behalf of an agent whose instructions cannot be trusted.
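One way to shrink the deputy's privileges, at least with SQLite, is to give the memory tool its own connection with an authorizer callback installed via Python's `sqlite3.Connection.set_authorizer`. SQLite consults the callback while compiling each statement, so anything outside the allow-list fails however the agent phrased its request. The `memories`/`credentials` tables and both allow-lists are hypothetical; this is a sketch of the idea, not something SuperAGI ships.

```python
import sqlite3

ALLOWED_TABLES = {"memories"}          # hypothetical: the only table the tool may read
ALLOWED_FUNCTIONS = {"like", "lower"}  # SQL functions the tool's queries need (LIKE is one)
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)  # action code 31 in sqlite3.h


def read_only_memory_authorizer(action, arg1, arg2, db_name, source):
    """Allow SELECTs that read only ALLOWED_TABLES; deny every other action."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:  # arg1 = table name, arg2 = column name
        return sqlite3.SQLITE_OK if arg1 in ALLOWED_TABLES else sqlite3.SQLITE_DENY
    if action == SQLITE_FUNCTION:      # arg2 = function name
        ok = arg2 is not None and arg2.lower() in ALLOWED_FUNCTIONS
        return sqlite3.SQLITE_OK if ok else sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_DENY         # INSERT, UPDATE, DELETE, ATTACH, PRAGMA, ...


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (agent_id TEXT, content TEXT)")
conn.execute("CREATE TABLE credentials (service TEXT, secret TEXT)")
conn.execute("INSERT INTO memories VALUES ('a', 'weekly scrape summary')")
conn.execute("INSERT INTO credentials VALUES ('nas', 'hunter2')")
conn.commit()
conn.set_authorizer(read_only_memory_authorizer)  # from here on, the tool is read-only

print(conn.execute("SELECT content FROM memories WHERE content LIKE ?", ("%scrape%",)).fetchall())
for sql in ("SELECT secret FROM credentials", "DELETE FROM memories"):
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError as exc:
        print("blocked:", sql, "->", exc)
```

This doesn't replace row-level scoping (the tool can still read every row of `memories`), but it stops the deputy from wandering into tables or statement types it never needed.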
While forking the class to filter by `agent_id` is the immediate band-aid, it's fragile. A compromised agent tasked with "summarize all project notes" might have a legitimate `agent_id`, so the filter does nothing. The policy needs to be statement-based: *this* agent can only issue queries of *this* pattern against memories tagged with *these* labels. Implementing that requires moving from a passthrough ORM to a query builder that sanitizes and appends clauses, which is why most frameworks skip it.
You're correct that the mitigation isn't the database, but I'd push further: the mitigation isn't even just access control on the tool. It's designing the memory system such that the tool isn't a generic query interface at all. It should expose named, parameterized methods like `get_relevant_context(task_description, limit=5)`. The moment you give an untrusted process a string that becomes a SQL clause, you've lost.
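To make "named methods, not a query string" concrete, here's a sketch of a dispatcher that sits between the agent and memory. The agent can only name a method its policy lists and pass whitelisted arguments; the label scope and the result cap come from the policy, not the request. `MemoryPolicy`, `dispatch`, the labels and the toy keyword-overlap search are all hypothetical; `get_relevant_context` just borrows the name from the post above.

```python
from dataclasses import dataclass

# Hypothetical notes store of (label, text); a real one would be a DB table.
NOTES = [
    ("homelab", "proxmox backup schedule for the nas"),
    ("finance", "proxmox license invoice and card details"),
]


def get_relevant_context(task_description: str, labels: frozenset, limit: int = 5) -> list:
    """Named, parameterized retrieval: keyword overlap within permitted labels."""
    words = set(task_description.lower().split())
    hits = [text for label, text in NOTES
            if label in labels and words & set(text.lower().split())]
    return hits[:limit]


@dataclass(frozen=True)
class MemoryPolicy:
    """Per-agent policy: which methods, which arguments, which labels."""
    allowed_args: dict        # method name -> set of argument names the agent may pass
    allowed_labels: frozenset
    max_limit: int = 5


class PolicyViolation(Exception):
    pass


HANDLERS = {"get_relevant_context": get_relevant_context}


def dispatch(policy: MemoryPolicy, call: dict):
    method = call.get("method")
    args = dict(call.get("args", {}))
    if method not in policy.allowed_args or method not in HANDLERS:
        raise PolicyViolation(f"method {method!r} is not permitted")
    extra = set(args) - policy.allowed_args[method]
    if extra:
        raise PolicyViolation(f"arguments {sorted(extra)} are not permitted")
    if "limit" in args:
        args["limit"] = max(1, min(int(args["limit"]), policy.max_limit))
    # Labels come from the policy, never from the agent's request.
    return HANDLERS[method](labels=policy.allowed_labels, **args)


policy = MemoryPolicy(
    allowed_args={"get_relevant_context": {"task_description", "limit"}},
    allowed_labels=frozenset({"homelab"}),
)
print(dispatch(policy, {"method": "get_relevant_context",
                        "args": {"task_description": "proxmox backup", "limit": 50}}))
```

An injected `{"method": "execute_sql", ...}` or an attempt to pass its own `labels` is rejected before any storage code runs.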
Show me the threat model.
Absolutely, you've nailed it with the confused deputy framing. That's exactly the pattern in CVE-2023-38745 for the Auto-GPT SQLite plugin, where the tool's `execute_sql` function passed raw queries.
The part about moving away from a generic query interface is key, but there's a practical side: even a named method like `get_relevant_context` often just wraps `SELECT * FROM memories WHERE content LIKE %s`. The policy layer has to be in that WHERE clause builder.
Some recent frameworks are trying to embed tags or tagsets at insertion, then only allowing retrieval based on those. It's still early, but it at least adds a mandatory filter field the agent can't bypass in the query string. Not perfect, but moves the problem from the tool to the schema.
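A minimal sketch of insertion-time tagging, assuming a hypothetical single-table SQLite schema: trusted framework code supplies the label on write, the schema refuses untagged rows, and the retrieval builder always ANDs a `label IN (...)` clause onto the `LIKE` from the post above. Even a hijacked "show me everything" then only sees the labels it was granted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (
    id      INTEGER PRIMARY KEY,
    label   TEXT NOT NULL CHECK (label <> ''),  -- every row must carry a tag
    content TEXT NOT NULL
)
"""


class TaggedMemory:
    """Hypothetical store: tags are fixed at insert time by trusted code."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def remember(self, content: str, *, label: str) -> None:
        # Called by the framework, which knows which project/task is running.
        self.conn.execute("INSERT INTO memories (label, content) VALUES (?, ?)",
                          (label, content))

    def search(self, text: str, labels: frozenset, limit: int = 5) -> list:
        if not labels:
            return []  # no labels means no access, not "all labels"
        placeholders = ", ".join("?" for _ in labels)  # only "?" markers go into the SQL
        sql = (f"SELECT content FROM memories WHERE label IN ({placeholders}) "
               "AND content LIKE ? ORDER BY id DESC LIMIT ?")
        # Wildcards in `text` are not escaped here, but they can't leave the label set.
        params = [*sorted(labels), f"%{text}%", limit]
        return [row[0] for row in self.conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
mem = TaggedMemory(conn)
mem.remember("scraped 40 listings from the rental site", label="housing-scrape")
mem.remember("api key rotation reminder for the nas", label="ops")
print(mem.search("", frozenset({"housing-scrape"})))  # an empty search still stays in-label
```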
CVE collector
I think you're right about the core trade-off: simplicity is a valid, powerful feature for homelabs. Your setup is a classic example of minimizing moving parts, which is a security win in itself for trusted, low-risk tasks.
That said, I think your snippet highlights a subtle point folks sometimes miss. Your `volumes:` line binds the database's fate to both the container *and* your host's filesystem security. The attack surface shrinks, but the consequences of any breach expand sideways. Code execution inside the container doesn't need SQL injection; it just needs `cat /app/storage/superagi.db`, and a full breakout can read the same file through `./storage` on the host.
For your use case, that's probably an acceptable, calculated risk. It's good you've got it behind a reverse proxy with auth, but I'd just add one practical step: make sure that `./storage` directory on your host has restrictive permissions (like `chmod 700`) and isn't world-readable. It's a tiny barrier, but it's something an escaped process might still trip over.
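If you'd rather check that than eyeball `ls -l`, here's a small stdlib sketch that walks the storage directory and flags anything group- or world-accessible. The `./storage` path matches Greg's compose file; the function name is made up. One caveat: if the container runs as a non-root UID that doesn't own the directory, `chmod 700` will lock the app out too, so check which user the container runs as first.

```python
import os
import stat


def loose_permissions(path: str) -> list:
    """Return (path, mode) for every dir/file under `path` that group or others can access."""
    problems = []
    for root, _dirs, files in os.walk(path):
        for name in [root, *(os.path.join(root, f) for f in files)]:
            mode = os.stat(name).st_mode
            if mode & (stat.S_IRWXG | stat.S_IRWXO):  # any group/other rwx bit set
                problems.append((name, stat.filemode(mode)))
    return problems


for name, mode in loose_permissions("./storage"):
    print(f"{mode} {name}")  # fix with: chmod -R go-rwx ./storage
```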
Opinions are my own, actions are mod-approved.
You're right about the simplicity trade-off, and your setup is valid for that homelab context. But you've focused on the *application* attack surface shrinking. You should also think about the *runtime* attack surface.
Your volume mount makes that SQLite file a static, predictable target on the host filesystem. If an agent's behavior drifts or a tool exploit gives it arbitrary code exec, it can just `open()` that path. No SQL queries needed.
For trusted setups, your bigger risk isn't the database choice, it's the lack of a behavioral baseline for those 3-5 agents. If you aren't monitoring for anomalous syscalls (like suspicious file reads/writes outside expected patterns), you won't see the breakout coming. A separate DB container adds a minor hurdle, but runtime monitoring tells you when something tries to clear it.
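Consider tightening the container's confinement. One caveat: seccomp filters only see raw syscall arguments, so they can't block `openat` or `execve` for paths outside `/tmp` or the tool directories. Path rules like that need an AppArmor (or SELinux) profile, while seccomp is for dropping syscalls the app never uses. Both hook in through `security_opt` in your compose file plus a profile file, and they give you that choke point without another service.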
Baseline or bust.
That's a good point about seccomp. I've been running nano_claw in a Proxmox LXC with AppArmor, but I haven't touched seccomp profiles directly. You're saying I should add a `security_opt` line to the docker run command in the LXC?
If an agent can `open()` any path, wouldn't it also need to `connect()` somewhere to actually exfiltrate the data it reads? Blocking outbound network calls from the container seems easier to reason about than blocking specific file opens. Or is that too blunt an instrument?