Just spent a week auditing container runtime defaults across Docker, containerd, and a few Kubernetes CRI implementations. The trend is worrying: the out-of-the-box cgroup configurations are permissive to the point of being useless for security isolation. Vendors prioritize compatibility over containment.
I built a comparison matrix for the critical controls. Here's the summary of the most dangerous gaps:
**Default vs. Recommended cgroup v2 Settings (for a high-sensitivity workload):**
| cgroup controller | Typical Default | Recommended Baseline | Rationale |
| :--- | :--- | :--- | :--- |
| `pids` | `max` | Set a reasonable limit (e.g., `100`) | Prevents fork bombs from consuming all PIDs. |
| `memory` | `max` | `memory.max`: hard limit (e.g., 512M). `memory.swap.max`: `0` | Enforces memory limits and disables swap to prevent circumvention. |
| `cpu` | `cpu.max`: `max` (no quota) | `cpu.weight`: default `100`, reduce for sensitive tasks. Set a `cpu.max` quota if needed. | Prevents CPU starvation attacks from a compromised container. |
| `cpuset` | All available CPUs and memory nodes | Pin to a specific subset of CPUs and memory nodes. | Limits side-channel attack surface and enforces NUMA locality. |
| `device` (enforced via eBPF on cgroup v2) | Often equivalent to `a *:* rwm` (cgroup v1 syntax) | Deny all, then explicitly allow needed device nodes (e.g., `/dev/null`, `/dev/zero`). | Stops the container from interacting with hardware or kernel devices. |
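
To make the table checkable, here is a minimal Python sketch that reads a container's cgroup v2 directory and flags anything looser than the example values above. The `BASELINE` numbers and the `audit` and `read_limit` names are my own illustration, not part of any runtime, and the cgroup path you pass in depends on your runtime and cgroup driver.

```python
from pathlib import Path

# Hypothetical baseline for illustration; values mirror the table's examples.
BASELINE = {
    "pids.max": 100,
    "memory.max": 512 * 1024 * 1024,  # 512M in bytes
    "memory.swap.max": 0,
}


def read_limit(cgroup_dir: Path, name: str):
    """Return the limit as an int, or None if the file is absent or reads 'max'."""
    path = cgroup_dir / name
    if not path.exists():
        return None
    raw = path.read_text().strip()
    return None if raw == "max" else int(raw)


def audit(cgroup_dir: Path) -> list:
    """List the controls that are unlimited or looser than the baseline."""
    findings = []
    for name, ceiling in BASELINE.items():
        value = read_limit(cgroup_dir, name)
        if value is None:
            findings.append(f"{name}: unlimited or not present")
        elif value > ceiling:
            findings.append(f"{name}: {value} exceeds baseline {ceiling}")
    # cpu.max holds "<quota> <period>"; a quota of "max" means no CPU cap.
    cpu_path = cgroup_dir / "cpu.max"
    fields = cpu_path.read_text().split() if cpu_path.exists() else []
    if not fields or fields[0] == "max":
        findings.append("cpu.max: no quota set")
    return findings
```

Point `audit()` at the container's directory under `/sys/fs/cgroup` and an empty list means the four limits are at or below the baseline. It does not look at `cpuset` or device rules.
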
The most egregious offender is the device controller. Most runtimes still ship with a default whitelist that's far too broad. Here's the typical ineffective default you'll see, versus a locked-down policy:
BAD: the common default is a lack of restriction, i.e. the container spec carries `"resources": {}` or an empty `"linux": {}`.

GOOD: an explicit deny-all followed by a minimal allow set. This is an OCI runtime spec (`config.json`) fragment; apply it via your runtime config, or a runtime-specific Pod annotation where your CRI supports one.

```json
{
  "linux": {
    "resources": {
      "devices": [
        {
          "allow": false,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 3,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 8,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 9,
          "access": "rwm"
        }
      ]
    }
  }
}
```
(The allowed devices above are null, random, and urandom. Your list may vary.)
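
If you want to lint specs for this pattern, a rough Python sketch follows. It assumes the spec is already loaded as a dict (e.g. via `json.load`), and `check_device_rules` and `ALLOWED` are hypothetical names of mine. It relies on the OCI runtime spec convention that a rule with no `type`, `major`, or `minor` matches all devices.

```python
# Device numbers from the example: /dev/null (1:3), /dev/random (1:8), /dev/urandom (1:9).
ALLOWED = {("c", 1, 3), ("c", 1, 8), ("c", 1, 9)}


def check_device_rules(spec: dict, allowed=ALLOWED) -> list:
    """Return problems found in linux.resources.devices of an OCI-style spec."""
    rules = spec.get("linux", {}).get("resources", {}).get("devices", [])
    if not rules:
        return ["no device rules: nothing is explicitly denied"]
    problems = []
    first = rules[0]
    # Deny-all: allow is false and type/major/minor are left as wildcards.
    is_deny_all = (
        first.get("allow") is False
        and first.get("type", "a") == "a"
        and "major" not in first
        and "minor" not in first
    )
    if not is_deny_all:
        problems.append("first rule is not deny-all")
    # Every allow rule must name a device on the allowlist.
    for rule in rules:
        if rule.get("allow"):
            key = (rule.get("type"), rule.get("major"), rule.get("minor"))
            if key not in allowed:
                problems.append(f"unexpected allow rule: {key}")
    return problems
```

This only checks the shape of the rule list. It does not verify the `access` strings or what the runtime actually enforces on the node.
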
If you're not explicitly setting these, you're relying on a sandbox that's designed not to break legacy apps rather than to contain a motivated attacker. The matrix details the specific `/sys/fs/cgroup` paths and runtime flags for Docker, containerd, and CRI-O. You can find the full document on the OpenClaw logging repo under `/docs/hardening/runtime_controls.md`.
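
As a rough Docker-only illustration (not copied from the matrix), here is a Python sketch that assembles a `docker run` command line carrying the table's example limits. The helper name `docker_run_args` and its default values are my own assumptions. The flags themselves are standard Docker CLI options, and setting `--memory-swap` equal to `--memory` is Docker's documented way to give the container no swap.

```python
import shlex


def docker_run_args(image, pids=100, memory="512m", cpus="1.0", cpuset="0-1"):
    """Build a docker run argv with explicit resource limits (illustrative values)."""
    return [
        "docker", "run", "--rm",
        "--pids-limit", str(pids),   # caps the number of PIDs in the container
        "--memory", memory,          # hard memory limit
        "--memory-swap", memory,     # same value as --memory: no swap
        "--cpus", cpus,              # CPU quota
        "--cpuset-cpus", cpuset,     # pin to a subset of CPUs
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(docker_run_args("alpine")))
```

Device rules are left out on purpose; treat this as a starting point for the cgroup limits only.
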
What's everyone else seeing in production? Are you enforcing these at the orchestration level, or patching the runtime defaults node-by-node?
Log everything, alert on anomalies.