Just published a comparison matrix of default vs. recommended cgroup settings.

(@infra_sec_eng)

Just spent a week auditing container runtime defaults across Docker, containerd, and a few Kubernetes CRI implementations. The trend is worrying: the out-of-the-box cgroup configurations are permissive to the point of being useless for security isolation. Vendors prioritize compatibility over containment.

I built a comparison matrix for the critical controls. Here's the summary of the most dangerous gaps:

**Default vs. Recommended cgroup v2 Settings (for a high-sensitivity workload):**

| cgroup controller | Typical Default | Recommended Baseline | Rationale |
| :--- | :--- | :--- | :--- |
| `pids` | `pids.max`: `max` | `pids.max`: a reasonable limit (e.g., `100`) | Prevents fork bombs from exhausting the PID space. |
| `memory` | `memory.max`: `max` | `memory.max`: hard limit (e.g., `512M`). `memory.swap.max`: `0` | Enforces a hard memory ceiling; disabling swap stops the container from spilling past it into swap. |
| `cpu` | `cpu.max`: `max 100000` (no quota); `cpu.weight`: `100` | Lower `cpu.weight` for workloads you want to deprioritize; set a `cpu.max` quota (e.g., `50000 100000` = half a CPU) if needed. | Keeps a compromised container from starving its neighbors of CPU. |
| `cpuset` | All available CPUs and memory nodes | Pin `cpuset.cpus` and `cpuset.mems` to a specific subset. | Limits side-channel attack surface and enforces NUMA locality. |
| `devices` (v1 controller; v2 enforces the same rules with an eBPF program attached to the cgroup) | Often allows `a *:* rwm` | Deny all, then explicitly allow the needed device nodes (e.g., `/dev/null`, `/dev/zero`). | Stops the container from interacting with hardware or kernel devices. |
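
A quick way to check whether a running container still has the unlimited defaults from the first rows of the table is to read the interface files directly. Below is a minimal sketch, assuming cgroup v2, Python 3.9+, and that you already know the container's cgroup directory under `/sys/fs/cgroup`; `find_unlimited` is a hypothetical helper name, not part of any runtime.

```python
from pathlib import Path

# Interface files from the table; in cgroup v2 a literal "max" means "no limit".
# cpu.max holds "$QUOTA $PERIOD", so an unthrottled group reads "max 100000".
CHECKED_FILES = ("pids.max", "memory.max", "memory.swap.max", "cpu.max")


def find_unlimited(cgroup_dir: str) -> list[str]:
    """Return the interface files in cgroup_dir whose limit is still "max".

    Missing files (controller not enabled for this group) are skipped.
    """
    unlimited = []
    for name in CHECKED_FILES:
        path = Path(cgroup_dir) / name
        if not path.is_file():
            continue
        # The first field is the limit (for cpu.max, the quota).
        fields = path.read_text().split()
        if fields and fields[0] == "max":
            unlimited.append(name)
    return unlimited
```

Call it with the container's cgroup directory, e.g. `find_unlimited("/sys/fs/cgroup/<container-scope>")`; the exact path depends on your runtime and cgroup driver.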

The most egregious offender is the device controller. Most runtimes still ship with a default whitelist that's far too broad. Here's the typical ineffective default you'll see, versus a locked-down policy:

**BAD:** the common default is simply no restriction, i.e., the container spec carries `"resources": {}` or a bare `"linux": {}`.

**GOOD:** an explicit deny-all followed by a minimal allowlist. This is OCI runtime spec (`config.json`) syntax; depending on your stack, you apply it through runtime config or a runtime-specific Pod annotation.

```json
{
  "linux": {
    "resources": {
      "devices": [
        {
          "allow": false,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 3,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 8,
          "access": "rwm"
        },
        {
          "allow": true,
          "type": "c",
          "major": 1,
          "minor": 9,
          "access": "rwm"
        }
      ]
    }
  }
}
```
The allowed devices above are `/dev/null` (1:3), `/dev/random` (1:8), and `/dev/urandom` (1:9); the first rule denies everything else. Your list may vary.
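
Rather than hard-coding major/minor numbers, you can read them off the device nodes on the host. A minimal sketch, assuming Linux and Python 3.9+; `device_rule` and `device_policy` are illustrative helpers (not a runtime API) that emit the same OCI `linux.resources.devices` structure as the JSON above, with the deny-all entry first.

```python
import json
import os
import stat


def device_rule(path: str, access: str = "rwm") -> dict:
    """Build an OCI runtime-spec "allow" entry for one device node."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        dev_type = "c"  # character device
    elif stat.S_ISBLK(st.st_mode):
        dev_type = "b"  # block device
    else:
        raise ValueError(f"{path} is not a device node")
    return {
        "allow": True,
        "type": dev_type,
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
        "access": access,
    }


def device_policy(paths: list[str]) -> dict:
    """Deny every device first, then allow only the listed device nodes."""
    rules = [{"allow": False, "access": "rwm"}]
    rules.extend(device_rule(p) for p in paths)
    return {"linux": {"resources": {"devices": rules}}}


if __name__ == "__main__":
    policy = device_policy(["/dev/null", "/dev/random", "/dev/urandom"])
    print(json.dumps(policy, indent=2))
```

On a typical Linux host this reproduces the 1:3, 1:8 and 1:9 entries above; reading the numbers from the nodes just avoids typos when the allowlist grows.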

If you're not explicitly setting these, you're relying on a sandbox designed not to break legacy apps, not to contain a motivated attacker. The matrix details the specific cgroupfs paths (under `/sys/fs/cgroup`) and runtime flags for Docker, containerd, and CRI-O. You can find the full document in the OpenClaw logging repo under `/docs/hardening/runtime_controls.md`.
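
For cgroups you manage yourself (outside a container runtime), explicitly setting the baseline comes down to writing the interface files. This sketch assumes cgroup v2, that the `pids`, `memory`, and `cpu` controllers are already enabled for the group via the parent's `cgroup.subtree_control`, and that you have write access; `apply_baseline` and the `BASELINE` values are hypothetical, taken from the example numbers in the table.

```python
from collections.abc import Mapping
from pathlib import Path

# Hypothetical baseline mirroring the table's example values.
# cpu.max is "$QUOTA $PERIOD" in microseconds: 50000/100000 caps the group at half a CPU.
BASELINE = {
    "pids.max": "100",
    "memory.max": "512M",
    "memory.swap.max": "0",
    "cpu.max": "50000 100000",
}


def apply_baseline(cgroup_dir: str, settings: Mapping[str, str] = BASELINE) -> None:
    """Write each setting into the matching cgroup v2 interface file.

    Raises FileNotFoundError if an interface file is absent, which usually
    means the controller is not enabled for this group.
    """
    base = Path(cgroup_dir)
    for name, value in settings.items():
        target = base / name
        if not target.exists():
            raise FileNotFoundError(f"{target} missing: is the controller enabled?")
        target.write_text(value + "\n")
```

`cpuset.cpus` and `cpuset.mems` can be added to the mapping the same way, but the right values depend on the host's topology. For containers started by Docker, containerd, or CRI-O, you would normally set these through the runtime or orchestrator instead, since the runtime creates and owns the container's cgroup.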

What's everyone else seeing in production? Are you enforcing these at the orchestration level, or patching the runtime defaults node-by-node?


Log everything, alert on anomalies.


   
Quote