
Check out my script that enforces a strict no-new-privileges policy.

(@baremetal_joe)
Eminent Member
Joined: 1 week ago
Posts: 17
Topic starter
  [#908]

Everyone's obsessed with containers. Layers of abstraction hiding the real problem: privilege escalation paths in the kernel and userspace.

My approach is simpler. Enforce `no_new_privs` via a systemd service that locks the bit and uses cgroups v2 to pin it. No container runtime overhead, just the kernel doing its job.

Here's the unit file. It runs at boot, applies to all user slices.

```
[Unit]
Description=Lock no_new_privs for user slices
After=systemd-user-sessions.service
Before=user@.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/bash -c 'echo 1 > /sys/fs/cgroup/unified/user.slice/no_new_privs'
ExecStart=/usr/bin/bash -c 'echo "1" > /sys/fs/cgroup/unified/user.slice/cgroup.subtree_control'

[Install]
WantedBy=multi-user.target
```

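Pair this with a per-session hook that sets `no_new_privs` on every new user session; a `systemd-udevd` rule is the wrong place, since udev only reacts to device events. Note that `no_new_privs` is a per-process flag set with `prctl(PR_SET_NO_NEW_PRIVS)` and inherited across `fork` and `execve`, while `cgroup.subtree_control` only enables controllers (entries such as `+pids`) for child cgroups, so the inheritance is what carries the policy to all child processes. Once the bit is set, `execve` ignores setuid/setgid bits and file capabilities, which stops `sudo`, `su`, a setuid or capability-based `ping`, and anything else that tries to gain privilege that way.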

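Test it. With the bit set, `sudo` refuses to run because it can no longer start with an effective UID of 0 (the exact error text depends on the sudo version). Other setuid binaries still execute, but only with the caller's privileges, so most of them fail or exit early. This is a more fundamental barrier than any container boundary.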
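For anyone who wants to verify the result rather than take it on faith: Linux 4.10 and later report the flag per process as the `NoNewPrivs` field in `/proc/<pid>/status`. A minimal Python sketch of that check follows; the helper names `parse_no_new_privs` and `no_new_privs_of` are made up for illustration and are not part of any tool mentioned here.

```python
import re
from typing import Optional, Union


def parse_no_new_privs(status_text: str) -> Optional[int]:
    """Extract the NoNewPrivs value from the text of /proc/<pid>/status.

    Returns None when the field is missing (kernels older than 4.10).
    """
    match = re.search(r"^NoNewPrivs:[ \t]*(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def no_new_privs_of(pid: Union[int, str] = "self") -> Optional[int]:
    """Read the flag for a given PID (default: the calling process)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_no_new_privs(fh.read())


if __name__ == "__main__":
    # 1 means the bit is set, 0 means it is not.
    print("no_new_privs for this process:", no_new_privs_of())
```

Run it from a shell inside the session you are trying to lock down; if it prints 0, the policy did not reach that session.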



   
(@aspiring_dev)
Active Member
Joined: 1 week ago
Posts: 9
 

This is a really clever approach, thanks for sharing! I've been working on some API integrations and sometimes the container overhead feels unnecessary for simple tasks.

I have a follow-up question though. I'm still new to cgroups v2. What happens if a user session is already running before this service starts at boot? Does the policy get applied retroactively, or would it only affect sessions started after? Just trying to picture the edge cases.


Keep it simple.


   
(@container_escape_dan)
Active Member
Joined: 1 week ago
Posts: 19
 

Good question. The policy won't apply retroactively to already-running user sessions. The service runs `Before=user@.service`, so it sets the flag on the parent `user.slice` *before* the templated user services start.

Any session already running has its own processes, and `no_new_privs` is inherited from the parent process at `fork`/`execve` time. Existing processes won't pick up the new setting. So yes, it's a boot-time hardening step, not a runtime one.

If you need to cover that gap, the bit has to be set on a common ancestor process earlier in boot, before it forks anything you care about, which gets messy with systemd.
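The inheritance behavior is easy to check from userspace. The sketch below is only an illustration, assuming Linux and a libc reachable through `ctypes`: it sets the bit with `prctl(PR_SET_NO_NEW_PRIVS)` in a forked child, forks again, and has the grandchild report its own flag. The constants 38 and 39 come from `<linux/prctl.h>`; the function names are hypothetical.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def get_no_new_privs() -> int:
    """Return 1 if the calling process has no_new_privs set, else 0."""
    return _libc.prctl(PR_GET_NO_NEW_PRIVS, _ZERO, _ZERO, _ZERO, _ZERO)


def set_no_new_privs() -> None:
    """Set the bit on the calling process. This cannot be undone."""
    if _libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), _ZERO, _ZERO, _ZERO) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def child_inherits_flag() -> bool:
    """Set the bit in a child, then check that a grandchild starts with it."""
    pid = os.fork()
    if pid == 0:
        try:
            set_no_new_privs()
            grandchild = os.fork()
            if grandchild == 0:
                # Exit status carries the grandchild's own flag value.
                os._exit(get_no_new_privs())
            _, status = os.waitpid(grandchild, 0)
            os._exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 2)
        except BaseException:
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("parent flag before:", get_no_new_privs())
    print("grandchild inherited the bit:", child_inherits_flag())
    print("parent flag after:", get_no_new_privs())
```

The parent's flag is the same before and after, which is the retroactivity point in a nutshell: the bit flows down to descendants created afterwards and never sideways or upwards to processes that already exist.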


pivot on escape


   
(@mod_morgan)
Eminent Member
Joined: 1 week ago
Posts: 18
 

Correct on the inheritance. The gap you're pointing out is why I don't rely solely on boot-time services for this. For a fully strict policy, you need to bake it into the PAM configuration or a systemd drop-in that's part of the session launch itself.

Otherwise, any persistent service started before multi-user.target is a vector.
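To see which already-running processes fall into that gap, one option is to walk `/proc` and list every process whose `NoNewPrivs` field is 0. A rough sketch, assuming Linux 4.10 or later; the function name and the `proc_root` parameter (there so the scan can be pointed at a test directory) are hypothetical:

```python
import os
import re
from typing import List, Tuple

# Capture only the two fields of interest from /proc/<pid>/status.
_FIELD = re.compile(r"^(Name|NoNewPrivs):[ \t]*(.*)$", re.MULTILINE)


def processes_without_no_new_privs(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return sorted (pid, name) pairs for processes whose NoNewPrivs is 0.

    Processes with no NoNewPrivs field at all (older kernels) are not listed.
    """
    unprotected = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                fields = dict(_FIELD.findall(fh.read()))
        except OSError:
            continue  # process exited or is unreadable; skip it
        if fields.get("NoNewPrivs", "").strip() == "0":
            unprotected.append((int(entry), fields.get("Name", "?").strip()))
    return unprotected if not unprotected else sorted(unprotected)


if __name__ == "__main__":
    for pid, name in processes_without_no_new_privs():
        print(pid, name)
```

On a typical system most daemons will show up in the list, so in practice you would filter the output down to the user sessions and services you actually intend to lock.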


Stay sharp, stay civil.


   