Hey all, trying to set up a sandbox for an AI agent that does some number crunching. I know I should limit its CPU access from the get-go, not just trust the defaults.
I've seen `cgroups` and `taskset` mentioned. But if I'm launching the agent via a systemd service or a Python script, what's the cleanest way to lock it to, say, cores 2-3 from the very beginning? Should I do it in the exec command, or is there a better declarative way? Also, does limiting cores help with side-channel stuff, or is it mostly for resource fairness?
For systemd, the cleanest way is right in the service file. Use `CPUAffinity=2-3` in the [Service] section. It's declarative and starts with the process, which is what you want.
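For reference, a minimal sketch of what that could look like. The `ExecStart` path is hypothetical, and the commented-out `AllowedCPUs=` line assumes a cgroup v2 host with a reasonably recent systemd; it enforces the limit through the cpuset controller rather than as an affinity mask the process could change itself.

```ini
[Service]
# Hypothetical launch command; replace with however the agent is started
ExecStart=/usr/bin/python3 /opt/agent/run.py
# Affinity mask applied before the process starts executing
CPUAffinity=2-3
# Optional harder limit via the cgroup cpuset controller (assumes cgroup v2)
#AllowedCPUs=2-3
```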
Limiting cores is mostly for fairness and noise isolation. For side-channels like cache attacks, you'd probably need to go further and pin it to a single core, maybe even disable hyperthreading on that core. It's a start, though!
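Do you know if your agent spawns child processes? Children inherit the mask that `taskset` sets at launch, but a child can still change its own affinity afterwards, so it's worth checking what the agent actually does.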
~zoe
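user406's got it right for systemd. If you're stuck launching from Python, you're already one layer down the hole, but you can still use `os.sched_setaffinity` before the real work kicks off. Just remember that the mask is inherited across `fork` and `exec`, so children started after the call are covered, but anything that already exists when you call it is not: `os.sched_setaffinity(0, ...)` only changes the calling thread. Also, nothing stops a child from widening its own mask again unless a cgroup cpuset enforces the limit. (`prctl` with `PR_SET_CHILD_SUBREAPER` won't help here; it only changes who orphaned descendants get reparented to.)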
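A minimal sketch of pinning before the real work starts, assuming Linux (where `os.sched_setaffinity` is available). The helper names `pin_current_process` and `child_affinity` are made up for illustration. The second helper starts a fresh interpreter to report the mask it sees, which on Linux should match the parent's because the mask is inherited across `fork` and `exec`.

```python
import os
import subprocess
import sys


def pin_current_process(wanted):
    """Pin the calling thread to the wanted cores; return the resulting mask.

    Hypothetical helper: refuses to guess if none of the wanted cores
    are available to this process.
    """
    available = os.sched_getaffinity(0)
    target = set(wanted) & available
    if not target:
        raise ValueError(f"none of {sorted(wanted)} in allowed set {sorted(available)}")
    # pid 0 means "the calling thread"
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


def child_affinity():
    """Start a new interpreter and return the affinity mask it reports."""
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(*sorted(os.sched_getaffinity(0)))"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return {int(cpu) for cpu in out.split()}
```

Calling `pin_current_process({2, 3})` as the very first statement of the launch script, before importing the agent library, is the intended use; `child_affinity()` is only there as a sanity check.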
As for side-channels, pinning to cores 2-3 is a decent start for fairness, but it does nothing to stop cache attacks if those cores share an L3 with whatever else is running on the box. For real noise isolation, you'd need to go full hermit and give it a dedicated physical core, turn off SMT, and maybe even use the `isolcpus` kernel boot parameter. Then you're just left with all the other side-channels to worry about.
Escape artist, security consultant.
The cleanest declarative method depends entirely on your launch vector, as others have noted. For systemd, `CPUAffinity=` is correct. For Python, calling `os.sched_setaffinity` at the very start of your script is the right primitive, but you must audit whether the agent's runtime or any linked libraries create threads or subprocesses before your call executes. Many frameworks initialize pools on import.
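If an import has already started threads by the time your call runs, one option is to apply the mask to every thread of the process rather than only the caller. A rough sketch, assuming Linux with `/proc` mounted; `pin_all_threads` is a hypothetical helper name, and the walk is inherently racy since threads can come and go while it runs.

```python
import os


def pin_all_threads(cores):
    """Apply the affinity mask to every thread currently in this process.

    Returns the thread IDs that were pinned. Linux only: relies on
    /proc/self/task listing one directory per thread ID.
    """
    pinned = []
    for entry in os.listdir("/proc/self/task"):
        tid = int(entry)
        try:
            # On Linux the "pid" argument is treated as a thread ID
            os.sched_setaffinity(tid, cores)
        except ProcessLookupError:
            # Thread exited between listing and pinning
            continue
        pinned.append(tid)
    return pinned
```

Threads created after the walk inherit the mask of the thread that creates them, so this mainly covers pools that were spun up before you had a chance to pin.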
On your side-channel question, confining to cores 2-3 is primarily for resource fairness and more predictable scheduling. It does almost nothing for hardware-based side channels like cache attacks if those cores share cache or memory bandwidth with other workloads. You're just drawing a smaller, more predictable arena for potential contention. True isolation requires kernel-level partitioning and hardware considerations, which is a separate and deeper audit.
Yeah, that point about library threads starting before your Python call hits home. I tried pinning a Flask app last week and the gunicorn workers just ignored it because they forked after import. So you're right, it's not just about where you put the call, but what the framework does on import.
You mention hardware considerations for real isolation. Is there a good way to profile or check if cores share an L3 before picking which ones to pin to? Or do you just go by CPU spec sheets?
Breaking things to learn.
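On checking L3 sharing without spec sheets: on Linux the kernel exposes the cache topology under sysfs, and `lscpu --extended` prints a similar per-CPU view. A small sketch that reads it directly, assuming the usual `/sys/devices/system/cpu/cpuN/cache/indexM/` layout; the function names are made up, and in some VMs or containers the files may be missing, in which case it returns `None`.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Turn a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def shared_cache_cpus(cpu, level=3):
    """Return the CPUs sharing the given cache level with `cpu`, or None if unknown."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    for index in sorted(base.glob("index*")):
        try:
            if int((index / "level").read_text()) == level:
                return parse_cpu_list((index / "shared_cpu_list").read_text())
        except (OSError, ValueError):
            continue
    return None


def smt_siblings(cpu):
    """Return the hardware threads on the same physical core as `cpu`, or None."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None
```

If `shared_cache_cpus(2)` comes back containing most or all CPUs on the socket, then cores 2-3 share their L3 with everything else there, and picking different core numbers won't change that.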