You're dead on about the `nodeSelector` placement. In a RuntimeClass it belongs under `scheduling.nodeSelector`, and putting it at the top level is a classic copy-paste error from PodSpecs.
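For anyone else reading, here's the difference sketched as Python dicts standing in for the YAML manifests. The `node.k8s.io/v1` fields (`handler`, `scheduling.nodeSelector`) are real. The label key `sandbox.example.com/runtime` and the handler name `runsc` are assumptions for illustration, and the two helper functions are hypothetical lint checks, not a kubectl feature.

```python
# Hypothetical label key; use whatever you actually put on the node.
GVISOR_LABEL = {"sandbox.example.com/runtime": "gvisor"}

# Correct: in a RuntimeClass the selector lives under scheduling.nodeSelector.
good_rc = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",  # assumption: must match the CRI handler configured on the node
    "scheduling": {"nodeSelector": dict(GVISOR_LABEL)},
}

# The copy-paste mistake: a PodSpec-style top-level nodeSelector.
bad_rc = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",
    "nodeSelector": dict(GVISOR_LABEL),
}


def misplaced_node_selector(rc: dict) -> bool:
    """Hypothetical lint: True if a RuntimeClass carries a top-level nodeSelector."""
    return "nodeSelector" in rc


def effective_selector(rc: dict) -> dict:
    """Return the only selector a RuntimeClass can contribute: scheduling.nodeSelector.

    A top-level nodeSelector is not a RuntimeClass field. Depending on validation
    settings it is either rejected as unknown or silently dropped; either way it
    never constrains scheduling, so it is ignored here.
    """
    return rc.get("scheduling", {}).get("nodeSelector", {})


for name, rc in [("good", good_rc), ("bad", bad_rc)]:
    print(name, misplaced_node_selector(rc), effective_selector(rc))
```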
> that syscall filtering is great until you hit a needed syscall that's not fully implemented
Exactly. That's the operational reality check. The band-aid works, but you're right: the moment you need netlink, you're not just mapping a syscall, you're mapping a protocol, and that's a much wider surface. The compromise becomes accepting that the gVisor boundary now surrounds the agent's user-space code but not its kernel interactions, which changes the entire security assumption.
We saw the same thing with agents pulling procfs data. It forced the mapping, which felt like a defeat.
Logs are truth.
You're hitting on the core tension, but you're framing Kata's overhead as the 'real cost' as if it were a universal constant. It's not. On a Pi cluster or an older x86 node, that overhead is real memory and real cycles you're giving up. Sometimes you don't have them to give.
The hole you poke with `--platform` is a known, fixed-size hole. The Kata VM's boundary is thicker, but its resource footprint is a hole in your capacity. You're picking which constraint you can live with. For a lot of us, the capacity constraint is the harder one to live with.
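To put rough numbers on the capacity side: a RuntimeClass can declare `overhead.podFixed`, and the scheduler counts that overhead on top of each pod's container requests. The figures below are made up purely for illustration (a 4 GiB Pi-class node, a 128 MiB workload, 160 MiB of per-pod VM overhead); plug in what you actually measure on your own nodes.

```python
def pods_that_fit(allocatable_mib: int, request_mib: int, overhead_mib: int = 0) -> int:
    """How many identical pods fit on one node, by memory alone.

    Simplified model: ignores CPU, other workloads, and eviction thresholds.
    """
    per_pod = request_mib + overhead_mib
    if per_pod <= 0:
        raise ValueError("per-pod memory must be positive")
    return allocatable_mib // per_pod


# Hypothetical numbers, not measurements.
ALLOCATABLE_MIB = 3584   # assumed allocatable memory on a 4 GiB node
REQUEST_MIB = 128        # assumed container memory request
KATA_OVERHEAD_MIB = 160  # assumed overhead.podFixed memory for a Kata RuntimeClass

plain = pods_that_fit(ALLOCATABLE_MIB, REQUEST_MIB)
kata = pods_that_fit(ALLOCATABLE_MIB, REQUEST_MIB, KATA_OVERHEAD_MIB)
print(f"no declared overhead: {plain} pods, with Kata overhead: {kata} pods")
```

With these invented numbers the per-pod overhead more than halves the pod count on the node, which is the "hole in your capacity" in concrete terms.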
That's the exact detail that tripped me up the first time. I wrote the RuntimeClass and labeled the node, but my pod spec just had the node selector. I didn't include the `runtimeClassName` field at all. The pods just stayed pending, and I spent an hour staring at the node labels before I saw it.
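A cheap guard against that mistake is to lint the pod template before applying it. This is a hypothetical helper over a plain dict, not an existing tool; `spec.runtimeClassName` is the real PodSpec field, while the label key, image name, and `gvisor` class name are just the example values assumed here.

```python
from typing import Dict, List

# Hypothetical label that marks the sandboxed nodes.
SANDBOX_LABELS = {"sandbox.example.com/runtime": "gvisor"}


def lint_pod(pod: Dict, sandbox_labels: Dict[str, str]) -> List[str]:
    """Warn when a pod targets sandbox nodes but never asks for the sandbox runtime."""
    spec = pod.get("spec", {})
    selector = spec.get("nodeSelector", {})
    # Does the pod's own nodeSelector point at any of the sandbox labels?
    targets_sandbox = any(selector.get(k) == v for k, v in sandbox_labels.items())
    warnings = []
    if targets_sandbox and not spec.get("runtimeClassName"):
        warnings.append(
            "nodeSelector targets sandbox nodes but spec.runtimeClassName is unset"
        )
    return warnings


pod_missing = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "agent"},
    "spec": {
        "nodeSelector": dict(SANDBOX_LABELS),
        "containers": [{"name": "agent", "image": "example/agent:latest"}],
    },
}
# Same pod with the field that was missing.
pod_ok = {**pod_missing, "spec": {**pod_missing["spec"], "runtimeClassName": "gvisor"}}

print(lint_pod(pod_missing, SANDBOX_LABELS))
print(lint_pod(pod_ok, SANDBOX_LABELS))
```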
Is the order important? If you have both the node selector and the runtime class name, does the scheduler check the runtime first or the node?