The removal of PodSecurityPolicy (deprecated in Kubernetes 1.21, removed in 1.25) in favor of Pod Security Admission (PSA), which enforces the Pod Security Standards (PSS), presents a critical juncture for deployments of NVIDIA's NeMo Inference Microservice (NIM). While the shift to a built-in admission controller is architecturally sound, the three PSS profiles (`privileged`, `baseline`, and `restricted`) are insufficiently granular for the nuanced threat model of a high-value AI inference endpoint. Running a NIM container under the `restricted` profile is likely to break functionality, while `baseline` or `privileged` introduce unacceptable risk surfaces without compensatory controls.
My primary concern is the inherent tension between NIM's operational requirements and the principle of least privilege. A NIM container, by its nature, demands capabilities and access that are immediate red flags under a strict audit lens. Consider the following runtime necessities and their security implications:
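* **GPU Access:** Requires the NVIDIA device nodes (`/dev/nvidia*`) to be exposed inside the container, usually via the `nvidia-container-runtime`. This gives the container a direct path to the host's GPU driver, which becomes an escape vector if the driver or the underlying container runtime is compromised.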
* **Model Repository Mounts:** Persistent volume claims for model storage are typical. This necessitates `ReadOnlyMany` or `ReadWriteMany` access patterns, increasing the attack surface for data exfiltration or tampering if the container is breached.
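* **Network Exposure:** NIM endpoints listen on fixed ports (often 8000 for Triton's HTTP endpoint) and may need permissive `NetworkPolicy` rules for inter-service communication, potentially exposing a large attack surface. Ports above 1024 need no extra capability to bind, so the risk here is exposure rather than privilege.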
* **Potential for Privileged Operations:** Some configurations, particularly for custom pre/post-processing, may historically have required `SYS_ADMIN` or other kernel capabilities for optimized data loading or shared memory management.
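As a rough sketch of how the first and last items are commonly handled without host device mounts or extra capabilities: the NVIDIA device plugin exposes GPUs as the `nvidia.com/gpu` extended resource, and a memory-backed `emptyDir` can stand in for a larger `/dev/shm`. The pod name, the `nvidia` RuntimeClass, and the 2Gi size are my assumptions for illustration, and the image reference is deliberately left elided.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nim-gpu-sketch          # hypothetical name
spec:
  runtimeClassName: nvidia      # assumption: a RuntimeClass with this name exists on the cluster
  containers:
  - name: nim
    image: nvcr.io/.../nim:tag
    resources:
      limits:
        nvidia.com/gpu: 1       # GPU allocated by the NVIDIA device plugin, no hostPath to /dev/nvidia*
    volumeMounts:
    - name: shm
      mountPath: /dev/shm
  volumes:
  - name: shm
    emptyDir:
      medium: Memory            # tmpfs-backed shared memory
      sizeLimit: 2Gi            # illustrative value; size to the model and batch settings
```

Whether this is sufficient for a given NIM version is something the audit exercise has to confirm.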
A naive deployment under `privileged` would admit all of the above without logging a single violation, and `baseline` still admits much of it: it blocks extra capabilities such as `SYS_ADMIN`, but not running as root. The audit trail would be clean, but the runtime would be dangerously permissive. Therefore, we must construct a layered defense that satisfies PSS while approximating a stricter, `PodSecurityPolicy`-like policy. PSS levels cannot be customized, so this involves selecting levels per namespace with the `pod-security.kubernetes.io/enforce`, `audit`, and `warn` labels, and then augmenting them with an explicit `securityContext` and `seccomp`/AppArmor profiles.
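A minimal sketch of the namespace side, assuming a namespace called `nim-inference` (a name I made up for illustration): enforce `baseline` so the workload starts, while auditing and warning against `restricted` so every gap is surfaced.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: nim-inference   # hypothetical namespace name
  labels:
    # Reject pods that violate baseline.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Record, but do not block, anything that would fail restricted.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
    # Return the same findings to the client as warnings.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
```

Pinning the `*-version` labels to a specific Kubernetes minor version instead of `latest` keeps the policy from shifting under you on cluster upgrades.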
For example, a minimally permissive, non-privileged pod spec might look like this, but note it may still require adjustments for your specific NIM version and model:
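```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nemo-inference-nim   # Hypothetical name; a Pod needs one to be valid
  labels:
    app: nemo-inference-nim
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: nim
    image: nvcr.io/.../nim:tag
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
        add:
        # Example: only if absolutely required for model directory permissions.
        # Passes baseline, but restricted only permits adding NET_BIND_SERVICE.
        - CHOWN
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /models
      name: model-store
      readOnly: true
  volumes:
  - name: model-store
    persistentVolumeClaim:
      claimName: model-store-pvc   # Hypothetical PVC name
      readOnly: true
```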
The critical exercise is to:
1. Deploy with `baseline` PSS enforced at the namespace level.
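2. Capture the Kubernetes API server audit log while the NIM pod is created, plus node-level signals (such as seccomp events in the kernel audit log) during NIM initialization and inference.
3. Analyze both. The `pod-security.kubernetes.io/audit-violations` annotations show what a stricter profile would reject at admission, and the node-level denials show which privileges the workload actually exercises.
4. Iteratively define a tailored `securityContext`, or as a last resort an exemption in the `PodSecurityConfiguration` admission config, that grants only those specific, audited privileges.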
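For step 2, the API server only records what its audit policy tells it to. A minimal policy sketch, assuming the namespace is called `nim-inference` (hypothetical) and that you are able to set `--audit-policy-file` on the API server, might log pod writes in that namespace and nothing else:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
- RequestReceived
rules:
# Log pod writes in the NIM namespace, including request bodies, so the
# submitted securityContext and any Pod Security audit annotations are captured.
- level: Request
  verbs: ["create", "update", "patch"]
  namespaces: ["nim-inference"]   # hypothetical namespace
  resources:
  - group: ""
    resources: ["pods"]
# Drop everything else to keep the log focused on this exercise.
- level: None
```

On a managed control plane you typically can't set this file yourself and would rely on the provider's audit log export instead.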
Without this forensic approach to policy creation, we are merely trusting the runtime's default posture, which for a component as complex and attractive as a NIM, constitutes a significant supply chain and runtime risk. I am particularly interested in community data points on which specific capabilities (`NET_BIND_SERVICE`, `DAC_OVERRIDE`, etc.) have proven essential for stable NIM operation under `containerd` with `runc`. Have others begun mapping these requirements against the MITRE ATT&CK Containers matrix?
You've pinpointed the core issue exactly. The built-in profiles are a coarse-grained control surface, and their inadequacy forces a false choice between functionality and security. This is where the compensatory controls you mentioned become non-negotiable.
A PSS `baseline` or even `privileged` pod can be defensible, but only if it's immediately wrapped in a dedicated, application-tailored seccomp profile and a restrictive set of Linux capabilities. For a NIM container, you'd need to audit the exact syscalls it requires, block the usual dangerous ones like `ptrace` or `keyctl`, and drop every capability you can't justify. Whether anything as broad as `CAP_SYS_ADMIN` is needed depends on the setup: GPU access through the NVIDIA runtime and device plugin normally doesn't require it inside the container, so treat it as something to prove from the audit rather than assume. This moves the security boundary from the pod spec to the kernel's syscall filter, which is a far more precise instrument.
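In pod-spec terms, that wrapping might look like the fragment below. It's only a sketch: the profile path `profiles/nim.json` is hypothetical and has to exist under the kubelet's seccomp directory on every node that can schedule the pod, and the empty capability set is a starting point to adjust from audit results.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nim-seccomp-sketch                  # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/nim.json   # hypothetical; relative to the kubelet seccomp root
  containers:
  - name: nim
    image: nvcr.io/.../nim:tag
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL                               # add back only what the audit proves necessary
```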
The real gap is that PSS doesn't mandate or easily integrate this next-layer filtering. So you're left with a `privileged` pod in the audit log, terrifying a CISO, even if its actual kernel attack surface is smaller than a `restricted` pod running a shell. The community needs patterns for coupling PSS admission with automated injection of runtime security profiles.
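For context, the tuning PSA itself exposes is limited to cluster-wide defaults and exemptions, roughly as sketched below. The `nvidia` RuntimeClass and `nim-inference` namespace in the comments are hypothetical. Note that an exemption skips Pod Security checks entirely for matching pods, which is why it can't stand in for a per-workload seccomp profile.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    # Levels applied to namespaces that carry no pod-security labels.
    defaults:
      enforce: baseline
      enforce-version: latest
      audit: restricted
      audit-version: latest
      warn: restricted
      warn-version: latest
    # Anything listed here bypasses Pod Security evaluation altogether.
    exemptions:
      usernames: []
      runtimeClasses: []   # e.g. ["nvidia"] would exempt every pod using that RuntimeClass
      namespaces: []       # e.g. ["nim-inference"] (hypothetical)
```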
Least privilege is not optional.