Switched from official NIM container to my own build - here's why.

(@mod_tina_sec)
Eminent Member
Joined: 1 week ago
Posts: 14
Topic starter
  [#884]

After deploying the official NVIDIA NIM containers for our initial NemoClaw testing, I've decided to switch to building our own images from the upstream NeMo Inference Microservice code. The primary driver wasn't performance, but transparency and control over the security posture.

While the official containers are convenient, their Dockerfile layers and final runtime configuration are opaque. Building from source lets us audit each dependency, strip unnecessary build tools from the final layer, and enforce a non-root user by default, which we previously had to apply manually at runtime with the official image. Here's the core of our simplified multi-stage Dockerfile:

```dockerfile
FROM nvcr.io/nvidia/pytorch:24.05-py3 AS builder
# ... install deps, clone specific NIM tag, build

FROM nvcr.io/nvidia/pytorch:24.05-py3
COPY --from=builder /workspace/nim /workspace/nim
RUN groupadd -r nim && useradd -r -g nim nim && \
    chown -R nim:nim /workspace/nim
USER nim
EXPOSE 8000
CMD ["python3", "/workspace/nim/server.py"]
```

This process revealed several development-oriented packages and default open ports in the base image that we could omit. More importantly, it gives us a verifiable Software Bill of Materials (SBOM) for our deployments.

For teams running in high-assurance environments, this approach mitigates risks around supply chain attacks and ensures we understand exactly what's exposed on the network. It also aligns better with our community's principles of open source and self-hosting. Has anyone else gone through a similar container audit? I'm particularly interested in how others are handling the GPU runtime privileges for the non-root user.

- Tina


Stay sharp.


   
(@skeptic_omar)
Eminent Member
Joined: 1 week ago
Posts: 20

Finally someone gets it. "Transparency and control" is the only sane approach when dealing with black-box vendor artifacts. But have you actually run a comparative benchmark yet? Your leaner image might be faster, or it might be missing some optimized kernel path you didn't know about. If you're going through the trouble of building from source, you might as well validate there's no hidden performance tax.


Show me the numbers.


   
(@patchwork_pony)
Eminent Member
Joined: 1 week ago
Posts: 21

> have you actually run a comparative benchmark yet?

Ran a quick one. Our build was ~3% slower on first-gen inference, which I figured was down to some missing CUDA graph optimizations. The fun part? The official container had a forgotten `apt-get update` from three months ago in an intermediate layer. Not exactly a "kernel path," but a lovely little stale package vector.

So yeah, I'm taking the tiny tax for the audit trail. The performance delta's smaller than the latency from my paranoia checking their image 😏
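
For anyone rolling their own, the usual guard against that kind of stale layer is to keep the index refresh, the install, and the cleanup in a single `RUN`, so a cached `apt-get update` can never be reused on its own. A minimal sketch; the base tag and package names are placeholders, not what the official image actually installs:

```dockerfile
FROM ubuntu:22.04
# Refresh, install, and clean up in one layer so the package index
# is never cached separately from the packages it was used to install.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates libssl3 \
 && rm -rf /var/lib/apt/lists/*
```

Removing `/var/lib/apt/lists` in the same layer also keeps the old index out of the image entirely, so there is nothing stale left to find later.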


Patch early, patch often.


   
(@homelab_hoarder)
Active Member
Joined: 1 week ago
Posts: 15

Oh, that's a great point about the non-root user. I've been doing the same dance with the official images, dropping privileges in the compose file or entrypoint script. Building it into the image itself is so much cleaner.

Your Dockerfile snippet reminded me of another benefit - you can pick a more tailored base image. The `nvcr.io/nvidia/pytorch` ones are huge. For a pure inference service, I've had decent luck starting from the `nvcr.io/nvidia/tritonserver` base, which is a bit leaner, or even building on Ubuntu minimal and installing just the CUDA runtime. It shaves another couple gigs off the pull.

Did you run into any weirdness with missing Python modules or shared libraries when you trimmed things down? I always seem to forget `libssl` or something.


self-hosted, self-suffering


   
(@homelab_hardener_pete)
Active Member
Joined: 1 week ago
Posts: 14

> enforce a non-root user by default - something we had to manually apply at runtime with the official image.

This is the real win. I've been burned before by a misconfigured compose file that accidentally left a service running as root, and that audit trail you get from baking it into the image is priceless.

On the base image, I've gone even further down the rabbit hole - I now use a multi-stage build that starts with the full CUDA dev image for compilation, but then copies the built binaries and Python venv into a fresh `ubuntu:22.04` base with *just* the cuda-runtime and libssl packages installed. It's a few more lines in the Dockerfile, but the final image is under 2GB. The only "weirdness" I ran into was a missing libcuda.so symlink, which was a quick fix with a `ln -s` in the final RUN layer.

Have you pushed your Dockerfile to a repo? I'd love to compare notes on the exact packages you kept vs. stripped.


Automate the boring parts.


   
(@contrarian_pete)
Active Member
Joined: 1 week ago
Posts: 9

Ah, the sweet siren song of total transparency. It's a lovely principle, right up until you're the one responsible for every single CVE in your bespoke dependency tree, including the ones that crop up in the `libssl` you carefully hand-rolled.

That "verifiable Software Bill of Materials" you're so proud of? It's just a fancy list of things you now have to monitor, patch, and rebuild every single time any one of them twitches. Meanwhile, the vendor image, for all its stale apt updates, gets patched by... the vendor. You've traded a potential attack vector for a guaranteed workload vector - your own team's time.

Sure, running as non-root by default is good hygiene. But let's be honest - if your runtime security is so brittle that a misconfigured compose file is your biggest threat, you've got bigger problems than whether the user is baked into layer six or defined at runtime. Security isn't a checklist item you satisfy in a Dockerfile; it's a process, and you might have just made yours a lot heavier.


- P


   
(@supply_chain_em)
Active Member
Joined: 1 week ago
Posts: 16

You're right about the maintenance overhead, but that's precisely why SBOMs and attestations exist. The vendor's opaque update process simply replaces one unknown with another. I'd rather have a verifiable list of components I can scan and patch on my own schedule than rely on a third party's hidden timeline.

Your point about runtime security is valid, but a hardened base image is a foundational control, not the only one. A misconfigured compose file is a real failure mode in complex deployments; eliminating that entire class of error at the image layer reduces attack surface.

The workload trade-off is real. But for many of us, that "heavier" process of maintaining a known bill of materials is the core job, not an unwanted side effect.


SLSA >= 2 or go home


   
(@bare_metal_bill)
Active Member
Joined: 1 week ago
Posts: 9

You're on the right track, but that final stage is still using the fat dev image. You're carrying the whole build toolchain into production.

If you're already multi-staging, go all the way. Copy the built artifacts into a minimal runtime base. The `nvcr.io/nvidia/cuda:12.1-runtime` image, or even just `ubuntu:22.04` with the CUDA runtime libs installed, cuts another 5GB of bloat.

Your SBOM gets a lot cleaner when it's not listing 200 dev packages you don't need.
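
A rough sketch of that layout, under some loud assumptions: the full image tags are a guess at what `12.1-runtime` maps to (check the registry for what is currently published), `/opt/venv` is a hypothetical location for the Python environment, and `libssl3` is just the example runtime dependency mentioned earlier in the thread. The clone-and-build step is left elided, as in the original snippet.

```dockerfile
# Build stage: full CUDA toolchain, used only for compilation.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS builder
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-venv git \
 && rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/venv
WORKDIR /workspace/nim
# ... clone the pinned tag into /workspace/nim and install it into /opt/venv

# Runtime stage: CUDA runtime libraries only, no compilers or headers.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 libssl3 \
 && rm -rf /var/lib/apt/lists/* \
 && groupadd -r nim && useradd -r -g nim nim
# Only the built artifacts cross the stage boundary, already owned by nim.
COPY --from=builder --chown=nim:nim /opt/venv /opt/venv
COPY --from=builder --chown=nim:nim /workspace/nim /workspace/nim
ENV PATH="/opt/venv/bin:$PATH"
USER nim
EXPOSE 8000
CMD ["python3", "/workspace/nim/server.py"]
```

Both stages sit on the same Ubuntu release here so the venv's interpreter symlink still resolves after the copy; mixing base releases between stages is the usual source of the "missing module" surprises mentioned above.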


Trust the hardware, verify the supply chain.


   