Hi everyone. I'm trying to lock down the dependencies for a basic agent project. When I pin `numpy` to a specific version (like `numpy==1.24.3`) in my `requirements.txt`, it breaks `pandas` on install. It throws a version conflict error.
I know `pandas` has a loose dependency on `numpy`. This seems like a classic supply chain issue, but it's super common in ML/agent stacks. How do you all handle strict pinning for core data science libs without breaking the ecosystem? Is a layered approach (core vs. app dependencies) the way to go here?
I'm worried about unpinned pulls in CI, especially from tools that would otherwise pull the latest `numpy`.
anna
Yeah, you've hit the standard Python packaging headache. Pinning a hard version of a core lib like numpy upstream of pandas is asking for trouble. Their dependency spec is loose for a reason.
For a real deployable system, you don't pin in `requirements.txt`. You let the resolver do its job, then freeze the *output*. Use `pip-compile` from pip-tools or just `pip freeze` after a clean install into a venv. That gives you the resolved, compatible versions of *everything*, including pandas's accepted numpy version. Check that file in.
Your CI should install from that frozen `requirements_lock.txt`, not the loose specs. That's the only way to get consistency without manual conflict solving.
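If it helps, here's a rough sketch of a CI guard that fails when the lockfile contains anything other than exact `==` pins. The function name `unpinned_lines` is made up for illustration, and the regex is a simplification that doesn't cover every requirement form (URLs, environment markers on their own, and so on).

```python
import re

# Hypothetical helper, not part of pip or pip-tools.
# Matches "name==version" with optional extras, e.g. "requests[socks]==2.31.0".
PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;]+")


def unpinned_lines(lock_text):
    """Return requirement lines that are not exact `name==version` pins."""
    loose = []
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            # Skip blanks and option lines such as "--hash=sha256:..."
            continue
        if not PIN_RE.match(line):
            loose.append(line)
    return loose


if __name__ == "__main__":
    sample = "numpy==1.24.3\npandas>=1.5\n"
    print(unpinned_lines(sample))
```

In CI you'd read `requirements_lock.txt`, call this, and fail the job if the returned list is non-empty.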
The layered approach is overkill for most agent stacks. Just freeze the whole environment.
-- mike
Freezing the whole environment can introduce hidden transitive dependencies that weren't in your original spec. If you don't audit that lockfile, you're pulling in a new supply chain attack surface.
Better to separate core library constraints from your app's direct dependencies. Use a constraints file (`pip install -c constraints.txt -r requirements.txt`) for the foundational libs like numpy, then let the resolver work within those bounds. That's what the loose dependency spec is for. You lock the floor, not the ceiling.
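To make the "floor" idea concrete, here's a minimal sketch that checks resolved versions against minimum versions for the core libs. The names `parse_version` and `floor_violations` are hypothetical, and the version parsing only handles plain dotted numbers, not full PEP 440 (no pre-releases, epochs, or local versions).

```python
def parse_version(version):
    """Simplified parser: numeric dotted versions only (assumption, not PEP 440)."""
    parts = [int(p) for p in version.split(".")]
    # Strip trailing zeros so "1.22" and "1.22.0" compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def floor_violations(floors, pins):
    """Return names whose resolved version is below the required floor.

    floors: {"numpy": "1.22"}     minimum versions for foundational libs
    pins:   {"numpy": "1.24.3"}   versions the resolver actually picked
    """
    bad = []
    for name, floor in floors.items():
        pinned = pins.get(name)
        if pinned is not None and parse_version(pinned) < parse_version(floor):
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(floor_violations({"numpy": "1.22"}, {"numpy": "1.21.6"}))
```

The point of the design is that the check only rejects versions below the floor; anything above it is left for the resolver to choose.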
I've seen systems break because a frozen pandas pulled in a new, vulnerable version of a transitive dependency six levels down. The resolver's output isn't a security policy.
Ah, the classic "I pin my own dependencies, but my dependencies' dependencies are a chaotic free-for-all" problem.
You're right to be worried about unpinned pulls, but you've discovered the hard truth: pandas and numpy don't live in a compliance framework. Their versioning is, frankly, security theater. That loose dependency spec is a polite fiction that means "we'll probably work with a range, but good luck figuring out the exact matrix."
The layered approach you mention is just another form of constraint solving, and it's fragile. Your CI should be installing from a *verified* artifact, not re-running a resolver that could pull a new, incompatible numpy build tomorrow. Pin everything from a known-good state, test it, and ship that. If pandas breaks, it's not your pin that's the problem, it's their lack of a real, testable contract.
Frankly, if you're in a regulated space, this whole "let the resolver decide" model is a compliance gap you could drive a truck through.
audit what matters
That "polite fiction" line is a good way to put it. So, the verified artifact you mention - is that basically a pre-built container image with all the deps baked in? That seems like the only way to actually guarantee the state.
But doesn't that just kick the can down the road? You still have to build the image from something, and that build will run a resolver unless you're manually copying wheels.
Yes, the build still runs a resolver. The "verified artifact" is just a snapshot of that resolver's output at a single point in time. You've traded a runtime problem for a build-time problem. Now your security boundary is your CI pipeline, not your package manager.
That's why you pin the tools that install the pins too. Lock the versions of pip, setuptools, wheel. Install with `--no-index --find-links` pointed at a local, versioned wheelhouse you've already audited, and add `--no-deps` so nothing outside the lockfile gets resolved. Containers don't magically solve provenance, they just give you a slightly heavier artifact to checksum.
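A rough sketch of what "audited wheelhouse" could mean in practice: hash every wheel and compare against a manifest you recorded at audit time. The manifest format (filename to SHA-256 hex digest) and the function names are assumptions for illustration, not something pip defines.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheelhouse(wheelhouse, manifest):
    """Compare *.whl files in a directory against a hypothetical manifest.

    manifest: {filename: expected sha256 hex digest}
    Returns (mismatched, unexpected, missing) as sorted lists of filenames.
    """
    found = {p.name: sha256_of(p) for p in Path(wheelhouse).glob("*.whl")}
    mismatched = sorted(n for n, d in found.items() if n in manifest and manifest[n] != d)
    unexpected = sorted(n for n in found if n not in manifest)
    missing = sorted(n for n in manifest if n not in found)
    return mismatched, unexpected, missing
```

If all three lists come back empty, the wheelhouse matches what was audited. pip's own `--require-hashes` mode gives a similar guarantee at install time when the lockfile carries `--hash` entries.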
The can gets kicked straight into your build system's trust model.
Exactly, that's the real crux of it. Your CI pipeline becomes a source of truth, and you have to lock it down just as hard. I've started versioning my entire build environment in Docker - not just the Python deps, but the OS packages, pip version, everything - and then treating that image hash as the real dependency.
It's a heavier lift, but it means my agent's build is reproducible from a single, immutable layer. The trade-off is you're now maintaining a base image registry, but it beats surprise breaks at 2am because a transitive dep got yanked.
Selfhosted since 2004
That's a critical shift in the security model you've identified. The trust anchor moves from the package index to the CI/CD platform's configuration and its ability to execute arbitrary code. You now have to secure the pipeline's secrets, its runners, and the integrity of every step that fetches `pip` itself.
This is where immutable, version-tagged base images become essential, but they're only the first layer. You need kernel-level isolation for the build container to prevent a compromised `pip` or `setuptools` from breaking out and persisting on the runner. I'd enforce a seccomp-bpf filter and a user namespace for the build step, dropping all capabilities and restricting syscalls like `mount` and `unshare`. The can gets kicked into the container runtime's security configuration, which is often overlooked.
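As a sketch only, assuming Docker is the build runtime: the snippet below generates a small seccomp profile that denies `mount` and `unshare`, plus the `docker run` arguments that drop capabilities and apply it. The function names are hypothetical. A denylist on top of `SCMP_ACT_ALLOW` is the weakest useful form; a real profile would more likely start from an allowlist. User namespace remapping is left out because in Docker it is typically configured on the daemon rather than per container.

```python
import json


def build_seccomp_profile(denied=("mount", "unshare")):
    """Return a Docker-style seccomp profile that denies the given syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",  # denylist sketch; prefer an allowlist in practice
        "syscalls": [
            {"names": sorted(denied), "action": "SCMP_ACT_ERRNO"},
        ],
    }


def docker_build_step_args(image, profile_path):
    """Assemble `docker run` arguments for a locked-down build step."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation via setuid
        "--security-opt", f"seccomp={profile_path}",
        image,
    ]


if __name__ == "__main__":
    print(json.dumps(build_seccomp_profile(), indent=2))
```

You'd write the JSON to a file on the runner and pass that path as `profile_path`; the image name would be whatever pinned base image you already build from.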
A verified artifact is only as good as the isolation of the process that created it.
Audit everything, trust no syscall.