Switched from OpenAI to local models. The security audit scope shrank, but new risks popped up.

(@ghost_wrangler)
Eminent Member
Joined: 1 week ago
Posts: 21
Topic starter
  [#980]

Our recent shift from OpenAI's API to a self-hosted Llama 3.1 model was driven by a clear threat model: eliminating third-party data exfiltration and model poisoning risks. The initial security assessment looked straightforward, since the audit scope shrank to our own infrastructure, our code, and a single model file. However, this simplification obscured a more complex, opaque supply chain.

The primary risk migrated from the API endpoint to the artifact pipeline. Instead of auditing OpenAI's SOC 2 reports, we now have to validate:
* The provenance of the model weights (checksums from Meta vs. a random Hugging Face repo)
* The integrity of the quantization process (we used `llama.cpp`'s quantize tool, but who built the binary?)
* The toolchain and dependencies used to compile our inference server
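For the first item, weight provenance, here is a minimal sketch in Python (the serving stack is already Python via `llama-cpp-python`): stream each file through SHA-256 and compare it with a pinned manifest. The manifest format (`{"relative/path": "hex digest"}`) and the function names are assumptions for illustration. The manifest itself has to arrive over a trusted channel, such as the publisher's signed release; a manifest fetched from the same random repo as the weights proves nothing.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so multi-GB weights never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Check every file listed in a (hypothetical) manifest; an empty result means all matched."""
    problems: list[str] = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        # compare_digest avoids an early-exit string comparison; cheap enough to use by habit.
        if not hmac.compare_digest(sha256_file(path), expected.lower()):
            problems.append(f"mismatch: {rel_path}")
    return problems
```

Run it as a build or CI step before the model file goes into an image and fail on any non-empty result. Rerunning it on a schedule against the deployed file also gives a crude form of the integrity monitoring mentioned further down.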

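A concrete example: our initial `Dockerfile` pulled a pre-quantized model and a pre-built `llama-cpp-python` wheel from an unvetted index. The resulting SBOM was essentially useless, because it could list both artifacts but say nothing about where they came from or how they were built.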

```dockerfile
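FROM python:3.11-slim
# Red flag: an unvetted extra index that pip will consider alongside PyPI
RUN pip install llama-cpp-python --extra-index-url https://abetterllama.com
# From where? No provenance for this pre-quantized file
COPY ./models/mygpt-4bit.gguf /app/model.gguf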
```
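Why `--extra-index-url` is a red flag: pip treats the extra index as an additional source alongside PyPI and installs the highest matching version from either, so whoever controls either index can supply the wheel. A minimal pre-build lint, sketched under the assumption that dependencies live in a requirements file (`lint_requirements` and its messages are hypothetical, not a pip feature), rejects that option and any requirement without a `--hash` pin:

```python
import shlex


def lint_requirements(text: str) -> list[str]:
    """Return findings for a pip requirements file; an empty list means it passed.

    Hypothetical pre-build lint: rejects --extra-index-url and any
    requirement line that carries no --hash pin.
    """
    findings: list[str] = []
    # Join backslash continuations so a requirement and its hashes form one logical line.
    logical_lines = text.replace("\\\n", " ").splitlines()
    for lineno, line in enumerate(logical_lines, start=1):
        tokens = shlex.split(line, comments=True)  # drops "# ..." comments
        if not tokens:
            continue
        if any(t.startswith("--extra-index-url") for t in tokens):
            findings.append(f"line {lineno}: --extra-index-url mixes indexes")
        elif tokens[0].startswith("-"):
            continue  # other global options such as --index-url or -r
        elif not any(t.startswith("--hash=") for t in tokens):
            findings.append(f"line {lineno}: {tokens[0]} has no --hash pin")
    return findings
```

pip can enforce the hash half on its own with `pip install --require-hashes -r requirements.txt`; a lint like this only moves the failure earlier and adds the index check.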

We hardened that `Dockerfile` by switching to a multi-stage build that compiles from known sources.

```dockerfile
# Stage 1: Build llama.cpp from a pinned git commit
FROM alpine:3.18 AS builder
RUN apk add --no-cache build-base cmake git
RUN git clone https://github.com/ggerganov/llama.cpp.git &&
cd llama.cpp &&
git checkout a1b2c3d4 &&
cmake -B build -DCMAKE_BUILD_TYPE=Release &&
cmake --build build --config Release --target quantize

# Stage 2: Create final image with verified artifacts
COPY --from=builder /llama.cpp/build/bin/quantize /usr/local/bin/
COPY ./models/original-consolidated.ckpt /tmp/ # Downloaded via signed manifest
RUN /usr/local/bin/quantize /tmp/original-consolidated.ckpt /app/model.gguf Q4_K_M
```
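Even with a pinned build, it is cheap to confirm that the quantized output is at least a structurally plausible GGUF file before it ships. A minimal sketch, assuming the little-endian GGUF v2/v3 header layout (4-byte `GGUF` magic, `uint32` version, `uint64` tensor count, `uint64` metadata key/value count); `read_gguf_header` and `GGUFHeader` are hypothetical names. This is a format check only: a tampered file with a valid header passes it, so it complements checksum verification rather than replacing it.

```python
import struct
from pathlib import Path
from typing import NamedTuple

GGUF_MAGIC = b"GGUF"
# Little-endian, no padding: magic, version (u32), tensor count (u64), metadata KV count (u64).
HEADER = struct.Struct("<4sIQQ")


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: Path) -> GGUFHeader:
    """Parse and sanity-check the fixed-size GGUF header; raise ValueError if implausible."""
    with path.open("rb") as fh:
        raw = fh.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError(f"{path}: too short for a GGUF header")
    magic, version, n_tensors, n_kv = HEADER.unpack(raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"{path}: bad magic {magic!r}")
    if version not in (2, 3):  # v1 used 32-bit counts, so this layout would misparse it
        raise ValueError(f"{path}: unsupported GGUF version {version}")
    if n_tensors == 0:  # vocab-only GGUFs exist, but a model file should carry tensors
        raise ValueError(f"{path}: header declares no tensors")
    return GGUFHeader(version, n_tensors, n_kv)
```

In a pipeline like the one above, this would run on the quantized output before it is copied into the final image.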

New risks that emerged:
* **Storage & Static Analysis:** A 4 GB model file is now a core asset. Static analysis tools can't meaningfully inspect it, so its integrity rests mainly on checksums. To add a behavioral signal, we implemented attestation checks that compare the model's output for a fixed prompt against a small, recorded known-good output.
* **Operational Security:** The model is now an attractive target for internal tampering. We had to implement filesystem integrity monitoring and runtime attestation for the loaded model's memory footprint.
* **Supply Chain Breadth:** While the third-party vendor count decreased, our dependency depth increased. We now have direct dependencies on Meta's model release process, `llama.cpp`'s security, and the underlying BLAS library's integrity.
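The fixed-prompt attestation from the first bullet can be sketched like this: once, against a checksum-verified artifact, record digests of the model's outputs for a few fixed prompts, then re-run them at startup or on a schedule. All names are hypothetical, and `generate` is any callable you supply. The `llama_cpp_generate` adapter assumes llama-cpp-python's `Llama(model_path=..., seed=..., verbose=...)` constructor and its completion call; treat it as an illustration rather than a tested integration.

```python
import hashlib
from typing import Callable, Iterable

# Assumes deterministic decoding (greedy, temperature 0, fixed seed) on the same
# build and hardware; a different BLAS backend or GPU can legitimately change
# the text, so goldens must be re-recorded whenever the toolchain changes.
Generate = Callable[[str], str]


def output_digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_goldens(generate: Generate, prompts: Iterable[str]) -> dict[str, str]:
    """Run once against an artifact already verified by checksum."""
    return {p: output_digest(generate(p)) for p in prompts}


def failed_goldens(generate: Generate, goldens: dict[str, str]) -> list[str]:
    """Return the prompts whose current output no longer matches the recorded digest."""
    return [p for p, d in goldens.items() if output_digest(generate(p)) != d]


def llama_cpp_generate(model_path: str, max_tokens: int = 32) -> Generate:
    # Hypothetical adapter; llama_cpp is imported lazily so the checks above
    # can be exercised without the library installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, seed=0, verbose=False)

    def generate(prompt: str) -> str:
        out = llm(prompt, max_tokens=max_tokens, temperature=0.0)
        return out["choices"][0]["text"]

    return generate
```

Hashing the output rather than storing it keeps the golden file small, but it also means any drift shows up as a bare mismatch; storing the text alongside the digest makes failures easier to triage.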

The lesson was that localizing an AI component doesn't eliminate supply chain risk; it transforms it. The attack surface becomes less about continuous data leakage and more about a single, critical artifact's provenance and the integrity of its entire toolchain.



   
(@vendor_skeptic)
Eminent Member
Joined: 1 week ago
Posts: 16

That "hardened" multi-stage build just shifts the trust to your compiler toolchain: GCC, glibc, Python itself. You're now auditing an entire software supply chain you probably don't have the resources to validate.

Your original risk was a single vendor's security posture. Now it's a hydra.


show me the proof, not the whitepaper


   