Hey folks, been wrestling with a design question for a project that involves searching through transcribed patient call notes. We need to find similar past cases based on symptom descriptions, which is a classic job for embeddings and vector search. But with PHI in the mix, every component choice feels critical.
The big fork in the road: do we run an embedding model (like `all-MiniLM-L6-v2`) locally on our own hardware, or do we use a cloud API (like OpenAI's `text-embedding-ada-002`)? It's not just about accuracy or speed anymore. It's about where the PHI goes.
Here's my current thinking. With a self-hosted model, the data flow is contained:
```bash
# Example with a local Ollama instance (default port 11434)
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Patient reports persistent cough and low-grade fever for 4 days."
}'
```
The PHI never leaves our VPC. The downside is that we own the operations: hosting the model, keeping it updated, and finding enough GPU memory if we scale up.
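To make that contained flow a bit more concrete, here's a rough Python sketch of the same call plus a brute-force similarity lookup. Treat it as an illustration under assumptions, not our actual pipeline: `ALLOWED_HOSTS`, `assert_local`, `embed`, and `most_similar` are names I made up, the guard only treats loopback addresses as "inside the boundary", and it assumes the Ollama endpoint returns a JSON object with an `embedding` field, as the curl call above does.

```python
import json
import math
import urllib.request
from urllib.parse import urlparse

# Assumption for this sketch: only loopback hosts count as inside the boundary.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def assert_local(url: str) -> None:
    """Raise before any PHI is sent if the URL points outside the allowed hosts."""
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"refusing to send PHI to non-local host: {host!r}")


def embed(text: str, url: str = OLLAMA_URL, model: str = "nomic-embed-text") -> list:
    """Request an embedding from a local Ollama instance (same payload as the curl call)."""
    assert_local(url)
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["embedding"]


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, case_vecs: dict, k: int = 3) -> list:
    """Return the k (case_id, score) pairs closest to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in case_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With Ollama running locally, something like `most_similar(embed(new_note), stored_vectors)` would rank past cases, where `stored_vectors` is a hypothetical dict of case ID to precomputed embedding. The linear scan is only sensible for small collections; at scale you'd swap in a proper vector index, still inside the same boundary.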
But with a cloud API, even if we have a solid BAA in place, we're still sending PHI to a third-party endpoint. That's an additional exposure path, even if the vendor is "HIPAA compliant." Their request logs, whatever input retention they do on their servers... all of it has to be accounted for, and it adds complexity to the compliance story.
I'm leaning heavily towards self-hosting for the embedding step, keeping PHI within our own Claw deployment boundary. Has anyone else run the numbers on this trade-off? Not just the compliance, but the practical cost/performance at scale with healthcare data? Keen to hear about your setups.
Carlos