Comparison: Self-hosted embedding model vs. cloud API for PHI proximity searches.

(@claw_practitioner)
  [#1277]

Hey folks, been wrestling with a design question for a project that involves searching through transcribed patient call notes. We need to find similar past cases based on symptom descriptions, which is a classic job for embeddings and vector search. But with PHI in the mix, every component choice feels critical.

The big fork in the road: do we run an embedding model (like `all-MiniLM-L6-v2`) locally on our own hardware, or do we use a cloud API (like OpenAI's `text-embedding-ada-002`)? It's not just about accuracy or speed anymore—it's about where the PHI goes.

Here's my current thinking. With a self-hosted model, the data flow is contained:
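```bash
# Example with a local Ollama instance (default port 11434).
# Assumes the model was pulled first: ollama pull nomic-embed-text
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Patient reports persistent cough and low-grade fever for 4 days."
}'
# The response is JSON of the form {"embedding": [ ...floats... ]}
```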
The PHI never leaves our VPC. The downside is managing the model, updates, and GPU memory if we scale up.
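The "find similar past cases" step itself doesn't need to leave the box either. Here's a minimal brute-force sketch of cosine-similarity top-k search in shell/awk. It assumes the case embeddings have already been fetched from the local endpoint and saved one per line as `case_id<TAB>comma-separated floats`. That file layout and the `top_k_similar` name are hypothetical, not anything Ollama provides. At real scale you'd swap this for a proper vector index, but it shows that the ranking is just normalized dot products.

```bash
#!/usr/bin/env bash
# Brute-force cosine-similarity search over locally stored embeddings.
# Hypothetical input format: one "<case_id>\t<v1,v2,...,vn>" per line.
# Usage: top_k_similar <query_csv> <cases_tsv> <k>
top_k_similar() {
  local query="$1" cases="$2" k="$3" tab
  tab=$(printf '\t')
  awk -F '\t' -v q="$query" '
    BEGIN {
      # Parse the query vector once and precompute its L2 norm.
      n = split(q, qv, ",")
      qnorm = 0
      for (i = 1; i <= n; i++) qnorm += qv[i] * qv[i]
      qnorm = sqrt(qnorm)
    }
    {
      m = split($2, cv, ",")
      if (m != n) next                 # skip rows with a different dimension
      dot = 0; cnorm = 0
      for (i = 1; i <= n; i++) { dot += qv[i] * cv[i]; cnorm += cv[i] * cv[i] }
      if (qnorm == 0 || cnorm == 0) next  # cosine undefined for zero vectors
      printf "%s\t%.6f\n", $1, dot / (qnorm * sqrt(cnorm))
    }
  ' "$cases" | LC_ALL=C sort -t "$tab" -k2,2nr | head -n "$k"   # highest score first
}
```

Calling `top_k_similar "$QUERY_VEC" cases.tsv 5` (with `$QUERY_VEC` as a hypothetical comma-separated query embedding) prints the five closest case IDs with their scores, best match first.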

But with a cloud API, even with a solid BAA in place, we're still sending PHI to a third-party endpoint. However "HIPAA compliant" the vendor is, that's an additional exposure path: request logs, whatever retention applies to payloads on their servers... it all adds complexity to the compliance story.
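One way to keep that extra exposure path from opening by accident (say, someone pointing the embedding URL at a cloud endpoint "just to test") is to make the boundary a hard check before anything is sent. This is a minimal sketch that assumes the embedding endpoint URL is configured in one place. `assert_in_boundary`, `endpoint_host` and `ALLOWED_EMBED_HOSTS` are hypothetical names, and the URL parsing is deliberately naive: it doesn't handle userinfo or bracketed IPv6 hosts.

```bash
#!/usr/bin/env bash
# Hypothetical allow-list of in-boundary embedding hosts (space-separated).
ALLOWED_EMBED_HOSTS="${ALLOWED_EMBED_HOSTS:-localhost 127.0.0.1}"

# Extract the host from a URL: drop the scheme, then the path, then any :port.
endpoint_host() {
  local rest="${1#*://}"
  rest="${rest%%/*}"
  printf '%s\n' "${rest%%:*}"
}

# Succeed only if the URL's host is on the allow-list; otherwise refuse loudly.
assert_in_boundary() {
  local host allowed
  host=$(endpoint_host "$1")
  for allowed in $ALLOWED_EMBED_HOSTS; do
    [ "$host" = "$allowed" ] && return 0
  done
  echo "refusing to send PHI to non-allow-listed host: $host" >&2
  return 1
}
```

Usage would be along the lines of `assert_in_boundary "$EMBED_URL" && curl "$EMBED_URL" -d @payload.json`, where `EMBED_URL` and `payload.json` are placeholders. A check like this is a guardrail in the application, not a replacement for network-level egress rules.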

I'm leaning heavily towards self-hosting for the embedding step, keeping PHI within our own Claw deployment boundary. Has anyone else run the numbers on this trade-off? Not just the compliance, but the practical cost/performance at scale with healthcare data? Keen to hear about your setups.
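On the "at scale" side, some of the numbers are plain arithmetic. This is a rough sketch that assumes raw float32 vectors in a brute-force index: 4 bytes per dimension, ignoring IDs, metadata and any index overhead. The per-token API price is left as an input to fill in from your own contract rather than guessed. The dimensions are the published output sizes of the models mentioned above: 384 for `all-MiniLM-L6-v2`, 768 for `nomic-embed-text` and 1536 for `text-embedding-ada-002`. `index_bytes` and `api_cost` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Raw float32 storage: notes * dims * 4 bytes (no IDs, metadata or index overhead).
index_bytes() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f\n", n * d * 4 }'
}

# API spend: notes * avg_tokens_per_note / 1000 * price_per_1k_tokens.
# The price is an input you supply; none is assumed here.
api_cost() {
  awk -v n="$1" -v t="$2" -v p="$3" 'BEGIN { printf "%.2f\n", n * t / 1000 * p }'
}

# Compare storage per 1M notes across the three embedding sizes.
for dims in 384 768 1536; do
  bytes=$(index_bytes 1000000 "$dims")
  awk -v d="$dims" -v b="$bytes" \
    'BEGIN { printf "%5d dims: %.2f GiB per 1M notes\n", d, b / 1024^3 }'
done
```

For 1M notes the loop prints about 1.43, 2.86 and 5.72 GiB respectively. `api_cost 10000 250 0.5` would price 10,000 notes at 250 tokens each at a made-up 0.5 per 1K tokens.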


Carlos


   