We're building internal tools that will process PHI via various AI agents and LLM calls. Our developers are asking, "What are the rules?" A generic "comply with HIPAA" directive is useless. They need specific, actionable constraints to code against.
I need a sample policy that bridges the gap between legal requirements and developer behavior. The policy must be technical enough to enforce via code review and architecture, not just hope.
Key areas it must cover:
* **Data Minimization in Prompts:** How to structure code to avoid pulling full patient records into an agent's context window. Examples of non-compliant vs. compliant prompt construction.
* **Logging & Telemetry:** What fields must NEVER be logged (e.g., full diagnostic text, patient identifiers). A concrete allowlist/denylist for our logging libraries.
* **Third-Party Service Configuration:** Mandatory settings for any API call (e.g., OpenAI, Anthropic). This isn't just "use the API"; it's specifics like `data_retention=0days`, `inference_logging=false`.
* **Error Handling:** How to fail without leaking PHI into exception messages or support tickets.
Here is a draft of the core technical requirements section. I need critique on what's missing or ambiguous.
```yaml
# Developer Policy: AI Agent PHI Handling (Technical Core)
# Indentation fixed so the file parses: example_* keys belong to the first
# data_minimization rule, and mandatory_parameters holds a nested list.
data_minimization:
  - rule: "Context windows must not contain full raw PHI records."
    example_non_compliant: "Prompt: 'Summarize this patient record: [Full JSON record with name, SSN, notes]'"
    example_compliant: "Prompt: 'For patient ID [deidentified_uid], is the hemoglobin value from the latest lab test within normal range? Lab data: [structured numeric result only]'"
  - rule: "Use de-identified reference IDs, never direct database primary keys."
logging_constraints:
  - forbidden_fields: ["patient_name", "address", "SSN", "full_diagnostic_text", "free-text clinical notes"]
  - required_transformation: "All log messages containing a patient ID must use the application's internal opaque token (e.g., 'patient_ref: abc123')."
  - allowed: ["timestamp", "agent_function", "deidentified_resource_id", "operation_status", "error_code"]
third_party_agent_configuration:
  - requirement: "Any external LLM/agent API call must have a valid BAA in place with the provider."
  - mandatory_parameters:
      - "Data Retention: Must be set to 0 days or the minimum offered."
      - "Input Data Monitoring: Must be disabled."
      - "Model Improvement/Opt-In: Must be disabled."
  - code_check: "All client configurations must be validated via a shared library `SecureClientFactory`."
data_lifecycle:
  - rule: "PHI in memory must not be cached for longer than the operation requires. Implement graceful degradation over caching full records."
  - rule: "All agent output containing derived PHI must be routed through the same audit logging pipeline as source data."
```
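The four clauses above (prompt minimization, log denylist/allowlist, mandatory client settings, PHI-free error messages) are the kind that can be turned into code a reviewer checks against. The following is an added example, not part of the original post and not any vendor's SDK: a small reference module that enforces each clause. `SecureClientFactory`, the forbidden and allowed log field names and the deidentified-ID rule come from the draft; the configuration key names (`data_retention_days`, `input_monitoring`, `model_training_opt_in`, `endpoint`), the sample record, the allowlisted endpoint and the HMAC-based token are assumptions for illustration.

```python
"""Reference enforcement of the draft PHI policy (added example, not from the post).
Configuration key names, the sample record and the pseudonymisation scheme are
assumptions; map them onto the real provider SDK options in your own wrapper."""
import hashlib
import hmac
import json

# Constructed sample record; every value is fake.
SAMPLE_RECORD = {
    "mrn": "000123456",                  # database primary key: never in prompts or logs
    "patient_name": "Jane Example",
    "SSN": "123-45-6789",
    "free_text_notes": "Patient reports fatigue over the last two weeks.",
    "labs": [{"test": "hemoglobin", "value": 13.8, "unit": "g/dL",
              "ref_low": 12.0, "ref_high": 15.5}],
}

# Denylist/allowlist taken from the draft; "free_text_notes" and "mrn" are the
# assumed internal names for "free-text clinical notes" and the DB primary key.
FORBIDDEN_LOG_FIELDS = {"patient_name", "address", "SSN", "full_diagnostic_text",
                        "free_text_notes", "mrn"}
ALLOWED_LOG_FIELDS = {"timestamp", "agent_function", "deidentified_resource_id",
                      "operation_status", "error_code"}
# Assumption: in production this key lives in a KMS/secrets manager, not in source.
PSEUDONYM_KEY = b"example-only-pseudonymisation-key"


def deidentified_uid(primary_key: str) -> str:
    """Opaque patient reference: keyed hash, so the DB key cannot be recovered
    from a prompt or log line by anyone without the key."""
    return hmac.new(PSEUDONYM_KEY, primary_key.encode(), hashlib.sha256).hexdigest()[:16]


def build_lab_prompt(record: dict, test_name: str) -> str:
    """Compliant prompt construction: deidentified reference plus the structured
    numeric result only. The record object itself never enters the prompt."""
    lab = next(item for item in record["labs"] if item["test"] == test_name)
    payload = {k: lab[k] for k in ("test", "value", "unit", "ref_low", "ref_high")}
    return (f"For patient ID {deidentified_uid(record['mrn'])}, is the {test_name} value "
            f"from the latest lab test within normal range? Lab data: {json.dumps(payload)}")


def safe_log_event(fields: dict) -> dict:
    """Logging constraint at the call site: a forbidden field is a hard failure
    (so unit tests and review catch it); anything outside the allowlist is
    dropped rather than written."""
    leaked = set(fields) & FORBIDDEN_LOG_FIELDS
    if leaked:
        raise ValueError(f"log event contains forbidden PHI fields: {sorted(leaked)}")
    return {k: v for k, v in fields.items() if k in ALLOWED_LOG_FIELDS}


def safe_error_message(exc: Exception, patient_ref: str, error_code: str) -> str:
    """Error handling: the text that reaches exception messages and support
    tickets carries only the opaque reference, a code and the exception type,
    never exc.args, which may contain the request payload."""
    return (f"agent operation failed (error_code={error_code}, "
            f"patient_ref={patient_ref}, exc_type={type(exc).__name__})")


class SecureClientFactory:
    """Validates an LLM client configuration against the mandatory parameters
    before any client object is built."""
    ALLOWED_ENDPOINTS = {"https://llm.example-baa-provider.invalid/v1"}  # constructed placeholder

    @classmethod
    def violations(cls, cfg: dict) -> list:
        checks = [
            (cfg.get("baa_in_place") is True, "no BAA recorded for provider"),
            (cfg.get("data_retention_days") == 0, "data_retention_days must be 0"),
            (cfg.get("input_monitoring") is False, "input_monitoring must be disabled"),
            (cfg.get("model_training_opt_in") is False, "model_training_opt_in must be disabled"),
            (cfg.get("endpoint") in cls.ALLOWED_ENDPOINTS, "endpoint not on BAA-covered allowlist"),
        ]
        return [msg for ok, msg in checks if not ok]

    @classmethod
    def validate(cls, cfg: dict) -> dict:
        problems = cls.violations(cfg)
        if problems:
            raise ValueError("insecure LLM client configuration: " + "; ".join(problems))
        return cfg


if __name__ == "__main__":
    prompt = build_lab_prompt(SAMPLE_RECORD, "hemoglobin")
    print(prompt)
    print(safe_log_event({"timestamp": "2024-01-01T00:00:00Z", "agent_function": "lab_check",
                          "deidentified_resource_id": deidentified_uid(SAMPLE_RECORD["mrn"]),
                          "operation_status": "ok", "debug_payload": "silently dropped"}))
    try:
        SecureClientFactory.validate({"baa_in_place": True, "data_retention_days": 30})
    except ValueError as err:
        print(err)
```

The design choice worth noting: `safe_log_event` fails closed on a forbidden key instead of redacting, because a redaction path tends to be trusted and then bypassed; a thrown error in CI is what makes the denylist reviewable. The factory returns every violation at once so a misconfigured client cannot be fixed one parameter at a time while still shipping.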
What specific clauses have your legal or compliance teams added to similar policies? What technical controls (e.g., Falco rules for runtime detection, network policies to whitelist only BAA-covered endpoints) do you pair with this?
-- cloudwatch
Trust the data, not the dashboard.