HIPAA Compliant AI Development: A Practical Guide
HIPAA does not prohibit using AI coding tools. It prohibits sending protected health information to unauthorized parties. If your codebase references patient data structures, claim identifiers, or PHI schemas, standard AI tools create real compliance exposure. Here is how to close the gap.
What HIPAA Actually Says About AI Tools
HIPAA does not prohibit using Claude, Copilot, or any other AI tool. The law predates these tools by decades. What HIPAA requires is that protected health information (PHI) be disclosed only to parties that have a Business Associate Agreement (BAA) in place with the covered entity or business associate.
The question for healthcare engineering teams is a precise one: does your codebase, or code derived from your codebase, contain PHI or patterns that could be used to identify PHI?
If the answer is yes, you need either a BAA with every AI provider your developers use, or a technical control that prevents PHI from reaching those providers.
What Constitutes PHI in Source Code
The obvious cases: hard-coded patient names, social security numbers, or medical record numbers. These appear in test fixtures, development databases, and legacy code more often than most teams realize.
The less obvious cases:
- Data model schemas with identifiers like PatientId, MemberBenefitCode, ClaimAdjudicationResult
- Variable names derived from HIPAA identifiers (the 18 identifiers defined in the Privacy Rule)
- Comments referencing specific patients, even in test files
- Configuration files referencing PHI-bearing endpoints or database tables
The regulatory question is not whether a specific patient can be identified from the source code in isolation. The question is whether PHI is present or represented, which includes data structures, identifiers, and schemas that handle PHI.
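As a minimal sketch of why identifier names alone matter, consider a keyword-based check for PHI-adjacent identifiers. The keyword list and function below are illustrative assumptions, not Pretense's actual detection logic:

```typescript
// Hypothetical helper: flag identifier names that suggest PHI handling.
// The hint list is an illustrative assumption, not a production rule set.
const PHI_HINTS = ["patient", "member", "claim", "mrn", "beneficiary", "dob"];

function looksPhiAdjacent(identifier: string): boolean {
  const lower = identifier.toLowerCase();
  return PHI_HINTS.some((hint) => lower.includes(hint));
}

console.log(looksPhiAdjacent("PatientId"));               // true
console.log(looksPhiAdjacent("ClaimAdjudicationResult")); // true
console.log(looksPhiAdjacent("retryCount"));              // false
```

Even this crude check shows how much schema information leaks through names: none of the flagged identifiers contains patient data, yet each one signals that the surrounding code handles it.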
The BAA Problem for AI Tools
Large AI providers do offer BAAs. Anthropic, Microsoft (for Azure OpenAI), and Google (for Vertex AI) all have enterprise agreements that include HIPAA-aligned BAAs.
The limitations:
1. BAAs are available only on enterprise plans. Developers using standard Claude.ai, GitHub Copilot Individual, or similar consumer-tier plans are not covered by a BAA.
2. BAAs cover data handling, not data transmission scope. Even with a BAA in place, you are sending PHI-adjacent code to a third party. The BAA defines how they handle it; it does not prevent the transmission.
3. BAA management is complex. If your developers use multiple AI tools, you need a BAA with each provider, renewed annually, with appropriate controls documentation.
For teams that primarily work with one provider at enterprise tier, the BAA approach is workable. For teams with diverse AI tool usage, it becomes operationally complex.
The Alternative: Technical Prevention
The most robust HIPAA approach for source code is preventing PHI patterns from reaching AI providers at all. If the identifiers that reference PHI are mutated before transmission, the data that leaves your network does not contain PHI patterns.
This is the approach Pretense implements:
```typescript
// Original code (contains HIPAA-relevant identifiers)
function getPatientRecord(patientId: string): Promise<PatientRecord> {
  return hipaaAuditedDb.query(
    'SELECT * FROM patient_records WHERE id = ?',
    [patientId]
  );
}
```

```typescript
// After Pretense mutation (sent to the AI API)
function _fn3a7c(_v8d2e: string): Promise<_cls4f1b> {
  return _cls9a3d.query(
    'SELECT * FROM patient_records WHERE id = ?',
    [_v8d2e]
  );
}
```
The function still makes sense to the LLM. The parameter type, return type, and query structure are all visible. But the identifiers that reference the PHI data model are synthetic.
Note that the SQL string literal is preserved verbatim. This is a deliberate choice: mutating string literals would break output quality. However, the table name in a SQL query is metadata about your schema, not PHI itself.
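To illustrate the general idea behind this kind of mutation (this is a conceptual sketch, not Pretense's actual algorithm), a synthetic name can be derived deterministically from a hash of the original identifier, so the same identifier always maps to the same replacement within a request:

```typescript
// Conceptual sketch of deterministic identifier mutation.
// Assumption: a short hash-derived suffix is enough to illustrate the idea;
// a real implementation would also handle collisions and scoping.
import { createHash } from "crypto";

function mutateIdentifier(name: string, prefix: string): string {
  const digest = createHash("sha256").update(name).digest("hex").slice(0, 4);
  return `${prefix}${digest}`;
}

// Deterministic: repeated calls yield the same synthetic name, which keeps
// the mutated code internally consistent (every use site renames the same way).
const a = mutateIdentifier("patientId", "_v");
const b = mutateIdentifier("patientId", "_v");
console.log(a === b); // true
```

Determinism is the important property here: if `patientId` mutated differently at each use site, the resulting code would no longer typecheck or make sense to the model.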
PHI Detection in Pretense
Beyond mutation, Pretense includes a secrets engine that detects PHI patterns in prompts and blocks their transmission:
Detected patterns include:
- Social Security Numbers (XXX-XX-XXXX format)
- Medical Record Numbers (common institutional formats)
- National Provider Identifiers (NPI)
- Health plan beneficiary numbers
- Account numbers and certificate/license numbers
- Date of birth patterns
- Geographic identifiers below state level (ZIP codes in certain contexts)
- Phone and fax numbers
If a developer accidentally pastes a code block containing a patient ID or hard-coded SSN into a prompt, Pretense blocks the request before transmission and logs a PHI detection event.
```
ERROR [pretense] PHI pattern detected in outbound request
  Pattern: SSN_FORMAT
  Location: prompt line 47

Review the prompt and remove PHI before retrying.
```
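The SSN_FORMAT check above can be sketched as a simple pattern match. The regex and function names below are illustrative assumptions, not Pretense's production detection rules:

```typescript
// Illustrative SSN-format check (XXX-XX-XXXX), not Pretense's real rule set.
// Word boundaries prevent matching digits embedded in longer tokens.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;

function containsSsnPattern(prompt: string): boolean {
  return SSN_PATTERN.test(prompt);
}

console.log(containsSsnPattern("patient ssn: 123-45-6789")); // true
console.log(containsSsnPattern("order id: 123456789"));      // false
```

A production detector would layer context on top of raw patterns (for example, distinguishing SSNs from phone extensions or version strings), but the blocking decision itself is a simple boolean gate before transmission.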
Building the HIPAA Control Documentation
HIPAA requires covered entities and business associates to implement technical safeguards that control access to electronic PHI. For AI tool usage, the control documentation should include:
Required Documentation
1. Risk Assessment: Document the risk of AI tool usage for code containing PHI patterns. This is required by the Security Rule and must be updated when significant technology changes occur.
2. Technical Safeguard Documentation: Document the control implemented to prevent PHI transmission. A proxy audit log is strong technical evidence.
3. Workforce Training Records: Document that developers have been trained on the PHI transmission policy.
4. BAA Status: If using AI tools without a proxy, document BAA status for each provider.
Audit Log Requirements
HIPAA requires audit logs for systems that access, modify, or transmit electronic PHI. If AI tool API calls could potentially include PHI, those calls need to be in your audit log.
Pretense generates audit logs that meet this requirement:
```json
{
  "event_id": "evt_8x3k2p",
  "timestamp": "2026-04-04T09:14:22Z",
  "developer": "engineer@healthcareorg.com",
  "repository": "claims-processing-service",
  "phi_patterns_detected": 0,
  "phi_patterns_blocked": 0,
  "identifiers_mutated": 18,
  "secrets_detected": 0,
  "model": "claude-3-5-sonnet",
  "audit_event_type": "ai_api_call",
  "hipaa_relevant": true
}
```

HIPAA-Specific Report Generation
```
pretense report --format hipaa --period 2026-01-01:2026-03-31
```

The HIPAA report includes: PHI detection events with dates and developers, blocked transmission attempts, mutation coverage statistics, and a control effectiveness summary suitable for inclusion in your HIPAA risk management documentation.
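If you consume the audit log yourself rather than using the built-in report, aggregating events for a reporting period is straightforward. This is a hedged sketch: the event shape follows the example log above, and the `summarize` function is hypothetical, not part of Pretense:

```typescript
// Hypothetical aggregation over Pretense-style audit events.
// Field names mirror the example audit log; the function is illustrative.
interface AuditEvent {
  phi_patterns_detected: number;
  phi_patterns_blocked: number;
  identifiers_mutated: number;
}

function summarize(events: AuditEvent[]) {
  return {
    totalCalls: events.length,
    phiDetections: events.reduce((n, e) => n + e.phi_patterns_detected, 0),
    blockedRequests: events.reduce((n, e) => n + e.phi_patterns_blocked, 0),
    identifiersMutated: events.reduce((n, e) => n + e.identifiers_mutated, 0),
  };
}

const events: AuditEvent[] = [
  { phi_patterns_detected: 0, phi_patterns_blocked: 0, identifiers_mutated: 18 },
  { phi_patterns_detected: 1, phi_patterns_blocked: 1, identifiers_mutated: 12 },
];
console.log(summarize(events));
```

Totals like these (calls, detections, blocks, mutation counts) are the raw material for the control effectiveness summary in a risk management file.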
Practical Implementation Steps
1. Classify your repositories: Identify which codebases contain PHI-adjacent code (models, schemas, logic that handles patient data).
2. Deploy Pretense on PHI-adjacent repos first: You do not need to protect every repository. Start with the ones where HIPAA exposure is real.
3. Run the PHI detection scan: `pretense scan --phi-mode --path src/` will identify files containing hard-coded PHI patterns that need to be cleaned up independently of mutation.
4. Generate the initial audit baseline: Run Pretense for 30 days and generate the first audit report. This becomes your evidence of control effectiveness.
5. Update your risk assessment: Document the deployment of technical safeguards in your HIPAA risk assessment.
A Note on Test Data
Test files are a common source of PHI in codebases. Developers create test fixtures with realistic-looking data and sometimes use real patient identifiers from development databases. Pretense detects these patterns during the scan phase, but the underlying issue requires remediation: replace real PHI in test fixtures with synthetic data that follows the same structural patterns.
```
# Scan for PHI in test files specifically
pretense scan --path tests/ --phi-mode --format report
```

Cleaning up PHI in test files is a one-time investment that eliminates a persistent compliance risk, regardless of AI tool usage.
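One remediation pattern for fixtures is to generate synthetic values that preserve the real data's structure. The sketch below is a hypothetical example, not a Pretense feature; it exploits the fact that SSN area numbers 900-999 are never issued:

```typescript
// Hypothetical synthetic fixture generator: structurally valid SSN-shaped
// strings that can never collide with a real issued SSN (900-999 area
// numbers are reserved and unissued).
function syntheticSsn(seed: number): string {
  const area = 900 + (seed % 100);
  const group = String(10 + (seed % 90));
  const serial = String(1000 + (seed % 9000));
  return `${area}-${group}-${serial}`;
}

// Fixture mirrors the production record shape without real identifiers.
const fixture = {
  patientId: "TEST-0001",
  ssn: syntheticSsn(7), // "907-17-1007"
};
console.log(fixture.ssn);
```

Because the output is deterministic per seed, tests stay reproducible, and because the values match the real format, they still exercise validation and parsing code paths.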
[Schedule a demo to see the HIPAA compliance report](/demo) or [see how healthcare teams have deployed Pretense](/use-cases).
[Start your free trial](/trial)