HIPAA-Compliant Voice AI: Provider Options and Architecture Patterns
Feb 25, 2026
Building a voice AI agent for healthcare is hard enough without compliance. Add HIPAA requirements and suddenly half your vendor options disappear, your architecture needs redesigning, and every conversation your agent has becomes a potential liability.
The challenge isn't that HIPAA compliance is impossible for voice AI -- several providers now offer it. The challenge is that the information is scattered, incomplete, and often misleading. A TTS provider might claim "HIPAA compliance" on their marketing page but bury the BAA requirement in enterprise pricing. An STT provider might offer HIPAA-eligible infrastructure but only in specific regions.
This guide consolidates what actually matters: which providers sign BAAs, what architecture patterns satisfy HIPAA requirements, and how to validate that your voice AI agent handles protected health information correctly in production.
What HIPAA Requires for Voice AI
HIPAA (the Health Insurance Portability and Accountability Act) protects individually identifiable health information -- known as Protected Health Information (PHI). For a voice AI agent operating in healthcare, PHI can appear in multiple places: the audio itself, the transcription, the LLM context, tool call payloads, and conversation logs.
The Core Requirements
Business Associate Agreement (BAA): Any vendor that processes, stores, or transmits PHI on your behalf must sign a BAA. This isn't optional and isn't satisfied by a vendor's general terms of service. You need a signed BAA with every component in your voice AI pipeline that touches patient data -- STT provider, TTS provider, LLM provider, telephony provider, and any intermediary platform.
Encryption in Transit: All PHI must be encrypted during transmission. For voice AI, this means:
TLS 1.2+ for all API calls (STT, TTS, LLM)
SRTP or TLS for voice streams
WSS (not WS) for WebSocket connections
No unencrypted HTTP endpoints anywhere in the pipeline
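The transit-encryption rules above can be enforced mechanically at startup. A minimal sketch in Python -- the endpoint names and URLs are illustrative assumptions, not real provider addresses -- that fails fast if any configured endpoint uses an unencrypted scheme:

```python
from urllib.parse import urlparse

# Hypothetical pipeline config -- names and URLs are illustrative only.
PIPELINE_ENDPOINTS = {
    "stt": "wss://api.example-stt.com/v1/stream",
    "llm": "https://example.openai.azure.com/openai/deployments/gpt-4o",
    "tts": "https://api.example-tts.com/v1/synthesize",
}

# TLS-protected schemes only; no http:// or ws:// anywhere in the pipeline.
ALLOWED_SCHEMES = {"https", "wss"}

def audit_endpoint_schemes(endpoints: dict[str, str]) -> list[str]:
    """Return the names of any endpoints using an unencrypted scheme."""
    return [
        name for name, url in endpoints.items()
        if urlparse(url).scheme not in ALLOWED_SCHEMES
    ]

violations = audit_endpoint_schemes(PIPELINE_ENDPOINTS)
assert not violations, f"Unencrypted endpoints in pipeline: {violations}"
```

Running a check like this in CI or at service startup turns "no unencrypted endpoints" from a policy statement into an enforced invariant.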
Encryption at Rest: Any stored PHI -- audio recordings, transcripts, conversation logs, training data -- must be encrypted at rest. AES-256 is the standard. This applies to your own infrastructure and to every vendor that persists data.
Access Controls: Role-based access to PHI with unique user identification, automatic session timeouts, and audit trails. Every access to patient data must be logged with who, when, and what.
Audit Logs: Comprehensive logging of all access to and modifications of PHI. Logs must be retained for six years. For voice AI, this includes: who accessed a conversation transcript, when audio recordings were played back, when transcripts were exported or shared, and when conversation data was used for evaluation or training.
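The who/when/what requirement maps naturally onto a structured, append-only log record. A minimal sketch -- the field and action names are illustrative assumptions, not a standard schema:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class PHIAccessEvent:
    """One audit record: who touched PHI, when, and what they did."""
    actor_id: str     # unique user identification, per the access control rule
    action: str       # e.g. "transcript_viewed", "audio_played", "transcript_exported"
    resource_id: str  # conversation or recording identifier -- not the PHI itself
    timestamp: str    # UTC, ISO 8601

def log_phi_access(actor_id: str, action: str, resource_id: str) -> str:
    """Serialize one audit event as a JSON line for an append-only log."""
    event = PHIAccessEvent(
        actor_id=actor_id,
        action=action,
        resource_id=resource_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(event))
```

Note that the record references the conversation by identifier rather than embedding transcript content, so the audit log itself does not become another PHI store; logs written this way still need the six-year retention described above.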
Minimum Necessary Standard: Your voice AI agent should only access the minimum PHI required to perform its function. If the agent is scheduling appointments, it doesn't need the patient's full medical history in its context window. This has direct implications for LLM prompt design and tool call scoping.
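One way to operationalize minimum necessary is to whitelist record fields per agent task before anything reaches the LLM context or a tool call. A sketch with hypothetical field and task names -- not a real EHR schema:

```python
# Each agent function sees only the fields it needs. Names are illustrative.
FIELD_SCOPES = {
    "scheduling": {"patient_name", "phone", "appointment_time", "provider_name"},
    "prescription_refill": {"patient_name", "phone", "medication_name"},
}

def scope_patient_record(record: dict, task: str) -> dict:
    """Strip a patient record down to the fields the current task requires."""
    allowed = FIELD_SCOPES.get(task, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {
    "patient_name": "Jane Doe",
    "phone": "555-0100",
    "appointment_time": "2026-03-01T09:00",
    "provider_name": "Dr. Smith",
    "diagnosis": "E11.9",       # never needed for scheduling
    "insurance_id": "XYZ123",   # never needed for scheduling
}
context = scope_patient_record(record, "scheduling")
# "diagnosis" and "insurance_id" never reach the LLM context window.
```

An unknown task name yields an empty scope, which is the safe default: a new agent capability gets no PHI until someone explicitly grants it fields.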
Voice AI-Specific HIPAA Considerations
Voice AI introduces unique compliance challenges that text-based systems don't face:
Audio recordings are PHI. A patient describing symptoms over the phone is generating PHI in audio form. Every recording, every cached audio buffer, every temporary file that contains patient voice data is subject to HIPAA protections.
STT transcription creates a written PHI record. The moment speech becomes text, you have a document-form PHI record that must be encrypted, access-controlled, and audit-logged.
TTS may contain PHI. If your agent reads back appointment details, medication names, or test results, the generated audio contains PHI.
LLM context windows contain PHI. When conversation history is sent to an LLM for response generation, the prompt payload contains PHI. The LLM provider must have a BAA in place and must not use that data for training.
Tool calls may expose PHI to third-party systems. If your agent calls an EHR API, a scheduling system, or a billing platform, PHI is transmitted to those systems. Each integration point needs its own BAA and compliance review.
STT Providers with HIPAA Compliance
Speech-to-text is the first processing step in most voice AI pipelines, and it's where raw audio containing PHI first gets converted to text. Not all STT providers offer HIPAA-compliant infrastructure.
| Provider | HIPAA Compliance | BAA Available | Notes |
|---|---|---|---|
| Deepgram | Yes | Yes (Enterprise) | BAA available on Enterprise plans. Offers on-premise deployment option for maximum data control. Nova-2 model supports real-time and batch transcription. |
| AssemblyAI | Yes | Yes (Enterprise) | BAA available. Offers EU data processing option. Universal-2 model supports real-time streaming. Does not retain audio after processing by default. |
| Azure Speech Services | Yes | Yes | Covered under Microsoft's BAA for Azure. Supports data residency in specific regions. Custom speech models available. SOC 2 Type II certified. |
| Google Cloud Speech-to-Text | Yes | Yes | Covered under Google Cloud's BAA. Data processed in specified regions. Supports on-premise via Vertex AI. |
| Amazon Transcribe | Yes | Yes | Covered under AWS BAA. Medical-specific model (Amazon Transcribe Medical) designed for healthcare terminology. |
| OpenAI Whisper (self-hosted) | Depends on hosting | N/A | Open-source model you host yourself. HIPAA compliance depends entirely on your infrastructure. No BAA needed since no data leaves your environment. |
| NVIDIA Parakeet (self-hosted) | Depends on hosting | N/A | Open-weight model. Same self-hosting considerations as Whisper. Competitive accuracy with lower resource requirements. |
Important: many popular STT providers do not currently offer HIPAA compliance or BAAs in their standard tiers. Always verify directly with the provider and get the BAA signed before processing any PHI. A "HIPAA-compliant" claim on a website is not the same as a signed BAA.
STT Architecture Decisions for HIPAA
Cloud STT with BAA is the simplest path. You get a signed BAA, confirm the data processing region, verify that audio isn't retained after processing, and use the provider's standard API. The limitation is that audio leaves your infrastructure.
Self-hosted STT gives you maximum control. Run Whisper or Parakeet on your own HIPAA-compliant infrastructure (or on a BAA-covered cloud provider). No audio ever leaves your environment. The trade-off is operational complexity, latency management, and losing out on provider-managed model updates.
Hybrid approach: Use cloud STT for non-PHI conversations and self-hosted STT for conversations that will involve PHI. Route based on the conversation type or department.
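The hybrid routing decision can be a single function at the call-intake layer. A sketch -- the department names and backend labels are hypothetical:

```python
# Departments whose calls are likely to involve PHI. Names are illustrative.
SELF_HOSTED_DEPARTMENTS = {"clinical_triage", "test_results", "prescriptions"}

def pick_stt_backend(department: str) -> str:
    """Route PHI-likely conversations to self-hosted STT, the rest to cloud STT."""
    if department in SELF_HOSTED_DEPARTMENTS:
        return "self_hosted_whisper"   # audio stays inside your environment
    return "cloud_stt_with_baa"        # BAA-covered provider for general inquiries
```

In practice the routing key could be an IVR menu selection, the dialed number, or an early-turn intent classification; the important property is that the decision happens before any audio leaves your infrastructure.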
TTS Providers with HIPAA Compliance
Text-to-speech is where your agent's responses become audio. If those responses contain patient-specific information, the TTS provider is processing PHI.
| Provider | HIPAA Compliance | BAA Available | Notes |
|---|---|---|---|
| ElevenLabs | Claims HIPAA compliance | Yes (Enterprise) | BAA available on Enterprise plans. Verify directly -- the compliance offering has evolved over time. |
| Azure Speech Services (TTS) | Yes | Yes | Same BAA as Azure STT. Neural voices available. Custom voice creation supported. |
| Google Cloud Text-to-Speech | Yes | Yes | Covered under Google Cloud BAA. WaveNet and Neural2 voices. |
| Amazon Polly | Yes | Yes | Covered under AWS BAA. Neural and standard voices. SSML support. |
| PlayHT | Verify directly | Enterprise | Check current status. Compliance features may be available on Enterprise plans. |
| Cartesia | Not confirmed | Verify | Fast and cost-effective but verify HIPAA eligibility before using with PHI. |
TTS Considerations for Healthcare Voice AI
Voice quality matters more in healthcare. Patients need to clearly understand medication names, dosage instructions, and appointment details. Test TTS output for pronunciation accuracy of medical terminology. Mispronouncing "metformin" or "atorvastatin" isn't just a quality issue -- it's a safety issue.
Speech rate should be configurable. Elderly patients (a significant portion of healthcare callers) need slower, clearer speech. Your TTS should support rate adjustment without degrading quality.
No-log modes: Some TTS providers offer modes where the text input is not logged or retained. For PHI-containing utterances, this reduces the compliance surface area. Verify that no-log mode is actually enforced and that temporary processing buffers are purged.
LLM Providers with HIPAA Compliance
The LLM is the brain of your voice AI agent, and it receives the full conversation context including any PHI the patient has shared.
| Provider | HIPAA Compliance | BAA Available | Data Training Opt-Out | Notes |
|---|---|---|---|---|
| Azure OpenAI | Yes | Yes | Yes (default) | GPT-4o, GPT-4, GPT-3.5 via Azure. Data is not used for model training by default. Regional data residency. SOC 2 certified. |
| Anthropic (via AWS Bedrock) | Yes | Yes (via AWS BAA) | Yes | Claude models via Bedrock inherit AWS's HIPAA compliance. Direct Anthropic API requires separate verification. |
| Google Cloud Vertex AI | Yes | Yes | Yes | Gemini models via Vertex AI. Covered under Google Cloud BAA. |
| AWS Bedrock | Yes | Yes | Yes | Multiple model providers (Anthropic Claude, Meta Llama, Cohere, AI21) available under AWS BAA umbrella. |
| Self-hosted open-source models | Depends on hosting | N/A | N/A | Llama, Mistral, etc. on your own infrastructure. Full control but operational complexity. |
Critical: Using OpenAI's standard API (api.openai.com) is not the same as using Azure OpenAI for HIPAA purposes. The standard OpenAI API does not offer BAAs. If you need HIPAA compliance with OpenAI models, you must use Azure OpenAI Service.
LLM Architecture Patterns for HIPAA
PHI-aware prompt engineering: Design your system prompts so the LLM doesn't unnecessarily request or repeat PHI. If the agent has already verified the patient's identity, don't include the full SSN in every subsequent turn's context.
Context window management: Limit the PHI in the LLM's context to what's needed for the current turn. Implement sliding window or summarization strategies that strip PHI from older conversation turns before they're included in the prompt.
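A simplified sketch of the sliding-window idea: recent turns pass through verbatim while older turns are redacted before entering the prompt. The regex patterns here are illustrative only -- production systems typically use a dedicated de-identification service rather than regexes alone:

```python
import re

# Illustrative redaction patterns; real pipelines need broader PHI coverage.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DOB]"),
]

def redact(text: str) -> str:
    """Replace PHI-shaped substrings with placeholder tokens."""
    for pattern, token in PHI_PATTERNS:
        text = pattern.sub(token, text)
    return text

def build_llm_context(turns: list[dict], keep_last: int = 4) -> list[dict]:
    """Keep the most recent turns verbatim; redact PHI from older turns."""
    older, recent = turns[:-keep_last], turns[-keep_last:]
    return [{**t, "content": redact(t["content"])} for t in older] + recent
```

The agent still has the verbatim detail it needs for the current exchange, while identifiers the patient shared ten turns ago no longer ride along in every prompt payload.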
Tool call sandboxing: When the LLM makes tool calls to external systems (EHR lookup, scheduling, billing), ensure each integration has its own BAA and that PHI in tool call arguments is encrypted in transit.
Architecture Patterns for HIPAA-Compliant Voice AI
Pattern 1: Fully Managed Cloud (Simplest)
Use a single cloud provider (Azure, AWS, or GCP) for the entire pipeline:
Advantages: Single BAA covers the whole pipeline. Consistent data residency. Unified audit logging via the cloud provider's compliance tools.
Disadvantages: Vendor lock-in. May not get best-in-class quality for every component. Higher per-minute cost than mixing providers.
Pattern 2: Multi-Provider with BAA Chain
Use specialized providers for each component, each with its own signed BAA:
Advantages: Best-in-class quality at each step. More flexibility to swap components.
Disadvantages: Multiple BAAs to manage and renew. More complex compliance audit surface. Data transits between multiple providers.
Pattern 3: Self-Hosted with Cloud LLM
Host STT and TTS on your own HIPAA-compliant infrastructure, use a BAA-covered cloud LLM:
Advantages: Audio never leaves your infrastructure for STT/TTS. Reduced BAA surface area. Lower per-minute cost at scale.
Disadvantages: Significant operational burden. You're responsible for STT/TTS model performance, scaling, and security. Latency management is harder.
Pattern 4: Fully On-Premise
Everything runs on your infrastructure, including the LLM:
Advantages: Maximum data control. No external data transmission. Simplest compliance story (all data stays within your HIPAA-compliant environment).
Disadvantages: Highest operational cost. Quality limitations of self-hosted models. Scaling is expensive. Model updates require manual deployment.
Data Residency Considerations
For healthcare organizations with specific data residency requirements:
US data residency: Most major providers offer US-only data processing. Verify that the specific models and features you're using are available in US regions.
EU data residency (GDPR + HIPAA): If serving European patients, you need both HIPAA and GDPR compliance. Azure, AWS, and GCP all offer EU-based processing regions. Some specialized providers (AssemblyAI, Deepgram) also offer EU endpoints.
No-log and no-retention modes: Several providers offer modes where input data is not logged or retained after processing. This reduces the compliance surface but may limit your ability to debug production issues.
Testing HIPAA Compliance in Voice AI
Architecture and BAAs are necessary but not sufficient. You also need to verify that your voice AI agent actually behaves in a HIPAA-compliant manner during conversations. This is where automated testing becomes critical.
What to Test
PHI handling verification: Does the agent appropriately verify patient identity before sharing PHI? Does it refuse to share PHI with unverified callers? Does it limit PHI disclosure to the minimum necessary?
Compliance phrase verification: Does the agent include required disclosures? ("This call may be recorded for quality purposes." "I can help you with that. First, I need to verify your identity.")
Call recording policy enforcement: Does the agent correctly inform callers about recording? Does it handle opt-out requests? Does it stop recording when requested?
PHI redaction in logs: When conversations are logged or monitored, is PHI properly redacted in non-clinical contexts?
Unauthorized information requests: When a caller asks for information they shouldn't have access to (another patient's records, information beyond the agent's scope), does the agent correctly decline?
Building Compliance Test Scenarios
Create test cases that specifically target HIPAA-relevant behaviors:
Identity verification scenario: The simulated caller provides a name and date of birth. The agent should verify these against the patient record before sharing any appointment details, test results, or medication information. Test the negative case too -- provide incorrect verification details and confirm the agent refuses to share PHI.
Minimum necessary scenario: The caller asks a simple scheduling question. The agent should only share the appointment date, time, and provider name -- not the reason for the visit, diagnosis codes, or insurance details. Test that the agent doesn't volunteer unnecessary PHI.
Third-party caller scenario: Someone calls claiming to be a family member requesting a patient's information. The agent should follow the facility's policy for third-party disclosures (which typically requires the patient's authorization on file).
Recording disclosure scenario: At the start of every call, the agent should inform the caller that the call may be recorded. Use a regex-based metric checking the agent's first message for the required disclosure language.
Automated Compliance Metrics
Build metrics that continuously validate HIPAA-compliant behavior:
Binary LLM-as-a-Judge metric: "Did the agent verify the caller's identity before sharing any protected health information?" Return YES only if explicit verification (name + DOB or other identifier) occurred before any PHI disclosure.
Regex metric (absence mode): Verify the agent never speaks a full SSN, medical record number, or other identifier in a single utterance. A match on a pattern like \d{3}-\d{2}-\d{4} in agent messages should flag a compliance violation.
Regex metric (first message): Verify the recording disclosure appears in the agent's first message. Pattern: "this call may be recorded" (case-insensitive) in the agent's first turn.
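Both regex metrics fit in a few lines. A sketch, assuming each conversation's agent messages are available as an ordered list of strings:

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
DISCLOSURE_PATTERN = re.compile(r"this call may be recorded", re.IGNORECASE)

def check_no_ssn(agent_messages: list[str]) -> list[int]:
    """Absence mode: return indices of agent turns containing an SSN-shaped string."""
    return [i for i, msg in enumerate(agent_messages) if SSN_PATTERN.search(msg)]

def check_recording_disclosure(agent_messages: list[str]) -> bool:
    """First-message mode: the disclosure must appear in the agent's opening turn."""
    return bool(agent_messages) and bool(DISCLOSURE_PATTERN.search(agent_messages[0]))
```

Any nonempty result from the absence check, or a False from the first-message check, flags the conversation for compliance review.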
Composite evaluation metric: For each test case, define expected behaviors like "Agent verifies patient identity," "Agent provides only requested information," "Agent does not disclose diagnosis or insurance details." Track the percentage of criteria met.
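A minimal sketch of composite scoring over per-case criteria -- the criterion names are hypothetical:

```python
def evaluate_case(criteria: dict[str, bool]) -> dict:
    """Summarize a multi-criteria compliance check for one test case."""
    met = [name for name, ok in criteria.items() if ok]
    missed = [name for name, ok in criteria.items() if not ok]
    return {
        "met": met,
        "missed": missed,
        "score": len(met) / len(criteria) if criteria else 0.0,
    }

result = evaluate_case({
    "verifies_identity": True,
    "only_requested_info": True,
    "no_diagnosis_disclosed": False,
})
# result["missed"] names exactly which compliance behavior failed.
```

Tracking the `missed` list per case, not just the aggregate score, is what makes a failing run actionable.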
Coval's metrics framework supports all of these patterns -- binary LLM-as-a-Judge for nuanced compliance evaluation, regex matching in absence mode for prohibited patterns, first-message position checking for required disclosures, and composite evaluation for multi-criteria compliance scoring. Running these metrics on both simulated conversations and production transcripts creates a continuous compliance verification loop.
SOC 2 and GDPR Considerations
While HIPAA is the primary compliance framework for US healthcare voice AI, two related standards frequently come up.
SOC 2
SOC 2 (Service Organization Control 2) evaluates a service provider's controls for security, availability, processing integrity, confidentiality, and privacy. Many healthcare organizations require SOC 2 Type II certification from their vendors in addition to HIPAA compliance.
For voice AI builders: if you're selling to healthcare enterprises, expect to be asked about SOC 2 certification. The audit process typically takes 3-6 months and requires ongoing annual re-certification. Tools like Vanta, Drata, and Secureframe can automate much of the evidence collection.
When selecting voice AI component providers, prefer those with SOC 2 Type II reports. Azure, AWS, GCP, Deepgram (Enterprise), and AssemblyAI (Enterprise) all maintain SOC 2 certifications.
GDPR
For voice AI agents serving European patients (or processing data of EU residents), GDPR adds additional requirements:
Explicit consent for voice recording and processing
Data Processing Agreements (DPAs) with all vendors (equivalent to BAAs)
Right to erasure -- patients can request deletion of their voice data and transcripts
Data portability -- patients can request their conversation data in a machine-readable format
EU data residency -- data must be processed and stored within the EU (or in a country with an adequacy decision) unless specific legal bases apply
The practical implication for voice AI architecture: you need EU endpoints for STT, TTS, and LLM processing, plus the ability to identify and delete specific patient conversations on request.
Vendor Selection Checklist
Before committing to any vendor for a HIPAA-compliant voice AI pipeline, verify:
Vendor will sign a BAA (not just claim HIPAA compliance)
BAA covers the specific products/features you're using (not just the platform generally)
Data is encrypted in transit (TLS 1.2+) and at rest (AES-256)
Vendor doesn't use your data for model training (or offers a binding opt-out)
Data processing region aligns with your residency requirements
Vendor provides audit logs for data access
Vendor has a documented incident response and breach notification process
Vendor maintains SOC 2 Type II certification (if required by your organization)
Vendor supports data deletion requests (for GDPR if applicable)
No-log or no-retention mode is available for PHI-containing requests
FAQ
Does my voice AI agent need to be HIPAA-compliant if it only schedules appointments?
Yes, if it processes any PHI in the course of scheduling. Patient names, phone numbers, dates of birth, provider names associated with specific patients, and appointment types (which can imply conditions) are all PHI. Even a simple scheduling agent typically needs HIPAA compliance if it operates in a healthcare context.
Can I use OpenAI's API for a HIPAA-compliant voice AI agent?
Not the standard OpenAI API (api.openai.com). OpenAI does not offer BAAs for their direct API. However, you can use the same GPT-4o and GPT-4 models through Azure OpenAI Service, which is covered under Microsoft's Azure BAA. The models are identical; the compliance infrastructure is different.
Is ElevenLabs HIPAA-compliant for TTS?
ElevenLabs has stated they offer HIPAA compliance on Enterprise plans with a signed BAA. However, this has evolved over time, so verify the current status directly with ElevenLabs before relying on it. Get the BAA signed before processing any PHI through their system.
What about voice cloning with patient consent?
Voice cloning introduces additional HIPAA considerations. A patient's voice print is biometric data and is considered PHI. If you're cloning voices (e.g., for accessibility purposes), the voice data, the resulting model, and any generated audio all need HIPAA protections. The provider hosting the voice cloning service needs a BAA.
How do I test that my agent actually handles PHI correctly?
Automated conversation simulation with compliance-specific test scenarios and metrics. Create test cases where a simulated caller provides PHI and verify the agent handles it appropriately -- identity verification before disclosure, minimum necessary information sharing, recording disclosures, and refusal to share PHI with unauthorized callers. Run these tests continuously, not just at launch.
What happens if my voice AI agent has a HIPAA breach?
You must notify affected individuals within 60 days. If the breach affects 500+ individuals, you must also notify the HHS Office for Civil Rights and media outlets. Your BAAs with vendors should include breach notification procedures. Penalties range from $100 to $50,000 per violation, with an annual maximum of $1.5 million per violation category. Criminal penalties can also apply for willful neglect.
Building HIPAA-compliant voice AI requires rigor at every layer -- from vendor selection to architecture design to ongoing compliance verification. The providers and patterns exist. The hard part is validating that the system actually behaves correctly in every conversation.
-> Coval helps healthcare voice AI teams automate compliance testing with configurable metrics for PHI handling, disclosure verification, and identity validation. Learn more at coval.dev
