PII redaction for call recordings: how ingestion-level redaction keeps calls PCI compliant

Published on July 3, 2026
by Ani Ghazaryan

TL;DR: Legacy pause-and-resume systems don't remove agents, local desktops, or telephony infrastructure from PCI DSS audit scope. Automated, ingestion-level PII redaction scrubs sensitive data before it reaches any database. By removing cardholder data at the ingestion layer, contact center platforms using automated redaction can potentially reduce audit complexity, cut agent handle time (AHT), and protect downstream CRM and LLM pipelines from corrupt data. The accuracy floor for reliable entity detection in PCI audits is significantly higher than for standard QA transcription, making STT model selection a compliance decision as much as a product one.

Contact centers experience high annual agent turnover, which means compliance training is a constant drain. Every new agent who misses a single pause during a payment call is a potential PCI DSS violation. Legacy pause-and-resume systems hand that liability to your agents' manual habits. Automated, ingestion-level redaction takes it off the floor entirely.

The sections below explain how post-call async redaction removes sensitive data from recordings before it reaches any downstream system, how live calls are handled via DTMF suppression and post-call submission, why redacting only the audio or only the transcript fails PCI DSS, and how to mitigate false-negative risk when automated detection is the only scalable path forward.

Why PII redaction is essential for PCI compliance

Call recordings in a contact center environment capture sensitive data constantly, often by accident. An agent taking a payment reads back a card number to confirm. A caller volunteers their date of birth to pass a security check. A billing inquiry surfaces a full Primary Account Number (PAN) mid-conversation before the agent can stop the recording. If you leave any of these moments unredacted, you create a stored liability.

PCI DSS v4.0 prohibits storing Sensitive Authentication Data (SAD), which includes full track data, card security codes (CVV2/CVC2/CID), and PINs, under any circumstances after authorization, including in encrypted form. Organizations must limit SAD storage and maintain documented retention and disposal policies covering data captured before authorization completes.

Three regulatory categories all apply to voice channels, and treating them as interchangeable creates audit gaps:

  • PII redaction: Covers general personal identifiers such as names, addresses, national ID numbers, and phone numbers. Governed by GDPR, CCPA, and similar privacy frameworks.
  • PCI redaction: Specifically targets cardholder data (PAN, expiration dates, CVV) and Sensitive Authentication Data. Governed by PCI DSS v4.0.
  • PHI redaction: Covers protected health information including diagnoses, treatment records, and insurance identifiers. Governed by HIPAA for US healthcare contexts.

Key PCI data fields to mask

The specific fields PCI DSS requires you to protect in voice recordings are:

  • Primary Account Number (PAN): Any sequence of 14 to 19 digits that constitutes a card number. When callers read numbers in groups of four, partial sequences can slip through redaction engines that only look for full 16-digit strings. Configuring detection to flag numeric sequences that match card-number patterns, rather than only complete PANs, addresses this operationally.
  • Card security code (CVV2/CVC2/CID): The three or four-digit verification value.
  • Expiration date: Month and year combination tied to a card.
  • Cardholder name and billing address: Full name as it appears on the card and associated billing address.
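As a rough illustration of pattern-based PAN detection, the sketch below flags runs of 14 to 19 digits even when a caller's grouping has left spaces or hyphens between them, and attaches a Luhn checksum result as a confidence signal. The function names and the exact regex are assumptions made for this example, not any vendor's detection logic.

```python
import re

# 14 to 19 digits, each optionally preceded by one space or hyphen,
# so "4111 1111 1111 1111" and "4111-1111-1111-1111" both match.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){13,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list:
    """Return every card-like digit run with its character offsets."""
    hits = []
    for match in CARD_LIKE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        hits.append({
            "start": match.start(),
            "end": match.end(),
            "luhn_ok": luhn_valid(digits),
        })
    return hits
```

In this sketch a candidate that fails the Luhn check is still reported rather than dropped, because a single misheard digit breaks the checksum while the surrounding digits remain sensitive.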

PCI DSS hard requirement: CVV storage prohibition

PCI DSS v4.0 strictly prohibits storing the card security code (CVV2/CVC2/CID) under any circumstances, including in encrypted form, after authorization completes. Contact center systems that capture CVV codes in recordings must implement redaction at ingestion before the file is written to storage to maintain compliance.

Key PCI compliance requirements

Use this checklist to audit your current call recording setup against PCI DSS v4.0 obligations:

  • CVV storage prohibition verified: no CVV stored anywhere in the pipeline, including encrypted form.
  • PII, PCI, and PHI identification explicitly configured and active at the ingestion layer, not post-storage.
  • Irreversible redaction confirmed: original unredacted audio and transcript not retained in any production system.
  • Multi-region data residency configured to match your geographic footprint.
  • Documented deletion procedures enforced on paid tiers with zero-retention policies.
  • Audit logs generated at the point of redaction, capturing entity types and timestamps.

Meeting GDPR and regional privacy mandates

GDPR's data minimization principle (Article 5(1)(c)) requires that personal data be "adequate, relevant and limited to what is necessary." Ingestion-level redaction operationalizes this directly: sensitive data that never reaches your database removes the most damaging category of storage liability from your GDPR exposure surface. For operations serving EU customers or running BPO sites in the EU, this principle intersects with PCI DSS scope reduction to produce a compounding compliance benefit from a single architectural decision.

How AI identifies and masks PII in voice streams

Typical redaction pipelines transcribe the audio, identify sensitive entities in the resulting text, and apply masking to both the transcript and the corresponding audio segment before either is written to storage. Applying redaction at ingestion means sensitive values are scrubbed before being persisted, so the original data never enters the storage layer.

Detecting sensitive data in recordings

Batch (async) transcription analyzes the full audio context before producing output, which typically improves entity detection accuracy and speaker attribution. For post-call QA workflows, our async pipeline transcribes one hour of audio quickly, giving you fast turnaround without sacrificing the full-context accuracy that async processing provides. This is the recommended approach for post-call compliance workflows where latency of a few seconds or minutes is acceptable in exchange for higher accuracy.

Entity detection uses two complementary methods:

  1. Named Entity Recognition (NER): A machine learning classifier that identifies entity types (PERSON, CREDIT_CARD_NUMBER, PHONE_NUMBER, ADDRESS) based on linguistic context. NER typically makes predictions based on learned patterns rather than fixed rules.
  2. Pattern matching (regex): Deterministic rules that flag any string matching a defined format, regardless of linguistic context. Regex produces consistent results for structured data like card numbers and national IDs.

Custom regex patterns extend coverage beyond standard PII classes. Regional formats NER models may not catch natively include UK National Insurance numbers (two letters, six digits, one final letter that must be A, B, C, or D), French INSEE numbers (15 digits), and internal customer reference IDs specific to your CRM schema.
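The regional formats described above could be expressed as custom patterns along these lines. This is a minimal sketch: the pattern names are illustrative, the `CUST-` reference format is a hypothetical stand-in for a CRM-specific ID, and the National Insurance pattern follows only the simplified shape given in the text (real prefix rules are stricter).

```python
import re

CUSTOM_PATTERNS = {
    # Two letters, six digits (optionally spaced in pairs), final letter A-D.
    "UK_NINO": re.compile(r"\b[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b"),
    # Fifteen consecutive digits.
    "FR_INSEE": re.compile(r"\b\d{15}\b"),
    # Hypothetical internal customer reference, e.g. CUST-00412345.
    "CUSTOMER_REF": re.compile(r"\bCUST-\d{8}\b"),
}


def apply_custom_patterns(text: str) -> str:
    """Replace every match of each custom pattern with a [LABEL] token."""
    masked = text
    for label, pattern in CUSTOM_PATTERNS.items():
        masked = pattern.sub(f"[{label}]", masked)
    return masked
```

A bare 15-digit rule will also match other 15-digit numbers, so in practice it would likely be combined with contextual cues from NER rather than used alone.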

Live calls: DTMF suppression and post-call async redaction

For live calls, the spoken channel is handled via post-call async redaction: once the call ends, the recording is submitted to our async pipeline, which transcribes and redacts both the audio and transcript before either is written to your storage or downstream systems. PII redaction is available on the pre-recorded transcription endpoint only; it is not available on the real-time streaming endpoint.

For DTMF (touch-tone) payment flows, compliant solutions typically mute DTMF tones in the audio stream and route them directly to the payment processor without passing through the recording system. AI-based speech redaction handles the spoken channel while DTMF suppression handles the keypad channel. Both are required for full coverage.
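To make the keypad channel concrete, here is a small sketch of detecting and muting DTMF tones in a block of audio samples using the Goertzel algorithm. DTMF suppression is normally handled by telephony infrastructure rather than application code, so treat this purely as an illustration; the frame size, threshold, and function names are assumptions, and samples are assumed to be floats in the range -1 to 1.

```python
import math

ROW_FREQS = (697, 770, 852, 941)   # standard DTMF row tones in Hz
COL_FREQS = (1209, 1336, 1477)     # standard DTMF column tones in Hz
KEYPAD = ("123", "456", "789", "*0#")


def goertzel_power(samples, freq, sample_rate):
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_dtmf_key(samples, sample_rate=8000, threshold=100.0):
    """Return the pressed key, or None if no row/column tone pair is strong enough."""
    row_p = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    col_p = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    r = max(range(len(ROW_FREQS)), key=row_p.__getitem__)
    c = max(range(len(COL_FREQS)), key=col_p.__getitem__)
    if row_p[r] < threshold or col_p[c] < threshold:
        return None
    return KEYPAD[r][c]


def mute_dtmf(samples, sample_rate=8000, frame=205):
    """Zero every full frame in which a DTMF key is detected.

    A trailing partial frame is left untouched in this sketch.
    """
    out = list(samples)
    for start in range(0, len(out) - frame + 1, frame):
        if detect_dtmf_key(out[start:start + frame], sample_rate) is not None:
            out[start:start + frame] = [0.0] * frame
    return out
```

The fixed threshold here is tuned only for clean synthetic tones; real telephone audio would need level normalization and checks against speech energy to avoid false triggers.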

ML-based redaction features may not identify and remove all instances of sensitive data in transcripts, and vendors often recommend reviewing output for completeness. This is not a weakness unique to one provider. It is the operational reality of ML-based detection, and it shapes the mitigation strategy: pair high-accuracy STT with custom regex, and maintain audit logs to prove due diligence.

Comparing audio redaction and transcript masking

Redacting only the transcript while leaving the original audio intact fails PCI DSS, and redacting only the audio while leaving card numbers in the transcript also fails, because both layers are in scope for PCI DSS auditors and both feed downstream systems that will propagate unmasked data further.

Audio redaction: silence and replacement

Audio redaction modifies the raw audio file by replacing sensitive spoken segments with a silence block. The replacement is applied at the precise timestamp identified by the STT engine, and the redacted audio file replaces the original in storage. For async workflows, our speaker diarization provides word-level timestamps, giving the redaction layer the precise start and end points needed to apply clean replacements without cutting surrounding speech.
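The timestamp-driven replacement can be sketched as follows, assuming the audio is already decoded to a list of PCM samples and the sensitive segments arrive as (start, end) pairs in seconds. The function name and input shapes are assumptions for illustration only.

```python
import math


def silence_segments(samples, sample_rate, segments):
    """Return a copy of `samples` with each (start_s, end_s) span set to silence."""
    out = list(samples)
    for start_s, end_s in segments:
        # Round outward so no sliver of the sensitive segment survives.
        lo = max(0, math.floor(start_s * sample_rate))
        hi = min(len(out), math.ceil(end_s * sample_rate))
        for i in range(lo, hi):
            out[i] = 0
    return out
```

Rounding the start down and the end up errs on the side of removing slightly more audio instead of leaving a fragment of a digit audible at either boundary.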

Transcript tokenization and why both audio and transcript must be redacted

Transcript masking replaces sensitive strings with standardized tokens. A card number spoken as sixteen digits becomes the placeholder token [CREDIT_CARD_NUMBER] in the transcript. A cardholder name becomes the placeholder [PERSON]. These tokens are designed to preserve the structural readability of the transcript for QA scoring without exposing the underlying data.
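A minimal sketch of this token substitution, assuming the detector returns non-overlapping entities with character offsets and a type label (the dictionary keys used here are hypothetical):

```python
def mask_transcript(text, entities):
    """Replace each entity span with a [TYPE] token.

    `entities` is a list of dicts with 'start' and 'end' character offsets
    and a 'type' label; spans are assumed not to overlap.
    """
    masked = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        masked = masked[:ent["start"]] + f"[{ent['type']}]" + masked[ent["end"]:]
    return masked
```

Because the surrounding words are untouched, a QA reviewer can still follow the flow of the call while the sensitive values themselves never appear.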

The dual-layer requirement also protects downstream systems. If only the audio is redacted but the transcript contains card numbers, any LLM pipeline processing transcripts for summaries or coaching scores will ingest and potentially store the sensitive data. Redacting both layers at ingestion is designed to protect downstream exposure points, including CRM webhooks and QA scoring platforms. When evaluating vendors, confirm in writing whether source audio is retained post-redaction, where it is stored, and how long until deletion, because any retention window brings that storage location back into PCI scope; review deletion configuration and data retention limits as part of your vendor assessment.

Why automated redaction outperforms manual

Table 1: Redaction method comparison

| Method | Speed | Audit scope | Accuracy | Implementation effort |
| --- | --- | --- | --- | --- |
| Manual (pause-and-resume) | Agent-dependent, may add handling time | SAQ-D with extensive controls | Error rate may increase with agent turnover | May require agent training and telephony configuration |
| Automated batch (async) | Fast processing per hour of audio | Potentially reduced scope | Consistent, model-dependent | API integration, often sub-24 hours |
| Automated real-time | Low latency per chunk | Not applicable for PII redaction (async-only) | N/A (PII redaction is async-only) | API integration, often sub-24 hours |

PII redaction is available on the pre-recorded (async) endpoint only. Real-time streaming does not support entity masking at this time.

When to use manual call redaction

Manual redaction may be viable for operations processing very low call volumes where each call has extremely high business value and is reviewed individually by a trained compliance officer. At higher scale, manual compliance creates a dependency on agent discipline that contact center attrition makes unsustainable. Every agent who leaves requires a replacement who needs a ramp period, during which error risk may be elevated.

Scaling compliance with automated PII tools

Organizations using pause-and-resume must complete SAQ-D, which requires controls across all 12 PCI DSS requirement domains, because the contact center remains in PCI scope even when recording is paused. Automated ingestion-level redaction that prevents sensitive data from reaching any local system may reduce audit scope to a smaller footprint. That potential reduction in audit preparation time and annual compliance cost is a primary financial driver for migration, not just the AHT benefits.

Pause-and-resume protects the recording itself but does not safeguard other systems that process or transmit cardholder data: agent desktops, screen recordings, VoIP infrastructure, and internal networks all remain in scope. PCI DSS applies to systems that store, process, or transmit cardholder data, so the contact center stays in scope even when the recording is paused.

Operational cost, AHT, and agent impact

As an illustrative example, a 500-seat contact center processing 50,000 calls per month with a 10% manual QA sampling rate (5,000 calls reviewed) would have manual reviewers spending many minutes on each of those interactions, which makes manual redaction operationally challenging at scale. Automated redaction handles high volumes at consistent API cost with no additional headcount required for the redaction process itself.

For a 500-seat contact center, manual pause-and-resume compliance checks can add meaningful cost at scale by extending handle time on every call. Removing the compliance burden from agents eliminates this procedural interruption, reduces cognitive load, and cuts the error risk that forces compliance reviews after the fact. Automated payment flows also reduce AHT by collapsing three manual steps (recording, reading back, keying in) into one.

High numerical accuracy in STT models directly reduces false-negative risk in PCI redaction pipelines. If the transcription layer produces a wrong digit sequence because the model misheard an accented speaker, the entity detector has no chance of flagging it as a card number. For contact centers processing payments, WER is not just a quality metric for meeting notes. It is a compliance risk metric, and audio conditions such as phone compression, background noise, and accent density are what most frequently drive missed entity detections in production.
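For reference, WER is the number of word-level substitutions, deletions, and insertions divided by the number of words in the reference transcript. A small sketch of that calculation using word-level edit distance (the function name is illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Word-level Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Measuring this on real call samples, and separately on the digit-heavy portions of those calls, gives a more useful compliance signal than a single headline number from clean audio.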

Mitigating false negatives in PII detection

The core objection from compliance officers considering AI redaction is straightforward: "If the AI misses a single card number, we fail our audit." That concern is legitimate. No AI redaction system guarantees zero false negatives, which is why the compliance posture must be built in layers:

  • High-accuracy STT as the foundation, because detection can't work on a garbled transcript.
  • NER covering all required entity types for your regulatory frameworks.
  • Regex rules covering regional and structured formats NER may miss.
  • Audit logs proving redaction occurred at ingestion, with timestamps and entity types flagged, to demonstrate due diligence even if a rare edge case slips through.

Mature redaction implementations combine gateway-level detection for structured PII (card numbers, national IDs) and application-layer detection for context-dependent cases. The two approaches complement each other rather than compete.
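One way to picture the layered approach is as a union of detections: every span flagged by any layer is redacted, and overlapping spans are merged so the widest extent wins. The sketch below assumes each detector returns (start, end, type) tuples with half-open character offsets; that shape and the function name are assumptions for illustration.

```python
def merge_detections(*detector_outputs):
    """Union the spans from several detectors, merging any that overlap.

    Each detector output is a list of (start, end, type) tuples. When spans
    overlap, the merged span keeps the type of the earliest-starting one.
    """
    spans = sorted(span for output in detector_outputs for span in output)
    merged = []
    for start, end, entity_type in spans:
        if merged and start < merged[-1][1]:
            last_start, last_end, last_type = merged[-1]
            merged[-1] = (last_start, max(last_end, end), last_type)
        else:
            merged.append((start, end, entity_type))
    return merged
```

Taking the union rather than the intersection means a miss by one layer is covered whenever another layer catches the same span, which is the point of stacking them.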

Redacting PII from diverse dialects

Accent robustness is directly tied to false-negative risk in multilingual BPO operations. An STT model that struggles with Philippine English, South Asian accents, or Latin American Spanish will produce transcription errors in exactly the contexts where sensitive data is most likely to be spoken. Solaria-1 covers 100+ supported languages with accent handling benchmarked across real conversational audio.

Numerical accuracy is the category that matters most for PCI redaction. The blind STT comparison video shows how Solaria-1 performs against alternatives on real audio without benchmark selection bias.

Handling phonetic and code-switched PII

Edge cases in voice redaction include callers who spell out card numbers phonetically or dictate digits with filler words between them. High-accuracy STT models trained on real conversational audio handle these phonetic patterns more reliably because the training data includes the filler and confirmation language real callers use. For code-switching scenarios, our code-switching detection handles mid-conversation language changes without requiring a session restart.
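Where a transcript renders digits as words, a normalization pass can make them visible to pattern matching. The sketch below collapses spoken digit words into digit runs and skips filler words that fall inside a run. The word lists are illustrative assumptions covering English only, and the output is meant as a detection-only copy, not as the transcript shown to reviewers.

```python
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
FILLERS = {"uh", "um", "er", "and", "then", "okay"}  # illustrative list


def collapse_spoken_digits(text: str) -> str:
    """Join consecutive spoken digits into one run, ignoring fillers inside it."""
    out, run = [], []

    def flush():
        if run:
            out.append("".join(run))
            run.clear()

    for raw in text.split():
        word = raw.lower().strip(",.")
        if word in DIGIT_WORDS:
            run.append(DIGIT_WORDS[word])
        elif word.isdigit():
            run.append(word)
        elif word in FILLERS and run:
            continue  # filler in the middle of a digit run: skip it
        else:
            flush()
            out.append(raw)
    flush()
    return " ".join(out)
```

Treating "oh" as zero will occasionally misfire on ordinary speech, which is acceptable for a detection pass that over-flags but would not be for display text.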

Building compliance audit logs

Compliance logs should capture the timestamp of the redaction event, the entity types identified and masked, the file or session identifier, and ideally note that the original unredacted content was not retained. These logs give your internal QA team a record to review when investigating a potential breach and give your PCI assessor evidence that redaction occurred at the ingestion layer, supported by deletion API endpoints for zero-retention configuration on Growth and Enterprise plans.
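A log entry with the fields listed above might look like the following sketch. The field names and the input entity shape are hypothetical, chosen for illustration, and do not represent any specific API's response format.

```python
import json
from datetime import datetime, timezone


def build_redaction_log(file_id, entities, original_retained=False):
    """Build an audit record that describes what was redacted, never the values.

    `entities` is a list of dicts with 'type', 'start_s', and 'end_s' keys;
    any other keys (such as the matched text) are deliberately not copied.
    """
    return {
        "file_id": file_id,
        "redacted_at": datetime.now(timezone.utc).isoformat(),
        "entity_types": sorted({e["type"] for e in entities}),
        "entity_count": len(entities),
        "segments": [
            {"type": e["type"], "start_s": e["start_s"], "end_s": e["end_s"]}
            for e in entities
        ],
        "original_retained": original_retained,
    }


def to_log_line(record) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(record, sort_keys=True)
```

Copying only the type and timestamps keeps the audit trail itself out of PCI scope, since a log that echoed the matched text would become a second store of cardholder data.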

Implementation and configuration reference

Table 2: Compliance posture comparison, AWS Transcribe and Gladia

| Capability | AWS Transcribe | Gladia (Growth/Enterprise) |
| --- | --- | --- |
| HIPAA de-identification | Explicitly disclaims: "does not meet requirements" | SOC 2 Type II, ISO, HIPAA, and GDPR compliant |
| Customer data used for model training | Yes, by default on standard tier | Never on paid plans, no opt-out required |
| Diarization, NER, sentiment included in base price | Included in standard pricing | Included at base rate |
| PCI-relevant numeric accuracy | Not specified for card data | High accuracy reported in production use |

AWS's own documentation explicitly states its redaction feature "does not meet the requirements for de-identification under medical privacy laws, such as HIPAA." For contact centers in financial services or healthcare, that disclaimer is a compliance gap an assessor will flag.

We are SOC 2 Type II, ISO, HIPAA, and GDPR compliant, with configurable redaction at the ingestion layer. PII redaction must be explicitly enabled in your API request and is not active by default. You specify which entity types to redact (credit card numbers, names, phone numbers, addresses) in the configuration, and the API returns redacted tokens with corresponding timestamps. The pricing structure bundles entity detection and redaction into the base rate with volume-based pricing options.

Transcript token format and async pipeline configuration

Our API identifies and flags entities in the transcript response, returning redacted tokens with the corresponding timestamps of the original sensitive segment. Tokens such as [CREDIT_CARD_NUMBER], [PERSON], and [PHONE_NUMBER] appear in the JSON (JavaScript Object Notation) response alongside the transcript, so your downstream CRM webhook or QA scoring system receives already-masked content without requiring any additional processing step. Full configuration details are in the Gladia PII redaction documentation.

For live payment calls, post-call async submission is the supported mechanism for PII redaction. Once the call recording is available, submitting it to the async endpoint returns a fully redacted transcript and audio file before either reaches your CRM, QA platform, or LLM pipeline. Real-time streaming via WebSocket is available for transcription but does not support entity masking. The playground walkthrough shows how to test the pipeline on your own audio before committing to a full integration.

Syncing redaction with call platforms

Native integrations with telephony platforms cover the telephony layer. For teams moving from Deepgram or AssemblyAI, we provide a migration guide from Deepgram and a migration guide from AssemblyAI that map existing API calls to our API schema and address data governance and legal compliance for regulated contact center environments.

Solaria-1 achieves on average 29% lower WER than alternatives on conversational speech, benchmarked across 8 providers, 7 datasets, and 74+ hours of audio. Lower WER on accented conversational audio can reduce missed entity detections in a multilingual BPO environment.

Get started on our Starter plan and test how our API handles accented speech, code-switching, and multilingual audio on your own BPO recordings before committing to a plan.

FAQs

What counts as PII under PCI DSS?

PCI DSS Sensitive Authentication Data (SAD) includes card security codes (CVV2/CVC2/CID), full magnetic stripe data, and PINs, all of which are prohibited from storage after authorization, whether in recordings or in any other system. Cardholder data such as PAN, expiration dates, and cardholder names may be stored if required for business purposes, provided they are protected in accordance with PCI DSS requirements.

Do I need to redact audio, transcript, or both?

Both layers should be redacted. Redacting only the transcript while leaving the original audio intact leaves the audio recording in scope for PCI DSS auditors, and redacting only the audio while leaving card numbers in the transcript exposes downstream CRM and LLM systems that process the transcript feed.

What is the liability risk if the AI misses a card number?

Redaction failure resulting in stored cardholder data creates PCI DSS non-compliance and potential fines. Mitigation requires pairing high-accuracy STT with configurable regex rules and maintaining audit logs that prove redaction ran at ingestion, demonstrating due diligence to assessors even in rare edge cases.

Does automated redaction replace manual PCI pause-and-resume?

Yes, for the majority of contact center operations. Pause-and-resume keeps agents and local infrastructure in PCI scope, requiring SAQ-D with extensive security controls. Ingestion-level automated redaction prevents sensitive data from reaching any local system, potentially reducing your audit footprint.

How long does it take to integrate Gladia's redaction into an existing telephony stack?

Most integrations go from first API call to production in under a day, supported by our migration guide for Deepgram and migration guide for AssemblyAI.

Key terms glossary

Primary Account Number (PAN): The 14–19 digit sequence that identifies a payment card account. PCI DSS treats the PAN as the core unit of cardholder data: any system that stores, processes, or transmits a PAN is in scope for PCI DSS audit, regardless of whether the number is encrypted.

Sensitive Authentication Data (SAD): A PCI DSS category covering data elements that must never be stored after authorization completes, under any circumstances. SAD includes card security codes (CVV2/CVC2/CID), full magnetic stripe or chip data, and PINs. Unlike PAN, no business justification permits SAD retention.

Named Entity Recognition (NER): A machine learning classification task that identifies and labels spans of text as structured entity types (PERSON, CREDIT_CARD_NUMBER, PHONE_NUMBER, ADDRESS) based on linguistic context. In redaction pipelines, NER handles context-dependent detection that fixed regex patterns cannot cover.

Word Error Rate (WER): The standard metric for STT accuracy, calculated as the number of word-level substitutions, deletions, and insertions divided by the total number of reference words. In PCI redaction pipelines, WER directly determines false-negative risk: a transcription error on a digit sequence means the entity detector has no chance of flagging it as a card number.

Ingestion-level redaction: A pipeline architecture in which sensitive data is identified and masked before the audio or transcript is written to any storage system. Contrasts with post-storage redaction, where the original data is written first and scrubbed later, leaving a window that keeps the storage location in PCI DSS scope.
