Custom vocabulary for contact center transcription: product names, brands, and agent jargon

Published on July 3, 2026
by Ani Ghazaryan

TL;DR: Generic speech-to-text models fail most often on the words that matter most in contact center operations: your product names, brand terms, SKUs, and agent scripts. QA scorecards, CRM records, and coaching workflows break before any LLM sees the transcript because the foundational transcription layer already mangled those critical terms. Custom vocabulary dictionaries solve this at the source by using phoneme-similarity matching to guide transcription toward the correct output. The article covers how phoneme-based matching differs from post-transcription find-and-replace, when to use vocabulary versus spelling correction, and how to build, prioritize, and maintain your domain dictionary through product catalog changes.

When your offshore agents say "AuraSync Pro TX-900" and your transcription engine returns "aura sync protects 900," every downstream system works with wrong data. Your QA scorecard marks non-compliance on product identification, your CRM logs the wrong SKU, and your coaching summary references a product that doesn't exist. This isn't an LLM problem or a prompt engineering problem. Custom vocabulary dictionaries solve it at the source by using phoneme-similarity matching to guide transcription toward the correct output before the transcript reaches any downstream system.

Defining custom vocabulary for call transcripts

Mechanism of custom speech dictionaries

Commercial STT (speech-to-text) APIs typically run a language model during decoding to predict which words most likely follow each other in context. When a speaker says a proprietary term, a generic model may not have encountered that token sequence in training data, so it often picks an acoustically similar common phrase and outputs that instead. That is an out-of-vocabulary (OOV) failure.

Our custom vocabulary implementation converts both the transcribed output and your registered vocabulary entries into phonemes, then scores them for similarity. When phoneme similarity between a transcribed word and a vocabulary entry clears the configured intensity threshold, the engine replaces the output with your registered term. The intensity parameter controls how aggressively replacements fire: higher values widen the phoneme match window, while lower values require a closer acoustic fit before substitution occurs.

This approach differs from post-transcription find-and-replace rules. A find-and-replace rule can only match text already close to the target string, so if the model transcribes "AuraSync" as "aurora sink," the string replacement never fires because neither word appears in the rule. Phoneme-based matching can catch that failure because "aurora sink" and "AuraSync" may share a high phoneme similarity score even though the character strings diverge completely.
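To make the contrast concrete, here is a toy Python sketch. It is not Gladia's implementation. Three stand-ins are used:

- The hypothetical rough_phonetic_key function applies a few crude spelling-to-sound rewrites in place of a real grapheme-to-phoneme model.
- difflib's similarity ratio stands in for a production phoneme scorer.
- Treating a match as similarity >= 1 - intensity is an assumption. It was chosen only to mirror the behavior described above: higher intensity means a wider match window.

```python
import difflib
import re

# Crude spelling-to-sound rewrites applied in order; a hypothetical stand-in
# for a real grapheme-to-phoneme model.
_REWRITES = [("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"),
             ("x", "ks"), ("z", "s"), ("y", "i")]


def rough_phonetic_key(text: str) -> str:
    """Lowercase, drop spaces and punctuation, then apply the rewrites."""
    key = re.sub(r"[^a-z]", "", text.lower())
    for src, dst in _REWRITES:
        key = key.replace(src, dst)
    return key


def phonetic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the rough phonetic keys of a and b."""
    return difflib.SequenceMatcher(None, rough_phonetic_key(a),
                                   rough_phonetic_key(b)).ratio()


def phoneme_match(phrase: str, term: str, intensity: float) -> bool:
    """Assumed rule: higher intensity lowers the similarity required."""
    return phonetic_similarity(phrase, term) >= 1.0 - intensity


def find_and_replace(text: str, rules: dict[str, str]) -> str:
    """Plain string replacement, for comparison."""
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text


heard = "aurora sink"
# A rule written for a different mishearing never fires on this output.
print(find_and_replace(heard, {"aura sync": "AuraSync"}))  # aurora sink
print(round(phonetic_similarity(heard, "AuraSync"), 2))    # 0.89
print(phoneme_match(heard, "AuraSync", intensity=0.4))     # True
print(phoneme_match("the", "AuraSync", intensity=0.4))     # False
```

Lowering intensity to 0.05 in this sketch would require a similarity of 0.95, so "aurora sink" would no longer match. That is the same recall versus precision trade-off the intensity parameter exposes.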

Transcription errors in domain terminology

Contact center audio routinely contains domain-specific terms that fall outside a generic model's training vocabulary. The failure modes are predictable:

- Alphanumeric identifiers often get split or misheard.
- Brand names with unusual phonetics map to similar-sounding common words.
- Acronyms are either spelled out letter by letter or misheard as entirely different, plausible-sounding words.

The damage compounds as transcripts move downstream. When an LLM (large language model) receives a transcript containing those errors to generate a QA scorecard, it cannot reconstruct what was actually said at the acoustic layer and confidently scores the call based on the incorrect text. Your QA team then manually overrides the score, invalidating the automation you paid to build. Our guide on call transcription accuracy benchmarks covers how to measure this systematically across your contact center operation, and the factors affecting transcription accuracy piece explains the specific audio conditions that amplify these errors.

Solving transcription errors with custom vocabulary

Capturing specific product and SKU data

Alphanumeric SKUs frequently produce OOV failures in contact center audio. A model trained on general speech assigns little or no prior probability to a pattern like "TX-900" as a cohesive unit. Decoding therefore treats each character group separately and picks the phonetically nearest common word. Custom vocabulary registers the SKU as a phonetic pattern the engine can match against. Register "TX-900" with a pronunciation hint like "tee-ex-nine-hundred", and the phoneme matcher can catch the acoustic pattern even when accent or speaking pace causes the base model to diverge initially. For contact centers handling returns, warranty claims, or technical support, the SKU drives your CRM (customer relationship management) record accuracy, because it's the primary entity your downstream systems need to populate correctly.

Custom vocabulary for brand entities

Your product intelligence and competitive monitoring dashboards need sub-brand and competitor mentions to transcribe accurately when agents reference them on calls. When brand terms transcribe incorrectly, they often become difficult to track in your analytics, reducing visibility into how often agents mention them or how customers respond.

The table below shows common failure patterns and the downstream QA impact without custom vocabulary active:

What was said | Generic STT output | Custom-tuned STT output | Downstream QA impact
"AuraSync Pro TX-900" | "aurora sync pro text 900" | "AuraSync Pro TX-900" | Incorrect SKU recorded
"ACH payment" | "A C H payment" | "ACH payment" | Payment type unclear
"HIPAA authorization" | "HIPPA authorization" | "HIPAA authorization" | Regulatory term misspelled
"Qbii Technologies" | "cubie technologies" | "Qbii Technologies" | Brand name incorrect
"FCR (first call resolution) score of 87" | "F C R score of 87" | "FCR score of 87" | Metric formatting inconsistent

Resolving abbreviation transcription errors

Industry-specific acronyms can produce transcription inconsistencies: models may spell them out phonetically or mishear the letters entirely. Both patterns can break automated compliance checks that look for specific keyword patterns.

For regulated industries, the impact reaches beyond accuracy. Suppose a call requires the agent to confirm "HIPAA authorization" and the transcript reads "he papered authorization." The automated compliance check finds no match for "HIPAA" and flags the call as non-compliant. Your QA team manually reviews it, confirms the agent handled the call correctly, and overrides the score. Repeated across your call volume, this produces a systematic stream of false positives that erodes the value of your automated QA investment.
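Here is a minimal sketch of how such a keyword check fails. It assumes a hypothetical QA rule that requires each listed term to appear somewhere in the transcript as a whole word, case-insensitive. Real QA platforms implement their own matching; the function and term list here are illustrative only.

```python
import re

REQUIRED_TERMS = ["HIPAA"]  # hypothetical compliance keyword list


def missing_terms(transcript: str, required: list[str]) -> list[str]:
    """Return required terms that do not appear as whole words (case-insensitive)."""
    missing = []
    for term in required:
        pattern = r"\b" + re.escape(term) + r"\b"
        if not re.search(pattern, transcript, flags=re.IGNORECASE):
            missing.append(term)
    return missing


generic = "Before we continue, I need your he papered authorization."
tuned = "Before we continue, I need your HIPAA authorization."

print(missing_terms(generic, REQUIRED_TERMS))  # ['HIPAA']: call flagged
print(missing_terms(tuned, REQUIRED_TERMS))    # []: call passes
```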

Custom vocabulary for agent phrases

BPO (business process outsourcing) operations add another layer of complexity. Offshore agents in the Philippines, India, or Latin America handle calls in English, but their pronunciation patterns for brand-specific terms often diverge from the accent profiles dominant in generic training data. We built Solaria-1 to handle accented speech across 100+ supported languages, benchmarked against 8 providers across 7 datasets and 74+ hours of audio with open and reproducible methodology. Adding custom vocabulary on top of that baseline gives your BPO operation a transcription layer that is both accent-robust and domain-aware.

"Gladia's transcriptions cater well to multilingual requirements, thus significantly aiding our customer support in a complex multilingual setup." - Pratik S. on G2

How to build a custom vocabulary list for your contact center

Step 1: Audit existing call transcripts

Start with your current transcripts, not a blank list. Pull a recent batch of calls and run them through a word frequency analysis against your known product catalog and internal lexicon. Look specifically for calls where QA analysts have manually overridden automated scores, because those overrides mark exactly where transcription failed downstream logic. Calls involving returns, warranty claims, pricing disputes, and compliance confirmations often contain high densities of domain-specific terms and make valuable audit sources.
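One way to run the audit is sketched below. It assumes your transcripts can be exported as records with hypothetical text and qa_overridden fields, and that you maintain a list of everyday words to ignore. The sketch does two things:

- It counts whole-word catalog mentions across all calls.
- It surfaces unusual tokens from overridden calls, which are candidates for misheard product terms.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, keeping hyphens so SKUs like tx-900 survive."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def audit(calls, catalog_terms, common_words):
    """Count catalog-term mentions and collect unusual tokens in overridden calls.

    calls: iterable of dicts with hypothetical keys "text" and "qa_overridden".
    """
    term_hits = Counter()
    suspicious = Counter()
    for call in calls:
        for term in catalog_terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            term_hits[term] += len(re.findall(pattern, call["text"], re.IGNORECASE))
        if call["qa_overridden"]:
            # Tokens outside the everyday-word list are candidate mishearings.
            for token in tokenize(call["text"]):
                if token not in common_words:
                    suspicious[token] += 1
    return term_hits, suspicious


calls = [
    {"text": "I want to return my AuraSync Pro TX-900", "qa_overridden": False},
    {"text": "the aurora sink pro text 900 stopped working", "qa_overridden": True},
]
hits, suspicious = audit(calls, ["AuraSync", "TX-900"],
                         common_words={"the", "pro", "stopped", "working"})
print(hits)
print(suspicious.most_common())
```

In this toy sample, the tokens surfaced from the overridden call (aurora, sink, text, 900) point straight at the AuraSync and TX-900 mishearings.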

Step 2: Collect product catalogs and SKUs

Export your active product list, SKU registry, and any competitor terms your agents are trained to track from your CRM or ERP (enterprise resource planning). Filter to active products only, since deprecated SKUs may add noise and increase false positive risk. Include sub-brand names, product line groupings, and any internal codenames that appear in scripted agent language. Refresh this export whenever a product launches, is discontinued, or gets rebranded so the dictionary reflects your current catalog from the first call after any change.
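Here is a small sketch of the filtering step. It assumes the export is a CSV with hypothetical name, sku, and status columns. Adjust the column names and status values to whatever your CRM or ERP actually produces.

```python
import csv
import io


def active_terms(csv_text: str) -> list[str]:
    """Collect names and SKUs from rows whose status is 'active'."""
    terms = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"].strip().lower() != "active":
            continue  # deprecated or discontinued items add false positive risk
        for column in ("name", "sku"):
            value = row[column].strip()
            if value:
                terms.add(value)
    return sorted(terms)


# Hypothetical export; in practice read the file your CRM or ERP produces.
export = """name,sku,status
AuraSync Pro,TX-900,active
AuraSync Lite,TX-400,discontinued
Qbii Hub,QB-12,active
"""

print(active_terms(export))  # ['AuraSync Pro', 'QB-12', 'Qbii Hub', 'TX-900']
```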

Step 3: Building your vocabulary library

For each term, determine whether the written form and the spoken form diverge enough that the base model needs guidance. The API accepts vocabulary entries in two forms:

- Simple strings, for straightforward terms.
- Objects with value, pronunciations, intensity, and language fields, for terms with distinctive or language-specific pronunciation patterns.

Agents occasionally pronounce a brand name like "Salesforce" as "sell force" or "sale forces". A term like that may benefit from multiple pronunciation variants, which give the phoneme matcher a wider target for catching the acoustic signal correctly.
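The two entry shapes can be generated from your term list with a small helper. The helper name vocab_entry is hypothetical. The field names (value, pronunciations, intensity, language) are the ones used in the request example later in this article, so the output can be dropped into custom_vocabulary_config.vocabulary.

```python
import json


def vocab_entry(value, pronunciations=None, intensity=None, language=None):
    """Return a bare string when no extra fields are set, else an entry object."""
    entry = {"value": value}
    if pronunciations:
        entry["pronunciations"] = list(pronunciations)
    if intensity is not None:
        entry["intensity"] = intensity  # per-entry override of default_intensity
    if language is not None:
        entry["language"] = language
    return value if len(entry) == 1 else entry


vocabulary = [
    vocab_entry("AuraSync"),
    vocab_entry("Salesforce", pronunciations=["sell force", "sale forces"]),
    vocab_entry("TX-900", pronunciations=["tee-ex-nine-hundred"],
                intensity=0.5, language="en"),
]
print(json.dumps(vocabulary, indent=2))
```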

Step 4: Prioritize high-frequency terms

Resist adding every term from your product catalog at once. Entries with very low occurrence rates in real call audio may cause the matcher to fire on acoustically similar common words, producing false positives that can be harder to debug than the original OOV errors. Prioritize terms that appear frequently across your call volume plus any term critical for compliance checking regardless of frequency. Start with a focused set of high-priority entries, measure false positive rate against a manual call sample, and expand the list incrementally based on what the data shows.
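Here is a minimal sketch of that prioritization rule. It assumes you already have per-term occurrence counts from the audit in Step 1. The threshold and cap are hypothetical starting values. Compliance-critical terms are always kept, and frequent terms fill the remaining slots in descending order of frequency. Expanding the list later is then a matter of raising max_entries.

```python
def prioritize(term_counts, compliance_terms, min_count=10, max_entries=50):
    """Build a starter vocabulary list.

    term_counts: term -> occurrences in a reviewed call sample.
    Compliance terms are always kept, even beyond max_entries; frequent
    terms fill the remaining slots, most frequent first.
    """
    selected = list(dict.fromkeys(compliance_terms))  # dedupe, keep order
    frequent = sorted(
        (t for t, n in term_counts.items() if n >= min_count and t not in selected),
        key=lambda t: (-term_counts[t], t),
    )
    room = max(max_entries - len(selected), 0)
    return selected + frequent[:room]


counts = {"AuraSync": 120, "TX-900": 45, "Qbii": 3, "Lumora": 0}
print(prioritize(counts, ["HIPAA"]))  # ['HIPAA', 'AuraSync', 'TX-900']
```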

Managing your custom vocabulary for ongoing accuracy

Sync transcription terms with product

Your product catalog is not static, and your vocabulary dictionary can't be either. Establish a formal handoff between your product team and your transcription operations whenever new SKUs launch or brands change. A practical approach is including the transcription vocabulary update as a line item in your product launch checklist, alongside agent scripts and IVR prompt updates, so the dictionary reflects current products from the first call after launch.

Audit transcription accuracy quarterly

Set a regular review cadence where your QA team samples calls containing custom vocabulary terms and manually checks that the correct forms appear in the output. Measure two things:

- Recall: the share of spoken target terms that appear correctly in the transcript.
- False positive rate: the share of vocabulary-term appearances where the speaker actually said a different, common word.

If recall is low, consider raising the default_intensity parameter. If false positives are high, consider lowering it or refining the pronunciations list for the offending entries. The custom vocabulary documentation provides guidance on tuning these parameters based on observed performance.
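Here is a sketch of the two review metrics. It assumes reviewers log each checked mention with two facts: whether the term was actually spoken, and whether the registered form appears in the transcript. The false positive rate is computed as the share of vocabulary-term appearances that the reviewer did not hear in the audio. The thresholds in intensity_hint are hypothetical and should be set from your own QA tolerances.

```python
from dataclasses import dataclass


@dataclass
class ReviewedMention:
    term: str
    spoken: bool       # reviewer heard the term in the audio
    transcribed: bool  # the registered form appears in the transcript


def audit_metrics(mentions):
    """Return (recall, false_positive_rate) over a reviewed sample."""
    spoken = [m for m in mentions if m.spoken]
    shown = [m for m in mentions if m.transcribed]
    recall = sum(m.transcribed for m in spoken) / len(spoken) if spoken else 1.0
    fp_rate = sum(not m.spoken for m in shown) / len(shown) if shown else 0.0
    return recall, fp_rate


def intensity_hint(recall, fp_rate, min_recall=0.9, max_fp_rate=0.05):
    """Map the two metrics to tuning advice (hypothetical thresholds)."""
    if fp_rate > max_fp_rate:
        return "lower intensity or refine pronunciations"
    if recall < min_recall:
        return "raise intensity"
    return "keep current settings"


sample = [
    ReviewedMention("TX-900", spoken=True, transcribed=True),
    ReviewedMention("TX-900", spoken=True, transcribed=False),    # missed term
    ReviewedMention("AuraSync", spoken=True, transcribed=True),
    ReviewedMention("AuraSync", spoken=True, transcribed=True),
    ReviewedMention("AuraSync", spoken=False, transcribed=True),  # common word replaced
]
recall, fp_rate = audit_metrics(sample)
print(recall, fp_rate, intensity_hint(recall, fp_rate))
# 0.75 0.25 lower intensity or refine pronunciations
```

The hint checks false positives before recall on purpose. When both metrics are off, raising intensity would add more incorrect substitutions, so refining the pronunciations for the offending entries is usually the safer first move.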

Aligning QA and training on vocabulary

Your QA team and agent training team need to operate from the same vocabulary master list. When agents are trained to say "FCR score" instead of "first call resolution score," your vocabulary list should reflect that scripted form so automated QA checks work correctly against the transcribed output.

For compliance-sensitive operations, review your vocabulary list to ensure it aligns with your data handling policies and applicable jurisdictional requirements.

When to use custom dictionaries over spelling rules

Transcribing complex product names accurately

We provide two distinct correction mechanisms: custom vocabulary and custom spelling. Understanding which to use for a given failure prevents over-engineering your dictionary and reduces false positives.

Custom spelling handles cases where the base model transcribes the word correctly at the phoneme level but formats it wrong. If the model outputs "salesforce" in lowercase when your CRM requires "Salesforce," custom spelling handles that formatting correction without phoneme comparison. If the model outputs "sale forces" because it genuinely misheard the acoustic signal, custom spelling cannot catch that because "Salesforce" never appears in the transcript to be reformatted. That is a vocabulary problem requiring the phoneme-matching approach.

When to prioritize custom dictionaries

Use custom vocabulary when the failure is acoustic: the engine outputs a different word entirely because it doesn't recognize the phonetic pattern as a valid token. Use custom spelling when the failure is orthographic: the engine recognizes the word but formats or capitalizes it incorrectly.

For example, "sell force" for "Salesforce" is an acoustic failure requiring vocabulary. "salesforce" in lowercase when you need "Salesforce" capitalized is a formatting failure requiring spelling correction. Mapping failures to the right mechanism before building your configuration saves significant debugging time in production.
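The rule of thumb can be expressed as a rough triage heuristic. This sketch uses a hypothetical function name. It treats an error as orthographic when the expected and transcribed forms are identical after lowercasing and removing spaces and punctuation, and as acoustic otherwise. Edge cases such as split compound words still deserve a human look.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def correction_mechanism(expected: str, transcribed: str) -> str:
    """Suggest which mechanism fits one observed transcription error."""
    if transcribed == expected:
        return "none"
    if _normalize(transcribed) == _normalize(expected):
        # Same word, wrong case, spacing, or punctuation: orthographic failure.
        return "custom spelling"
    # A different word came out: acoustic failure.
    return "custom vocabulary"


for expected, got in [("Salesforce", "salesforce"), ("Salesforce", "sell force"),
                      ("HIPAA", "he papered")]:
    print(f"{got!r} -> {correction_mechanism(expected, got)}")
```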

Combining methods for peak accuracy

The most reliable production setup uses both mechanisms together. Register domain-specific terms and their acoustic variants in your vocabulary list with intensity tuned to each term's collision risk with common words. Use custom spelling to handle formatting consistency for terms the base model recognizes correctly but presents in the wrong form. Together, these two layers cover the full range of transcription errors without requiring model retraining.
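Here is a sketch of a request payload that combines both layers. The custom vocabulary fields match the request example later in this article. The custom_spelling and custom_spelling_config.spelling_dictionary fields are an assumption based on Gladia's custom spelling feature; the dictionary maps the desired output form to the variants to rewrite. Confirm the exact names and shape in the API reference before use. The entries and intensity values are illustrative.

```python
import json

payload = {
    "audio_url": "https://your-storage.example.com/call-recording.wav",
    # Acoustic failures: phoneme matching toward registered terms.
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        "vocabulary": [
            {"value": "Salesforce", "pronunciations": ["sell force", "sale forces"]},
            # Short brand close to a common word ("cubie"): tighter threshold.
            {"value": "Qbii", "pronunciations": ["Q bee"], "intensity": 0.3},
        ],
        "default_intensity": 0.4,
    },
    # Orthographic failures: rewrite recognized but misformatted output.
    # ASSUMPTION: field names to verify against the API reference.
    "custom_spelling": True,
    "custom_spelling_config": {
        "spelling_dictionary": {"Salesforce": ["salesforce"], "HIPAA": ["hipaa"]},
    },
}
print(json.dumps(payload, indent=2))
```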

Impact of custom dictionaries on transcription WER

Baseline WER without custom vocabulary

WER measures the proportion of words in a transcript that differ from the ground truth, calculated as the sum of substitutions, deletions, and insertions divided by total reference words. On general conversational audio, a well-configured base model produces a stable WER you can plan around. The problem for contact centers is that OOV failures do not distribute evenly across the transcript; they concentrate on exactly the terms your downstream systems depend on most: product names, compliance phrases, alphanumeric SKUs, and brand-specific acronyms. A transcript that is 97% accurate by word count can still have the product SKU wrong on every call, because those few high-value tokens are the ones the base model is least likely to have encountered in training. That concentration effect means your effective error rate on QA-critical content is significantly worse than your headline WER suggests. Before applying custom vocabulary, that gap is invisible in aggregate metrics and only surfaces when QA analysts start logging manual overrides on product identification and compliance phrase checks.
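The formula is WER = (S + D + I) / N, where S, D, and I are the substitution, deletion, and insertion counts and N is the number of reference words. The minimal sketch below computes it with a word-level edit distance and shows how a single misheard product name hides in the aggregate. The example sentences are hypothetical, and tokenization is plain whitespace splitting.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level edit distance)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    return word_errors(reference, hypothesis) / len(reference.split())


reference = "my AuraSync hub will not pair with my phone since the update last night"
hypothesis = "my aurora sink hub will not pair with my phone since the update last night"
print(f"WER: {wer(reference, hypothesis):.1%}")                  # WER: 14.3%
print("Product name correct:", "AuraSync" in hypothesis.split())  # Product name correct: False
```

Both edits here come from the product name, which is the one token the CRM and QA scorecard depend on. Spread the same two errors over a 200-word call and the headline WER would be 1%.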

Reducing WER with specific term lists

WER improves when errors on domain-specific tokens are eliminated. Every time the base model outputs "aurora sink" instead of "AuraSync," that counts as a substitution plus an insertion (two words replacing one) in your WER calculation. Register "AuraSync" with an appropriate pronunciation hint, and the phoneme matcher can intercept the error before it reaches your transcript. The net effect is that your error count on those high-value tokens can drop sharply. That pulls down your domain-specific WER even when your headline WER across general conversational words stays roughly constant. The practical levers are your vocabulary list's coverage and its intensity calibration.

Coverage sets the ceiling on what vocabulary can fix: a term missing from the list keeps its OOV substitutions. A list covering 90% of your high-frequency domain terms can, at most, correct the substitutions on those covered terms.

Intensity tuning controls precision. Set it too high and you introduce false positive substitutions that add new errors. Set it too low and you miss acoustically distorted variants, leaving substitutions uncorrected.

Run a vocabulary configuration against a manually reviewed call batch and measure per-term recall before and after. That gives you a direct read on the WER impact of each entry, showing which terms are pulling their weight and which need pronunciation refinement.

A financial services customer running high-volume call processing reported 98.5% numerical accuracy in production after combining base-model transcription with domain-specific vocabulary configuration for numeric identifiers and product codes. Verified production performance context is available on our CCaaS use case page.
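Here is a sketch of the before-and-after comparison. It assumes you have three transcripts for each call: a manually reviewed reference, a baseline, and a vocabulary-enabled version. For simplicity it handles single-token terms and counts whitespace-separated tokens exactly. Punctuation and multi-word terms such as "Qbii Technologies" would need extra normalization.

```python
def term_recall(references, transcripts, term):
    """Share of spoken occurrences of a single-token term that were reproduced."""
    spoken = found = 0
    for ref, hyp in zip(references, transcripts):
        n_ref = ref.split().count(term)
        spoken += n_ref
        # Cap at n_ref so extra (false positive) copies do not inflate recall.
        found += min(n_ref, hyp.split().count(term))
    return found / spoken if spoken else None


def recall_deltas(references, baseline, tuned, terms):
    """Map term -> (recall before, recall after, change) for terms that were spoken."""
    deltas = {}
    for term in terms:
        before = term_recall(references, baseline, term)
        if before is None:
            continue  # term never spoken in this sample
        after = term_recall(references, tuned, term)
        deltas[term] = (before, after, after - before)
    return deltas


references = ["my AuraSync will not pair", "the TX-900 is AuraSync compatible"]
baseline = ["my aurora sink will not pair", "the text 900 is AuraSync compatible"]
tuned = ["my AuraSync will not pair", "the TX-900 is AuraSync compatible"]

for term, (before, after, change) in recall_deltas(
        references, baseline, tuned, ["AuraSync", "TX-900", "Qbii"]).items():
    print(f"{term}: {before:.0%} -> {after:.0%} ({change:+.0%})")
```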

Improving QA scorecard accuracy

When your QA automation scores agent compliance on product identification and your transcripts contain the correct product terms, false QA flags (compliant calls marked non-compliant) drop. Your coaching interventions then target actual agent behavior rather than transcription errors. Agents stop getting coached on problems they don't have, and QA analysts spend less time on manual override reviews. Your cost-per-contact reflects the real cost of your operation rather than the overhead of correcting systematic transcription failures.

The ROI runs directly through transcript quality. Contact centers that implement accurate transcription infrastructure can shift QA teams from manually reviewing calls to validating AI findings, a structural shift in QA economics that isn't achievable when your transcription layer produces systematic domain errors.

Deployment guide: adding domain terms to Gladia

Step 1: Define key domain vocabulary

Format your vocabulary entries before making the API call. Start with simple strings for terms where the written and spoken forms are close enough for the phoneme matcher to catch at the default intensity (the example below sets default_intensity to 0.4). For terms with a significant gap between written and spoken form, build out full entry objects with pronunciations arrays. Consider using bare strings for common brand names, and full objects for proprietary SKUs and compliance phrases where pronunciation varies from the written form.

Step 2: Programmatic custom dictionary setup

Enable custom_vocabulary and pass a custom_vocabulary_config object in an async transcription request:

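import requests

url = "https://api.gladia.io/v2/pre-recorded"

headers = {
    "x-gladia-key": "YOUR_GLADIA_API_KEY",  # replace with your API key
    "Content-Type": "application/json"
}

payload = {
    "audio_url": "https://your-storage.example.com/call-recording.wav",
    "custom_vocabulary": True,  # turn the feature on
    "custom_vocabulary_config": {
        "vocabulary": [
            # Bare string: matched at default_intensity
            "AuraSync",
            # Object with pronunciation hints for a term whose spoken form differs from its written form
            {"value": "TX-900", "pronunciations": ["tee-ex-nine-hundred", "text nine hundred"]},
            # Object with a per-entry intensity override and a language
            {
                "value": "Qbii Technologies",
                "pronunciations": ["Q-Bee Technologies", "Q bee technology"],
                "intensity": 0.5,
                "language": "en"
            }
        ],
        # Applies to every entry without its own intensity
        "default_intensity": 0.4
    }
}

# Submit the async job; fail loudly on HTTP errors instead of printing an error body
response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())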

The default_intensity value applies to every entry that has no per-entry override. Per-entry intensity lets you tune aggressiveness at the term level. That matters when a SKU phonetically overlaps with common words and needs a tighter threshold than the rest of your vocabulary. Full parameter details are in the custom vocabulary docs.
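The request above only queues the job. Here is a minimal polling sketch. It assumes the POST response includes a result_url, and that a GET on that URL returns JSON whose status field eventually becomes "done" or "error"; verify these names against the API reference. The HTTP call is injected as fetch_json, a hypothetical parameter, so the loop can be exercised without network access. A requests-based fetcher is shown in the trailing comments.

```python
import time
from typing import Callable


def wait_for_result(fetch_json: Callable[[str], dict], result_url: str,
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 600.0) -> dict:
    """Poll result_url until the job reports "done", then return the body."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch_json(result_url)
        status = body.get("status")  # assumed field name, see lead-in
        if status == "done":
            return body
        if status == "error":
            raise RuntimeError(f"transcription failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Example wiring with requests (not executed here):
# import requests
#
# def fetch_json(url):
#     r = requests.get(url, headers={"x-gladia-key": "YOUR_GLADIA_API_KEY"}, timeout=30)
#     r.raise_for_status()
#     return r.json()
#
# job = response.json()  # from the POST above
# result = wait_for_result(fetch_json, job["result_url"])
```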

Step 3: Test vocabulary against live calls

Before deploying to production, run your new vocabulary configuration against a test batch of calls you have already manually reviewed. Compare output transcripts against your baseline and look for improvements in target term accuracy as well as any new false positives where common words were incorrectly replaced. Review any entry showing a high false positive rate before scaling to full call volume.
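Here is a sketch of that comparison for a single call. It assumes you have the baseline transcript, the vocabulary-enabled transcript, and the manually reviewed reference. It uses difflib to find every span the vocabulary run changed, and labels a change a fix when the new wording appears as whole words in the reference. Tokens are whitespace-split, so strip punctuation first if your transcripts include it.

```python
import difflib


def vocabulary_changes(baseline: str, tuned: str, reference: str):
    """Return (before, after, verdict) for each span the vocabulary run changed.

    A change is a "fix" when its new text occurs as whole words in the
    reviewed reference; anything else, including deletions, is reported
    as a suspected false positive. Assumes single-space-separated text.
    """
    base, new = baseline.split(), tuned.split()
    padded_reference = f" {reference} "
    matcher = difflib.SequenceMatcher(None, base, new, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        before, after = " ".join(base[i1:i2]), " ".join(new[j1:j2])
        is_fix = bool(after) and f" {after} " in padded_reference
        changes.append((before, after, "fix" if is_fix else "suspected false positive"))
    return changes


print(vocabulary_changes("please sync it later", "please AuraSync it later",
                         "please sync it later"))
# [('sync', 'AuraSync', 'suspected false positive')]
```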

Step 4: Live transcription integration

Once your test batch clears your quality threshold, deploy to production by adding the custom_vocabulary_config block to your standard async transcription request. Our infrastructure is designed to scale from test volumes to production volumes. Aircall processes over 1 million calls per week through Gladia, and multiple customers report sub-24-hour integration timelines from API connection to production deployment.

Test your custom vocabulary configuration on your own call audio to see how Solaria-1 handles your product names, BPO accents, and domain-specific terms, with async processing at $0.20/hr on Growth and volume pricing on Enterprise.

FAQs

What is an out-of-vocabulary (OOV) failure in contact center transcription?

An OOV failure occurs when a speech-to-text model encounters a word it has no trained probability for, so it typically outputs a phonetically similar common word instead. In contact centers, OOV failures commonly affect proprietary product names, brand-specific acronyms, and alphanumeric SKUs.

How many custom vocabulary entries does Gladia support?

We support custom vocabulary across plans. Contact sales for specific entry limits on Enterprise. Other providers vary: some cap vocabulary files or entries, while others allow thousands of phrases per job depending on language.

Does adding custom vocabulary affect Gladia's transcription latency?

Our custom vocabulary implementation is designed to minimize any latency impact on both real-time and async processing throughput.

What is the difference between custom vocabulary and custom spelling in Gladia?

Custom vocabulary uses phoneme-similarity matching to catch terms the model acoustically misheard and output a different word entirely. Custom spelling handles orthographic corrections where the model recognized the word correctly but formatted or capitalized it incorrectly.

What intensity value should I start with for a new vocabulary entry?

Start with a moderate intensity value and adjust based on observed performance. If target terms are still missed, consider raising the value. If unrelated common words are being incorrectly replaced, consider lowering intensity for affected entries or refining the pronunciations array. The custom vocabulary documentation provides specific guidance on tuning.

Do Growth and Enterprise plans use customer audio to train Gladia's models?

No. On Growth and Enterprise plans, customer audio is never used for model training and no opt-out action is required. On the Starter plan, customer data can be used for model training by default.

Key terms glossary

Out-of-vocabulary (OOV) error: A transcription failure where the STT model substitutes a phonetically similar common word because it has no trained probability for the target token. OOV errors concentrate on proprietary product names, alphanumeric SKUs, and brand terms.

Word error rate (WER): The percentage of words in a transcript that differ from the ground truth, calculated as the sum of substitutions, deletions, and insertions divided by total reference words. Lower WER means more accurate transcription.

Phoneme: The smallest unit of sound in a language that distinguishes meaning. Custom vocabulary systems use phoneme comparison to match acoustically similar strings to registered vocabulary entries.

Custom vocabulary intensity: A parameter that controls how closely the phoneme pattern of a transcribed word must match a vocabulary entry before substitution fires. Higher values increase recall at the cost of precision, meaning more target terms get caught but false positive risk rises.

False positive (in vocabulary matching): An incorrect substitution where a common word is replaced by a vocabulary entry because their phoneme similarity clears the intensity threshold. Reduced by lowering intensity or tightening the pronunciations array for the affected entry.
