How custom vocabulary improves STT accuracy

Published on June 24, 2025

Even the most advanced speech-to-text (STT) systems can make mistakes, especially when they encounter unfamiliar words like brand names, technical acronyms, or non-standard pronunciations. For call centers and customer service platforms, these missteps aren’t just minor glitches. They can lead to broken workflows, misinterpreted customer needs, and frustrating experiences on both ends of the call.

This is where custom vocabulary helps. It lets teams teach their STT tool the words that matter most to their business. Whether it’s a product name, a regional slang term, or a commonly misheard acronym, custom vocabulary makes the engine far more likely to get it right the first time.

For CCaaS providers integrating STT APIs, this feature can deliver a real advantage to your customers: more accurate transcriptions, smarter agent tools, and better data to power analytics and automation.

Key takeaways:

Custom vocabulary helps speech-to-text engines handle brand names, technical terms, and regional accents more accurately.
Misrecognition of key terms can impact everything from customer satisfaction to downstream analytics and automation.
Integrating custom vocabulary is a lightweight but powerful way to improve transcription accuracy across diverse use cases, especially in call centers.

What is custom vocabulary in speech-to-text?

Custom vocabulary is a feature that lets users add specific words and phrases to the speech-to-text engine’s recognition list. These could include unique brand names, product terminology, technical acronyms, or commonly misheard words. Basically, anything the model might not recognize by default.

Unlike full model training, this is a lightweight way to customize output. When you provide a list of custom words, the STT engine prioritizes those terms when it tries to interpret spoken language.

Why custom vocabulary matters

Accuracy is paramount in STT technology. A transcription service that’s constantly making mistakes will lose the trust of users and be replaced by something better.

It’s particularly frustrating for users when the same errors happen over and over, especially where these relate to industry terms, brand names, or other common jargon that they hear every day.

In a call center context, custom vocabulary lets end users define what matters most for their business—and ensures the speech engine captures it correctly. That means fewer transcription errors, better insights, and more reliable results (especially important in specialized industries or multilingual environments).

Real-world scenarios where some STT tools fall short

Brand and product names

STT engines often default to the most commonly used word or phrase, which means unique or stylized brand names can get lost in translation. For example, if a software company is named “Qortex,” the STT might mishear it as “cortex” or “quartets,” leading to confusion in transcripts or CRM logs.

Nike (normally rhymes with “spiky”) could be transcribed as “Nikee” or “Nikeh” depending on accent or clarity, and could easily be mistaken for “night,” “Mike,” or “Nicky.”

With custom vocabulary, the model can learn to recognize and prioritize the correct spelling and meaning.

Accents and pronunciation differences

Speech recognition systems are often trained on "standard" pronunciations—typically American or British English. So when callers with different accents (say, South Asian, Irish, or French) speak, the STT tool may misinterpret common words.

For example, an Irish customer’s inquiry about “car insurance” could be transcribed as “can assurance.”

By adding frequently misrecognized words to a custom vocabulary, companies can significantly reduce these errors.

Learn more about how speech recognition navigates language here.

Industry-specific language

Call centers in healthcare, finance, or tech often use specialized terms or acronyms that everyday STT engines may not understand. Words like “SaaS,” “EHR,” or “PCI” might be turned into meaningless phrases unless they’re explicitly included in the vocabulary list.

Healthcare is another good example. Users can provide lists of thousands of drug names, which would otherwise typically be missed by transcription tools. Or if a company sells French wines in the United States, the call center’s STT tool can be primed to recognize the wide range of ways that American buyers might pronounce Domaine Pontifical Châteauneuf du Pape or Domaine Boudau Cuvée Henri Boudau.

Custom vocabulary ensures those terms are recognized and transcribed accurately, preserving the context and meaning of conversations.

How Gladia makes custom vocabulary easy

Gladia’s speech-to-text API is designed to make custom vocabulary simple, fast, and effective. Whether you’re uploading a glossary of product names, technical jargon, or regionally specific terms, you can add and update your vocabulary lists dynamically through the API or directly from the dashboard.

Custom vocabulary updates take effect instantly, so your live and post-call transcriptions reflect the latest changes in real time. The system is built to support multilingual, domain-specific, and brand-focused terms, making it an ideal fit for voice platforms serving diverse customer bases across industries.

How the custom vocabulary engine works

Custom vocabulary in Gladia isn’t just a simple search-and-replace tool—it’s a sophisticated, phonetic-aware algorithm that helps improve transcription accuracy without requiring model retraining.

Example of a custom vocabulary entry with value, pronunciation, intensity, and language fields defined.

The algorithm’s structure has four key variables:

Value: The word or phrase that will appear in the final transcription. This is case-sensitive, and the term will appear exactly as listed.
Pronunciations: The different ways a word or phrase might be spoken aloud, especially if they’re commonly misheard by standard STT models. For example, if speakers often say “Q-Bee” but you want the transcript to show “Qbii Technologies,” you’d list “Q-Bee” under Pronunciations and map it to “Qbii Technologies” as the Value. This helps correct errors where the base model (like Whisper) might otherwise mis-transcribe the phrase.
Intensity: An adjustable setting (from 0 to 1) that determines how sensitive the custom vocabulary algorithm should be. The default is 0.5; a higher setting finds and replaces words more aggressively.
Language: This helps the algorithm recognize when the same word is pronounced differently in another language. Without it, a French transcript may misrecognize an English-named company, for example.
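To make the four fields concrete, here is a minimal Python sketch of a single entry. The `VocabularyEntry` class is a hypothetical container written for this article, not a type from Gladia’s SDK, and the `"en"` language code format is an assumption. Check the API reference for the exact request schema before sending anything.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VocabularyEntry:
    """One custom vocabulary entry (hypothetical container, not an SDK type)."""

    value: str  # exact, case-sensitive text to show in the transcript
    pronunciations: List[str] = field(default_factory=list)  # spoken variants
    intensity: float = 0.5  # 0 to 1; 0.5 is the default described above
    language: Optional[str] = None  # assumed to be a code such as "en" or "fr"

    def __post_init__(self) -> None:
        # Reject entries that could not be applied sensibly.
        if not self.value:
            raise ValueError("value must not be empty")
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be between 0 and 1")


# The "Q-Bee" example from the Pronunciations field above.
entry = VocabularyEntry(
    value="Qbii Technologies",
    pronunciations=["Q-Bee"],
    intensity=0.6,
    language="en",
)
```

Validating the intensity range up front keeps a bad entry from silently changing how aggressively terms are replaced.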

That’s the basic setup for the tool. Users can then upload or manually add hundreds or thousands of instances to ensure more accurate transcriptions moving forward.

Going deeper, here are some of the key functions that make Gladia’s custom vocabulary algorithm work so successfully:

1. Subset matching for better context

When Gladia receives a transcript, it breaks the text into overlapping chunks or “subsets” of words. These subsets are then compared to the list of custom vocabulary entries.

This makes it possible to match not just individual words but also short phrases that might appear in different forms throughout the conversation.
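As a rough illustration of overlapping subsets, the sketch below lists every run of consecutive words up to a maximum size. The function name `word_subsets` and the whitespace-based word splitting are assumptions made for the example, and Gladia’s actual chunking may differ.

```python
from typing import List


def word_subsets(transcript: str, max_words: int) -> List[List[str]]:
    """Return every run of 1..max_words consecutive words in the transcript."""
    words = transcript.split()
    subsets = []
    for start in range(len(words)):
        for size in range(1, max_words + 1):
            # Skip windows that would run past the end of the transcript.
            if start + size <= len(words):
                subsets.append(words[start:start + size])
    return subsets


subsets = word_subsets("thanks for calling cortex support", max_words=2)
# Both ["cortex"] and ["calling", "cortex"] are now candidates for matching.
```

Because the windows overlap, a multi-word term can still be found when it straddles what would otherwise be a chunk boundary.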

2. Phonetic normalization with AMIs

To deal with pronunciation variations, especially from accents or uncommon names, Gladia converts both the transcribed words and the custom vocabulary into a phonetic representation called an AMI (Acoustic Model Identifier).

This lets the system detect when something sounds like a word in the custom list, even if it’s spelled differently or misheard by the base STT engine.
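The AMI format itself isn’t described here, so the sketch below uses a much simpler stand-in: a toy phonetic key that merges similar-sounding consonants and drops vowels. The `phonetic_key` function and its letter groupings are assumptions for illustration only. They show the general idea of comparing sounds rather than spellings.

```python
import re

# Consonants that tend to sound alike share one symbol (illustrative grouping).
_SOUND_GROUPS = {
    "b": "p", "p": "p", "f": "f", "v": "f",
    "c": "k", "k": "k", "q": "k", "g": "k",
    "s": "s", "z": "s", "x": "ks",
    "d": "t", "t": "t",
    "m": "m", "n": "n", "l": "l", "r": "r", "j": "j",
}


def phonetic_key(text: str) -> str:
    """Build a rough sound-alike key; vowels and h, w, y are dropped."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    key = []
    for ch in letters:
        symbol = _SOUND_GROUPS.get(ch, "")
        # Skip a symbol identical to the one just added to the key.
        if symbol and (not key or key[-1] != symbol):
            key.append(symbol)
    return "".join(key)


# Different spellings, same key: the engine can treat them as the same sound.
same_brand = phonetic_key("Qortex") == phonetic_key("cortex")
same_name = phonetic_key("Q-Bee") == phonetic_key("Qbii")
```

A production system would use a proper phoneme model, but even this crude key shows why “cortex” can be mapped back to “Qortex.”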

3. Multiple pronunciation comparisons

Words can be pronounced in different ways depending on accent, language, or speaker. Gladia’s engine accounts for this by generating multiple phonetic representations for each vocabulary entry.

It compares each version separately, which improves the chances of identifying the correct word even in difficult audio conditions.
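One way to picture comparing each version separately: score the heard text against every variant on its own and keep the best result. This sketch uses Python’s standard `difflib.SequenceMatcher` on plain strings as a stand-in for phonetic comparison. The function name and the sample variants are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def best_pronunciation_score(heard: str, pronunciations: Iterable[str]) -> float:
    """Score the heard text against each variant separately and keep the best."""
    heard_norm = heard.lower()
    return max(
        (SequenceMatcher(None, heard_norm, p.lower()).ratio() for p in pronunciations),
        default=0.0,  # no variants listed means nothing can match
    )


score = best_pronunciation_score("cue bee", ["Q-Bee", "cue bee", "kew bee"])
```

Taking the maximum means one good variant is enough, so adding more pronunciations can only help a term get recognized.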

4. Smart subset sizing

The size of each comparison subset is based on the length of the longest term in your custom vocabulary list—usually about 1.5 times that length.

This helps ensure that longer terms aren’t missed because the system looked at too small a portion of the transcript.
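The sizing rule is simple arithmetic. The sketch below assumes length is counted in words and that the result is rounded up. Both are assumptions, since the rule above gives only the approximate 1.5 factor.

```python
import math
from typing import Iterable


def subset_size(vocabulary: Iterable[str], factor: float = 1.5) -> int:
    """Window size in words: about 1.5 times the longest vocabulary term."""
    longest = max((len(term.split()) for term in vocabulary), default=0)
    return math.ceil(longest * factor)


size = subset_size(["Qortex", "Domaine Boudau Cuvée Henri Boudau"])
# The longest term has 5 words, so the window is ceil(5 * 1.5) = 8 words.
```

The extra headroom leaves space for words the base model may have inserted or split inside a long term.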

5. Filtering out noise from common words

Common filler words like “I,” “am,” “uh,” and “you” appear in almost every spoken conversation. To avoid false positives and improve matching precision, Gladia’s algorithm filters these out during the comparison step.

This helps the engine focus on meaningful matches, rather than being distracted by words that appear frequently and carry little value on their own.
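A filter like this can be sketched in a few lines. The word list below contains only the four examples named above; a real list would be longer and language-specific, and the punctuation handling is an assumption.

```python
from typing import List

# Illustrative list only: the four filler words mentioned in the text.
COMMON_WORDS = {"i", "am", "uh", "you"}


def drop_common_words(words: List[str]) -> List[str]:
    """Remove common filler words before comparing against the vocabulary."""
    return [w for w in words if w.lower().strip(".,!?") not in COMMON_WORDS]


kept = drop_common_words(["Uh,", "I", "am", "calling", "about", "Qortex"])
# kept == ["calling", "about", "Qortex"]
```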

6. Similarity scoring

When the system finds multiple possible matches for a segment of text, it uses a similarity score to decide which one is the most likely correct match.

The entry with the highest score is used to replace the original segment, ensuring that the result is as accurate and contextually appropriate as possible.
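The selection step might look like the sketch below, which takes precomputed similarity scores per vocabulary value and returns the winner. The threshold formula that ties the cutoff to intensity is purely an assumption to show how a higher intensity could make replacement more aggressive. The real mapping isn’t documented here.

```python
from typing import Dict, Optional


def choose_match(scores: Dict[str, float], intensity: float = 0.5) -> Optional[str]:
    """Return the highest-scoring vocabulary value, or None if it scores too low."""
    if not scores:
        return None
    # Assumed mapping: higher intensity lowers the bar a match must clear.
    threshold = 1.0 - 0.5 * intensity
    best_value = max(scores, key=scores.get)
    return best_value if scores[best_value] >= threshold else None


choice = choose_match({"Qortex": 0.91, "Qbii Technologies": 0.62})
# With the default intensity of 0.5 the threshold is 0.75, so "Qortex" wins.
```

Returning `None` when nothing clears the threshold leaves the original segment untouched, which avoids forcing a replacement on unrelated text.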

Use cases for CCaaS and voice platform providers

Custom vocabulary unlocks real value across industries, but it’s especially important in CCaaS environments, where accuracy directly impacts agent workflows, customer experience, and downstream analytics. Use cases include:

1. Empowering end users

One of the biggest benefits of custom vocabulary is the ability to let your customers take control. By defining their own vocabulary lists—based on specific departments, product lines, or brand names—companies can dramatically improve the accuracy of transcriptions.

This is especially powerful in call centers with specialized domains like insurance, healthcare, travel, or banking, where industry-specific terms are used daily.

2. Improving agent productivity

When transcription errors are reduced, everything downstream works better. Agents spend less time correcting call summaries or editing CRM notes, and managers get cleaner data for coaching and feedback.

The result is a smoother, faster workflow that helps teams stay focused on conversations, not corrections.

3. Enabling better analytics and AI

Keywords and phrases are the building blocks for analytics features like topic detection, sentiment analysis, and automated QA scoring. If the transcription misses the key terms, the insights suffer.

Custom vocabulary improves data quality at the source, making your analytics smarter, your dashboards more accurate, and your automation more reliable.

Choose smarter STT with custom vocabulary

Custom vocabulary transforms generic speech recognition into a personalized, high-performance transcription engine. For CCaaS and voice platform providers especially, it’s a low-effort, high-impact way to give customers more control, better accuracy, and richer analytics.

Gladia makes it easy to build smarter, more accurate voice features into your product. Book a demo or try the API now to see the difference custom vocabulary can make.

