This is where custom vocabulary becomes a game changer. It gives teams the ability to teach their STT tool the words that matter most to their business. Whether it’s a product name, a regional slang term, or a commonly misheard acronym, custom vocabulary helps voice platforms get it right—first time, every time.
For CCaaS providers integrating STT APIs, this feature can deliver a real advantage to your customers: more accurate transcriptions, smarter agent tools, and better data to power analytics and automation.
Key takeaways:
- Custom vocabulary helps speech-to-text engines handle brand names, technical terms, and regional accents more accurately.
- Misrecognition of key terms can impact everything from customer satisfaction to downstream analytics and automation.
- Integrating custom vocabulary is a lightweight but powerful way to improve transcription accuracy across diverse use cases, especially in call centers.
What is custom vocabulary in speech-to-text?
Custom vocabulary is a feature that lets users add specific words and phrases to the speech-to-text engine’s recognition list. These could include unique brand names, product terminology, technical acronyms, or commonly misheard words. Basically, anything the model might not recognize by default.
Unlike full model training, this is a lightweight way to customize output. When you provide a list of custom words, the STT engine prioritizes those terms when it tries to interpret spoken language.
Why custom vocabulary matters
Accuracy is paramount in STT technology. A transcription service that’s constantly making mistakes will lose the trust of users and be replaced by something better.
It’s particularly frustrating for users when the same errors happen over and over, especially where these relate to industry terms, brand names, or other common jargon that they hear every day.
In a call center context, custom vocabulary lets end users define what matters most for their business—and ensures the speech engine captures it correctly. That means fewer transcription errors, better insights, and more reliable results (especially important in specialized industries or multilingual environments).
Real-world scenarios where some STT tools fall short
Brand and product names
STT engines often default to the most commonly used word or phrase, which means unique or stylized brand names can get lost in translation. For example, if a software company is named “Qortex,” the STT engine might mishear it as “cortex” or “quartets,” leading to confusion in transcripts or CRM logs.
Similarly, “Nike” (which normally rhymes with “spiky”) could be transcribed as “Nikee” or “Nikeh” depending on the speaker’s accent or the audio clarity, or mistaken for “night,” “Mike,” or “Nicky.”
With custom vocabulary, the model can learn to recognize and prioritize the correct spelling and meaning.
Accents and pronunciation differences
Speech recognition systems are often trained on "standard" pronunciations—typically American or British English. So when callers with different accents (say, South Asian, Irish, or French) speak, the STT tool may misinterpret common words.
For example, when an Irish customer asks about “car insurance,” the phrase could be transcribed as “can assurance.”
By adding frequently misrecognized words to a custom vocabulary, companies can significantly reduce these errors.
Industry-specific language
Call centers in healthcare, finance, or tech often use specialized terms or acronyms that everyday STT engines may not understand. Words like “SaaS,” “EHR,” or “PCI” might be turned into meaningless phrases unless they’re explicitly included in the vocabulary list.
Healthcare is a good example. Users can provide lists of thousands of drug names that transcription tools would otherwise typically miss. Or, if a company sells French wines in the United States, the call center’s STT tool can be primed to recognize the wide range of ways that American buyers might pronounce Domaine Pontifical Châteauneuf du Pape or Domaine Boudau Cuvée Henri Boudau.
Custom vocabulary ensures those terms are recognized and transcribed accurately, preserving the context and meaning of conversations.
How Gladia makes custom vocabulary easy
Gladia’s speech-to-text API is designed to make custom vocabulary simple, fast, and effective. Whether you’re uploading a glossary of product names, technical jargon, or regionally specific terms, you can add and update your vocabulary lists dynamically through the API or directly from the dashboard.
Custom vocabulary updates take effect instantly, so your live and post-call transcriptions reflect the latest changes in real time. The system is built to support multilingual, domain-specific, and brand-focused terms, making it an ideal fit for voice platforms serving diverse customer bases across industries.
How the custom vocabulary engine works
Custom vocabulary in Gladia isn’t just a simple search-and-replace tool—it’s a sophisticated, phonetic-aware algorithm that helps improve transcription accuracy without requiring model retraining.
{
  "value": "Night's Watch",
  "pronunciations": ["Nightz Vatch"],
  "intensity": 0.4,
  "language": "de"
}
Example of a custom vocabulary entry with value, pronunciation, intensity, and language fields defined.
Each custom vocabulary entry has four key fields:
- Value: The word or phrase that will appear in the final transcription. This is case sensitive, and will appear exactly as listed.
- Pronunciations: The different ways a word or phrase might be spoken aloud, especially if they’re commonly misheard by standard STT models. For example, if speakers often say “Q-Bee” but you want the transcript to show “Qbii Technologies,” you’d list “Q-Bee” under Pronunciations and map it to “Qbii Technologies” as the Value. This helps correct errors where the base model (like Whisper) might otherwise mis-transcribe the phrase.
- Intensity: An adjustable setting (from 0 to 1) that determines how sensitive the custom vocabulary algorithm should be. The default is 0.5; a higher value makes the algorithm find and replace words more aggressively.
- Language: This helps the algorithm recognize when the same word is pronounced differently in another language. Without it, a French transcript may misrecognize an English-named company, for example.
That’s the basic setup for the tool. Users can then upload or manually add hundreds or thousands of entries to make future transcriptions more accurate.
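As a rough illustration of the four fields above, here is a minimal Python sketch that builds a small vocabulary list, checks the intensity range, and serializes it to JSON. The `VocabularyEntry` class and its validation rules are hypothetical helpers written for this example, not part of Gladia’s API; only the field names and the 0.5 default come from the description above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VocabularyEntry:
    """Hypothetical container mirroring the four fields described above."""
    value: str                                   # text that appears in the transcript
    pronunciations: List[str] = field(default_factory=list)
    intensity: float = 0.5                       # default mentioned in the article
    language: Optional[str] = None               # left unset unless needed

    def __post_init__(self):
        # Local sanity checks only; the real service may validate differently.
        if not self.value.strip():
            raise ValueError("value must not be empty")
        if not 0.0 <= self.intensity <= 1.0:
            raise ValueError("intensity must be between 0 and 1")


entries = [
    VocabularyEntry("Qbii Technologies", ["Q-Bee"]),
    VocabularyEntry("Night's Watch", ["Nightz Vatch"], intensity=0.4, language="de"),
]

# Serialize the list so it can be stored or sent alongside a request.
payload = json.dumps([asdict(e) for e in entries], indent=2, ensure_ascii=False)
print(payload)
```

Validating entries locally like this catches out-of-range intensity values before a long list is uploaded.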
Going deeper, here are some of the key functions that make Gladia’s custom vocabulary algorithm work so successfully:
1. Subset matching for better context
When Gladia receives a transcript, it breaks the text into overlapping chunks or “subsets” of words. These subsets are then compared to the list of custom vocabulary entries.
This makes it possible to match not just individual words but also short phrases that might appear in different forms throughout the conversation.
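The idea of overlapping subsets can be sketched as a sliding window over the transcript words. This is only an illustration of the general technique; the function name `overlapping_subsets` and the way window sizes are enumerated are assumptions, not a description of Gladia’s internal code.

```python
def overlapping_subsets(words, max_size):
    """Return (start_index, phrase) for every run of 1..max_size consecutive words."""
    subsets = []
    for start in range(len(words)):
        for size in range(1, max_size + 1):
            if start + size > len(words):
                break  # window would run past the end of the transcript
            subsets.append((start, " ".join(words[start:start + size])))
    return subsets


words = "thanks for calling cortex support".split()
for start, phrase in overlapping_subsets(words, 2):
    print(start, phrase)
```

Because the windows overlap, a two-word term is still seen as a unit wherever it starts in the transcript.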
2. Phonetic normalization with AMIs
To deal with pronunciation variations, especially from accents or uncommon names, Gladia converts both the transcribed words and the custom vocabulary into a phonetic representation called an AMI (Acoustic Model Identifier).
This lets the system detect when something sounds like a word in the custom list, even if it’s spelled differently or misheard by the base STT engine.
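To make the "sounds like" comparison concrete, here is a deliberately crude phonetic key in Python. It is a stand-in chosen for illustration and is not Gladia’s AMI representation, whose details are not given here; the folding rules and the name `phonetic_key` are assumptions.

```python
import re

# Letters that often sound alike are folded onto one symbol (illustrative only).
_FOLD = str.maketrans({"q": "k", "c": "k", "z": "s", "v": "f", "w": "f", "y": "i"})


def phonetic_key(text):
    """Reduce text to a rough sound-based key: fold similar letters, drop inner vowels."""
    letters = re.sub(r"[^a-z]", "", text.lower())   # keep plain letters only
    letters = letters.replace("ph", "f").translate(_FOLD)
    if not letters:
        return ""
    # Keep the first letter, remove vowels from the rest, collapse repeated letters.
    key = letters[0] + re.sub(r"[aeiou]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", key)


print(phonetic_key("Qortex"), phonetic_key("cortex"), phonetic_key("quartets"))
```

With this toy key, “Qortex” and “cortex” collapse to the same value while “quartets” does not, which is the kind of signal a phonetic comparison relies on.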
3. Multiple pronunciation comparisons
Words can be pronounced in different ways depending on accent, language, or speaker. Gladia’s engine accounts for this by generating multiple phonetic representations for each vocabulary entry.
It compares each version separately, which improves the chances of identifying the correct word even in difficult audio conditions.
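A simple way to picture "compare each version separately" is to score the heard text against every listed pronunciation and keep the best result. The sketch below uses Python’s standard `difflib.SequenceMatcher` on plain text as a stand-in for a phonetic comparison; the function name and the dictionary layout are assumptions made for this example.

```python
from difflib import SequenceMatcher


def best_pronunciation(heard, entry):
    """Score `heard` against the value and each pronunciation; return (score, variant)."""
    variants = [entry["value"], *entry.get("pronunciations", [])]
    scored = [
        (SequenceMatcher(None, heard.lower(), variant.lower()).ratio(), variant)
        for variant in variants
    ]
    return max(scored)  # highest similarity wins


entry = {"value": "Qbii Technologies", "pronunciations": ["Q-Bee", "cue bee"]}
score, variant = best_pronunciation("q bee", entry)
print(variant, round(score, 2))
```

Scoring each variant on its own means one close pronunciation is enough to link the heard text to the entry, even when the written value looks nothing like it.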
4. Smart subset sizing
The size of each comparison subset is based on the length of the longest term in your custom vocabulary list—usually about 1.5 times that length.
This helps ensure that longer terms aren’t missed because the system looked at too small a portion of the transcript.
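As a worked example of the sizing rule, the sketch below measures term length in words and rounds up. Both choices are assumptions, since the text above does not say whether length is counted in words or characters; `subset_size` is a hypothetical helper.

```python
import math


def subset_size(vocabulary, factor=1.5):
    """Window size in words: about `factor` times the longest term, rounded up."""
    longest = max((len(term.split()) for term in vocabulary), default=0)
    return math.ceil(longest * factor)


vocabulary = ["Qortex", "Domaine Pontifical Châteauneuf du Pape"]
print(subset_size(vocabulary))  # longest term has 5 words, so the window is 8
```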
5. Filtering out noise from common words
Common filler words like “I,” “am,” “uh,” and “you” appear in almost every spoken conversation. To avoid false positives and improve matching precision, Gladia’s algorithm filters these out during the comparison step.
This helps the engine focus on meaningful matches, rather than being distracted by words that appear frequently and carry little value on their own.
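In code, this kind of filtering can be as simple as a stop list applied before comparison. The word list and the helper name below are illustrative assumptions; the actual list Gladia uses is not specified here.

```python
# Illustrative stop list based on the examples above; a real list would be longer.
COMMON_WORDS = {"i", "am", "uh", "you"}


def drop_common_words(words):
    """Remove common filler words, ignoring case and trailing punctuation."""
    return [w for w in words if w.lower().strip(".,!?") not in COMMON_WORDS]


print(drop_common_words("Uh, I am calling about Qortex".split()))
```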
6. Similarity scoring
When the system finds multiple possible matches for a segment of text, it uses a similarity score to decide which one is the most likely correct match.
The entry with the highest score is used to replace the original segment, ensuring that the result is as accurate and contextually appropriate as possible.
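The final selection step can be sketched as "take the highest-scoring candidate and substitute it into the transcript." In the example below, the scores are made-up inputs and `min_score` is a hypothetical cutoff added so that weak matches leave the text untouched; a setting like intensity could plausibly drive such a cutoff, but that link is an assumption.

```python
def apply_best_match(words, start, size, scored_entries, min_score=0.5):
    """Replace words[start:start+size] with the best-scoring vocabulary value.

    scored_entries is a list of (value, score) pairs for that segment.
    Returns the original words if nothing scores at least min_score.
    """
    if not scored_entries:
        return list(words)
    value, score = max(scored_entries, key=lambda pair: pair[1])
    if score < min_score:
        return list(words)
    return words[:start] + [value] + words[start + size:]


words = ["call", "cortex", "now"]
candidates = [("Qortex", 0.83), ("Quartz", 0.5)]  # illustrative scores
print(apply_best_match(words, 1, 1, candidates))
```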
Use cases for CCaaS and voice platform providers
Custom vocabulary unlocks real value across industries, but it’s especially important in CCaaS environments, where accuracy directly impacts agent workflows, customer experience, and downstream analytics. Use cases include:
1. Empowering end users
One of the biggest benefits of custom vocabulary is the ability to let your customers take control. By defining their own vocabulary lists—based on specific departments, product lines, or brand names—companies can dramatically improve the accuracy of transcriptions.
This is especially powerful in call centers with specialized domains like insurance, healthcare, travel, or banking, where industry-specific terms are used daily.
2. Improving agent productivity
When transcription errors are reduced, everything downstream works better. Agents spend less time correcting call summaries or editing CRM notes, and managers get cleaner data for coaching and feedback.
The result is a smoother, faster workflow that helps teams stay focused on conversations, not corrections.
3. Enabling better analytics and AI
Keywords and phrases are the building blocks for analytics features like topic detection, sentiment analysis, and automated QA scoring. If the transcription misses the key terms, the insights suffer.
Custom vocabulary improves data quality at the source, making your analytics smarter, your dashboards more accurate, and your automation more reliable.
Choose smarter STT with custom vocabulary
Custom vocabulary transforms generic speech recognition into a personalized, high-performance transcription engine. For CCaaS and voice platform providers especially, it’s a low-effort, high-impact way to give customers more control, better accuracy, and richer analytics.
Gladia makes it easy to build smarter, more accurate voice features into your product. Book a demo or try the API now to see the difference custom vocabulary can make.