Your LLM summarization prompt isn't the problem. The reason your meeting notes look sloppy is that your speech-to-text layer is guessing at industry jargon. A transcript that misreads technical terms or brand names hands your downstream LLM a corrupted input it can't recover from, regardless of how well your prompt is written. This guide breaks down why ASR fails on specialized terms and how to implement custom vocabulary at the infrastructure layer using Gladia's async API, covering what that means for the unit economics of your audio pipeline.
Preventing errors in AI's specialized transcripts
The cost of entity recognition errors
A transcription error on a common word is annoying. A transcription error on a named entity is a data integrity failure. When the ASR model misreads a client name or product acronym, that error propagates to every system the transcript feeds: the CRM entry is wrong, the AI summary attributes an action item to the wrong company, and the coaching scorecard scores the wrong outcome.
Our async benchmark methodology demonstrates that Solaria-1 produces 39% fewer errors on key entities compared to leading competitors, and that gap compounds across every meeting your product processes.
Why ASR fails on brand names
Model developers train general-purpose ASR systems on large corpora of conversational speech, news broadcasts, and public audio datasets. The statistical frequency of "Kubernetes," "adalimumab," or "EBITDA" in those datasets is negligible compared to everyday words, so the acoustic model's probability distribution strongly favors common alternatives when it encounters unfamiliar phoneme sequences.
Five concrete failure patterns span every major industry vertical:
- Healthcare: "Adalimumab" gets rendered as "add-a-lim-you-mab" or dropped entirely, forcing clinical teams to manually correct transcripts before they're usable.
- Finance: "CDO" gets transcribed letter by letter as "C-D-O" or interpreted as Chief Data Officer rather than collateralized debt obligation, depending on surrounding context.
- Technology: "OAuth" surfaces as "oh auth." "kubectl" becomes "kube control" or "cube cuddle" depending on the speaker's accent.
- Legal: Latin terms like "res judicata" or "habeas corpus" are phonetically distant from anything in a standard training corpus.
- Manufacturing: Process acronyms like "APQP" or "JIT" get split, abbreviated incorrectly, or omitted entirely.
The challenge isn't model quality in isolation. It's that out-of-vocabulary words require explicit lexicon injection to be recognized reliably, and without it, the acoustic model substitutes the closest phonetic match it knows.
Custom vocab's NLP quality boost
When you provide a custom vocabulary list, the ASR engine biases its language model toward those terms at inference time. An n-gram language model estimates the probability distribution over groups of consecutive words, and by modifying that distribution to favor domain-specific terms, the model predicts the injected vocabulary as more likely when the acoustic signal is ambiguous. Because the bias is applied at inference time rather than through retraining, there is no custom model setup cost and no waiting weeks for a fine-tuning job to complete.
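As a rough illustration of the biasing mechanism (not Gladia's internal implementation), here's a toy rescoring pass in Python where hypotheses containing injected terms receive a score boost proportional to their intensity:

```python
# Toy illustration of vocabulary biasing, not Gladia's actual decoder.
# Candidate transcriptions are rescored so hypotheses containing injected
# terms become more likely when the acoustic evidence is ambiguous.
custom_vocabulary = {"kubernetes": 0.8, "postgresql": 0.8}  # term -> intensity

def rescore(hypotheses):
    """hypotheses: list of (text, base_log_prob) pairs from the base model."""
    rescored = []
    for text, log_prob in hypotheses:
        boost = sum(
            intensity for term, intensity in custom_vocabulary.items()
            if term in text.lower()
        )
        rescored.append((text, log_prob + boost))  # bias toward injected terms
    return max(rescored, key=lambda pair: pair[1])

# Acoustically similar candidates; the biased score prefers the injected term.
print(rescore([("communities cluster", -4.1), ("kubernetes cluster", -4.6)]))
```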
For product teams, this translates directly to downstream data quality. Every named entity injected correctly increases the reliability of your CRM syncs, AI-generated summaries, and NER output. For a deeper look at how this integrates into an async pipeline, see our meeting assistant architecture guide.
Gladia's approach to custom vocabulary
Managing your custom AI vocabulary
We accept custom vocabulary as an array in the transcription request payload. Each item can be a plain string or an object with optional properties including value (the term), intensity (how strongly to bias toward it), pronunciations (alternate phonetic representations), and language. This gives you fine-grained control over how aggressively the model favors a given term.
Here's how to pass a custom vocabulary array to the Gladia async transcription API:
```json
{
  "audio_url": "YOUR_AUDIO_FILE_URL",
  "custom_vocabulary": [
    "Kubernetes",
    {
      "value": "PostgreSQL",
      "intensity": 0.8,
      "language": "en"
    },
    {
      "value": "OAuth",
      "pronunciations": ["oh-auth"],
      "intensity": 0.6
    },
    {
      "value": "Acme Corporation",
      "intensity": 0.7
    }
  ],
  "language_config": {
    "code_switching": true
  },
  "diarization": true
}
```
The full parameter reference is in the custom vocabulary documentation.
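For reference, here's a minimal Python sketch of submitting that payload. The endpoint path, header name, and response fields are assumptions based on Gladia's v2 docs at the time of writing, so verify them against the current API reference:

```python
# Minimal sketch of submitting the payload above to Gladia's async API.
# Endpoint, header, and response shape should be checked against the docs.
import requests

GLADIA_API_KEY = "YOUR_API_KEY"  # placeholder

payload = {
    "audio_url": "YOUR_AUDIO_FILE_URL",
    "custom_vocabulary": [
        "Kubernetes",
        {"value": "PostgreSQL", "intensity": 0.8, "language": "en"},
        {"value": "OAuth", "pronunciations": ["oh-auth"], "intensity": 0.6},
        {"value": "Acme Corporation", "intensity": 0.7},
    ],
    "language_config": {"code_switching": True},
    "diarization": True,
}

response = requests.post(
    "https://api.gladia.io/v2/transcription",
    headers={"x-gladia-key": GLADIA_API_KEY},
    json=payload,
)
response.raise_for_status()
job = response.json()
print(job)  # typically includes a job id and a result URL to poll for the transcript
```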
Handling unique terms and custom glossaries
Solaria-1 applies custom vocabulary entries at inference time. The Solaria-1 architecture uses a multilingual model that handles language detection, which means the intensity bias applied to a custom term works across languages when the speaker switches mid-conversation.
This matters for technical product meetings where participants drop acronyms and brand names regardless of which language they're speaking at the time. You don't need separate vocabulary lists per language. One payload, one request, one billing event.
Integration with async transcription API
Custom vocabulary is part of Gladia's base feature set, not a metered add-on. On Starter ($0.61/hr async) and Growth plans, we bundle custom vocabulary alongside diarization, translation, NER, sentiment analysis, and summarization at the published per-hour rate. Whether you model your infrastructure costs at 1,000 or 10,000 hours per month, custom vocabulary incurs no additional charge. We publish the full cost breakdown in our async STT benchmark.
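As a quick sanity check on those volumes, the arithmetic is simple because custom vocabulary adds no per-hour surcharge:

```python
# Back-of-the-envelope cost model using the published Starter async rate;
# custom vocabulary adds nothing on top, so the rate is the whole cost.
STARTER_ASYNC_RATE = 0.61  # USD per audio hour (published Starter rate)

for hours_per_month in (1_000, 10_000):
    monthly_cost = hours_per_month * STARTER_ASYNC_RATE
    print(f"{hours_per_month:>6} hrs/month -> ${monthly_cost:,.2f}/month")
```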
Crafting custom vocabulary lists for AI accuracy
Defining your product's unique jargon
The most effective starting point is an audit of your support tickets and existing transcripts. Run a frequency analysis on words appearing in customer-facing conversations that your current transcription outputs are getting wrong. Look for three categories: client and partner names, product names and version strings, and internal acronyms your team uses in every meeting. Those are the same entities your note-taker is most likely failing on.
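A rough version of that audit might look like the sketch below; the entity list and transcript locations are placeholders you'd swap for your own CRM exports and transcript store:

```python
# Hypothetical audit sketch: count how often known entities from support
# tickets and CRM data fail to appear in your current ASR output.
from collections import Counter
from pathlib import Path

known_entities = ["Acme Corporation", "PostgreSQL", "OAuth", "Kubernetes"]
miss_counts = Counter()

for path in Path("transcripts").glob("*.txt"):  # existing transcription output
    text = path.read_text(encoding="utf-8").lower()
    for term in known_entities:
        if term.lower() not in text:
            miss_counts[term] += 1  # entity expected but never transcribed

# Highest-miss terms are the best candidates for the custom vocabulary list.
for term, misses in miss_counts.most_common():
    print(f"{term}: missing from {misses} transcripts")
```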
Selecting high-value AI glossary terms
Not every uncommon word belongs in your custom vocabulary list. Our best practice guidance is to submit only words the model currently fails to transcribe correctly, not common words it handles fine. Sending redundant entries inflates your list without improving accuracy and increases the risk of false positives, where the model incorrectly biases toward your injected term in contexts where a different word was actually spoken.
- Prioritize OOV terms: Proper nouns, trademarked names, and acronyms the model has never encountered in training data. Send individual keywords, not full phrases, because the model handles phrase context natively and individual token injection is more precise.
- Avoid inflation: Don't send duplicates or common words the model already handles correctly. Validate your list on representative audio before rolling it out to your full production pipeline.
Formatting brand names and acronyms
Submit the term exactly as you want it to appear in the transcript output. For acronyms, use the spelled-out form if speakers pronounce every letter ("O-Auth") and use the full word form if speakers pronounce it as a word ("OAuth" as "oh-auth" with a pronunciation hint).
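Illustrative entries for both cases might look like this; the pronunciation strings and intensity values are examples, not prescriptions from the API reference:

```python
# Sketch of the two acronym cases: letter-by-letter vs. pronounced as a word.
custom_vocabulary = [
    # Spoken letter by letter: keep the target spelling, hint each letter.
    {"value": "GDPR", "pronunciations": ["g d p r"], "intensity": 0.6},
    # Spoken as a word: keep the brand spelling, hint the spoken form.
    {"value": "OAuth", "pronunciations": ["oh-auth"], "intensity": 0.6},
]
```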
Managing code-switching for AI transcripts
Gladia's code-switching support handles mid-conversation language changes across all 100+ supported languages natively, with no separate configuration required. When you combine language_config: { code_switching: true } with a custom vocabulary list, the model applies your injected terms regardless of which language the surrounding speech is in. Our multilingual meeting transcription guide covers the accuracy benchmarks and implementation details for teams processing European and Asian language audio.
Prompt engineering for entity recognition
Customizing prompts for entity recognition
Once Solaria-1 delivers a custom-vocabulary-corrected transcript, the next step is structured entity extraction via Gladia's Audio-to-LLM pipeline. The accuracy of your injected terms in the transcript is what makes downstream extraction reliable. A correctly transcribed "PostgreSQL" gives the NER layer a clean token to work with. A hallucinated "Post Grass Q L" means the extraction step is operating on corrupted text, and even a well-tuned NER model will misclassify or miss it entirely.
Designing prompts for custom terms
Here is a concrete prompt pattern that uses the corrected Gladia transcript as input and extracts structured entities for CRM automation:
Extract all brand names, product acronyms, and technical terms from the
following transcript. Return a JSON object with arrays for: companies,
products, and technical_terms.
Company names to identify: [Acme Corporation, Oracle, AcmeCorp]
Product acronyms to flag: [OAuth, GDPR, EHR, SOX]
Technical terms: [Kubernetes, microservices, API gateway]
Transcript: "[GLADIA_TRANSCRIPT_OUTPUT]"
Return format:
{
"companies": [],
"products": [],
"technical_terms": []
}
The key principle is prompting the LLM to confirm entities you already know to look for, not discover arbitrary ones. This reduces hallucination risk at the extraction layer because the LLM does pattern matching against a known list rather than open-ended entity discovery.
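One way to enforce that principle in code is to filter the LLM's reply against the known lists before it touches your CRM. The helper below is a hypothetical post-processing step, not part of Gladia's API:

```python
# Hedged post-processing sketch: parse the LLM's JSON reply and keep only
# entities from the known lists, so extraction stays pattern matching rather
# than open-ended discovery. Names and sample data are illustrative.
import json

KNOWN = {
    "companies": {"Acme Corporation", "Oracle", "AcmeCorp"},
    "products": {"OAuth", "GDPR", "EHR", "SOX"},
    "technical_terms": {"Kubernetes", "microservices", "API gateway"},
}

def filter_entities(llm_reply: str) -> dict:
    """Keep only entities that appear in the known lists; drop anything else."""
    extracted = json.loads(llm_reply)
    return {
        field: sorted(set(extracted.get(field, [])) & allowed)
        for field, allowed in KNOWN.items()
    }

reply = '{"companies": ["Acme Corporation", "Initech"], "products": ["OAuth"], "technical_terms": []}'
print(filter_entities(reply))
# {'companies': ['Acme Corporation'], 'products': ['OAuth'], 'technical_terms': []}
```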
Achieving reliable AI transcript output
Refining transcripts for brand accuracy
The Before/After contrast for custom vocabulary is straightforward. Without it, a generic ASR output for a SaaS sales call might contain errors on technical terms and brand names. With Gladia custom vocabulary applied, the same audio produces accurate transcription of terms like "Kubernetes," "PostgreSQL," and company names.
The corrected transcript produces a valid CRM entry, a coherent action item, and an NER output the pipeline can act on, while the hallucinated one produces corrupt data that flows silently into every connected system.
Recognizing varied brand pronunciations
Speakers with different native languages pronounce the same brand name differently. Solaria-1 handles accented speech detection natively, and for terms with a consistent non-standard pronunciation across your user base, add that variant to the pronunciations array in your vocabulary payload.
Managing custom vocabulary glossaries
Custom vocabulary lists need maintenance. Review your list quarterly and cross-reference it against entity errors surfacing in your support tickets. For teams processing audio in new markets or launching new product features, update the list before the feature ships, not after the first customer complaint arrives.
Confirming domain-specific term quality
Precision WER for key entities
Standard word error rate measures errors across all words in the transcript. For meeting note-takers, entity-level WER on injected terms is more operationally relevant. Run this calculation by isolating segments of your evaluation audio that contain custom vocabulary terms and measuring the error rate specifically on those segments.
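A minimal version of that calculation, using illustrative segments, looks like this:

```python
# Sketch of entity-level WER: standard WER, computed only over evaluation
# segments whose reference text contains an injected term. Segments are
# illustrative; substitute pairs from your own evaluation set.
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via edit distance between word sequences."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[-1][-1] / max(len(ref), 1)

vocab_terms = {"kubernetes", "postgresql"}
segments = [  # (reference, hypothesis) pairs from your evaluation audio
    ("deploy it on the Kubernetes cluster", "deploy it on the communities cluster"),
    ("thanks for joining today", "thanks for joining today"),
]

entity_segments = [
    (ref, hyp) for ref, hyp in segments
    if any(term in ref.lower() for term in vocab_terms)
]
rates = [wer(ref, hyp) for ref, hyp in entity_segments]
print(f"entity-level WER: {sum(rates) / max(len(rates), 1):.2%}")
```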
Teams using Gladia's custom vocabulary report meaningful WER reductions on entity-heavy segments. Claap is one example of a team processing multi-language audio through Gladia's pipeline.
Custom vocabulary A/B test setup
A structured proof of concept for stakeholders runs as follows:
- Select representative audio: Choose 20-50 calls or meetings containing the terms you intend to inject.
- Run two transcription passes on the same audio: one without custom vocabulary and one with your vocabulary list active.
- Score entity-level accuracy for each injected term across both outputs.
- Map errors to downstream impact: Calculate how many CRM fields, action items, or NER extractions would have failed based on transcript errors in each pass.
- Report results by entity category (client names, product acronyms, technical terms) to show where the improvement is concentrated.
This structure gives you a clear ROI case tied to data integrity metrics your engineering and business stakeholders both care about.
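A lightweight way to score the two passes is sketched below; the transcript text for both runs is illustrative, and you'd substitute the real output from your baseline and custom-vocabulary jobs:

```python
# Scoring sketch for the A/B comparison: baseline_runs and vocab_runs hold
# transcript text from the two passes (however you fetched them); the loop
# counts how often each injected term survives in each pass.
vocab = ["Kubernetes", "PostgreSQL", "Acme Corporation"]

baseline_runs = {  # illustrative output from the pass without custom vocabulary
    "call-001": "we deploy on the communities cluster behind post grass q l",
    "call-002": "acme corporation renewed their contract",
}
vocab_runs = {  # illustrative output from the pass with the vocabulary list active
    "call-001": "we deploy on the Kubernetes cluster behind PostgreSQL",
    "call-002": "Acme Corporation renewed their contract",
}

for term in vocab:
    base = sum(term.lower() in text.lower() for text in baseline_runs.values())
    boosted = sum(term.lower() in text.lower() for text in vocab_runs.values())
    print(f"{term}: {base}/{len(baseline_runs)} baseline -> "
          f"{boosted}/{len(vocab_runs)} with custom vocabulary")
```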
Spotting false brand name matches
A high intensity value on a common phoneme sequence can produce false positives where the model substitutes your injected term when a different word was actually spoken. Per the custom vocabulary documentation, start with a moderate intensity value and adjust based on results. Lower it if you see too many false positives in your evaluation audio. Validate on a representative sample before scaling to production.
Resolving AI's brand name and jargon errors
Contextual disambiguation for custom terms
When two custom terms have similar phoneme sequences, the model uses surrounding context to disambiguate. Async transcription benefits from broader conversational context during processing, which is particularly valuable for meeting note-taker use cases where the conversation's topic signals which technical terms are likely to appear.
How AI confuses brand names and common words
The most common confusion patterns follow predictable acoustic logic:
| Spoken term | Common ASR substitution | Root cause |
| --- | --- | --- |
| Kubernetes | Similar-sounding phrases | Unfamiliar phoneme cluster |
| PostgreSQL | Word-by-word parsing | Compound word split |
| OAuth | Phonetic variants | Acronym with ambiguous vowel |
| Acme Corporation | Phonetic substitutions | Short-vowel substitution |
| Adalimumab | Syllable-by-syllable parsing | Medical OOV term |
For all five patterns, adding the term to your custom vocabulary list with the correct spelling and a phonetic hint where pronunciation is ambiguous resolves the error reliably.
Ensuring accuracy with diverse accents
Solaria-1 supports accented speech across 100+ languages, including 42 languages not covered by other API-level STT providers. For meeting note-takers serving global user bases, this matters because custom vocabulary terms will be pronounced differently by speakers from different regions, and the model needs accent-robust acoustic recognition as the foundation before the vocabulary bias can work correctly.
Here's how Gladia compares to the two most common alternatives on the dimensions that matter most for custom vocabulary workflows:
| Capability | Gladia | Deepgram | AssemblyAI |
| --- | --- | --- | --- |
| Custom vocabulary inclusion | Bundled at base price | Keyword boosting available | Keyterms Prompting available |
| Language coverage | 100+ languages, 42 unique | 40+ languages (Nova-2), 60+ with Nova-3 | 99 languages |
| Code-switching | Native, 100+ languages | Supported on Nova-2 and Nova-3 | Multilingual support available |
| Data training policy (paid plans) | Growth/Enterprise: never used for training. Starter: can be used by default. | Data processing agreement available | Contact for data processing agreement |
| Async WER vs. competitors | Industry-leading performance on conversational speech | Baseline | Baseline |
If you're migrating from either provider, our Deepgram migration guide and AssemblyAI migration guide cover the implementation details.
Start with 10 free hours and test custom vocabulary on your own domain-specific audio. See how Solaria-1 handles brand names, acronyms, and accented speakers before committing to a plan.
FAQs
How often should you update your custom vocabulary glossary?
Review quarterly and cross-reference against entity errors in your support tickets. Update immediately when launching new product features or entering new markets where unfamiliar brand names or terminology will appear in audio.
Does custom vocabulary add latency to async transcription?
Custom vocabulary is applied at inference time within the async pipeline. Gladia's async transcription typically processes one hour of audio in approximately one minute.
What intensity value should you start with for new terms?
Per the custom vocabulary documentation, start with a moderate intensity value and adjust upward only if terms aren't being picked up reliably. Lower it if you see false positives, and validate on representative audio before deploying to production.
What does Solaria-1 output when it can't match a spoken word to anything in the vocabulary list?
When a spoken word doesn't match anything in the vocabulary list, the model produces the closest phonetic match from its training distribution. Terms not in the custom vocabulary list still appear in the transcript based on the model's standard recognition capabilities.
Key terms glossary
ASR (Automatic Speech Recognition): Technology that converts spoken audio into written text automatically, also called speech-to-text or voice recognition. Modern ASR systems use acoustic models, language models, and pronunciation lexicons working together.
NLP (Natural Language Processing): The field of AI focused on how computers understand and process human language. In ASR pipelines, NLP models handle tasks like entity extraction, sentiment analysis, and summarization on top of transcribed text.
WER (Word Error Rate): The standard metric for ASR accuracy, calculated as the total number of substitutions, deletions, and insertions divided by the total number of words in the reference transcript. Lower WER indicates higher accuracy.
Jargon: Domain-specific vocabulary used within a professional community, including acronyms, brand names, and technical terms that rarely appear in general training corpora and are therefore high-risk for ASR substitution errors.
Custom vocabulary: A list of domain-specific terms injected into the ASR engine at inference time to bias the language model's probability distribution toward those terms. In Gladia's API, this is passed as an array of strings or objects in the transcription request payload.
Contextual embeddings: Numerical representations of words that encode their meaning based on surrounding context. Modern language models use contextual embeddings to disambiguate terms with similar phoneme sequences by analyzing the full conversational context around each word.