How to generate meeting summaries and action items with Audio-to-LLM

Published on Jun 16

By Emma Genthon

If you're building a note-taker or meeting assistant product in 2026, generating structured summaries and action items is the first feature you need to get right. It's the core of the value proposition: the reason users open the app after every call.

The challenge isn't transcription. Transcription is a solved problem. The challenge is what happens after: taking a raw transcript and turning it into something a salesperson can paste into their CRM, a developer can turn into an issue for their ticketing app, or a manager can act on without reading a wall of text. Traditionally, this means running transcription with one service, then piping the output into a separate LLM call, managing two APIs, handling errors across both, and stitching the results back together.

Our Audio-to-LLM feature collapses that into a single API call. In this article you will learn how to use the Audio-to-LLM feature to turn different meetings into structured summaries and action items.

TL;DR

Audio-to-LLM lets you transcribe audio and run LLM prompts against the transcript in a single API call, eliminating the need to manage a separate transcription and LLM pipeline
You can include multiple prompts in one request and retrieve each result separately, generating a CRM update, Slack digest, and Linear tickets from the same meeting recording simultaneously
The output structure is entirely prompt-defined, meaning you can tailor summaries to specific personas: a sales rep gets CRM-ready fields and personal notes, a developer gets ticket-ready descriptions, etc.
With 400+ models available per request, you can route different meeting types to different models without changing your pipeline architecture

How Audio-to-LLM works

The mechanic is straightforward. In your Gladia transcription request, you set audio_to_llm to true and pass a config object that specifies two things: the model you want to use, and the prompt (or prompts) you want to run against the transcript.

"audio_to_llm": true,
"audio_to_llm_config": {
  "model": "openai/gpt-5.4-nano",
  "prompts": [
    <prompt1>, <prompt2>
  ]
}

A few things worth highlighting for anyone building on top of this:

Multiple prompts, one call: You can include several prompts in the same request and retrieve each result separately. This means you can generate a CRM summary, a Slack digest, and a set of Linear tickets from a single meeting recording without making multiple API calls. For a product with multiple output destinations, this is significant.

400+ models to choose from: You're not locked into a single model. You specify the model per request, which means you can optimize for cost, latency, or output quality depending on the use case and switch without changing your pipeline architecture.

The output structure follows your prompt: Whatever structure you define in your prompt is what you get back. If your prompt asks for JSON with specific fields, you get JSON with those fields. This makes it easy to feed the output into downstream tools with a plain JSON decode rather than a custom parsing layer.
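As a rough illustration of the request fragment above, here is a minimal Python sketch that assembles the body with two prompts. Only audio_to_llm, audio_to_llm_config, model, and prompts come from this article. The audio_url field name, the example URL, and the two short prompts are assumptions for illustration, and the sketch stops short of sending the request.

```python
import json


def build_request(audio_url: str, model: str, prompts: list[str]) -> dict:
    """Assemble a transcription request body with Audio-to-LLM enabled."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {
        # Assumption: 'audio_url' is a stand-in name for the recording location.
        "audio_url": audio_url,
        "audio_to_llm": True,
        "audio_to_llm_config": {
            "model": model,
            # Each prompt runs against the same transcript.
            "prompts": list(prompts),
        },
    }


payload = build_request(
    "https://example.com/meeting.wav",  # placeholder URL
    "openai/gpt-5.4-nano",
    [
        "Summarize this meeting as a CRM update.",
        "Write a three-line Slack digest of this meeting.",
    ],
)
print(json.dumps(payload, indent=2))
```

Keeping the body in one builder function means adding a third output destination is a matter of appending another prompt to the list.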

Use case 1: Basic meeting summary

The simplest application, and the right place to start, is a general-purpose meeting summary. This is what you'd build first: a feature that works on any meeting, any topic, any team.

For this demo, Emma uses a Star Wars Jedi Council scene as the audio input. It's a deliberately unusual choice that makes a useful point: the feature doesn't need domain-specific tuning to produce a clean, structured output. As long as there's structured dialogue, it works.

The model used here is GPT-5.4 Nano.

The prompt:

"prompts": [
  "Summarize this conversation for a general audience. Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'summary': exactly 2–3 sentences covering what happened and the outcome.

  'key_topics': array of up to 5 strings, the main topics discussed, ordered by how much discussion they received (most-discussed first). Empty array if none.

  'decisions': array of strings describing decisions that were explicitly made or agreed upon. Do NOT count points that were merely discussed, proposed, or left open — those belong in 'open_questions' instead. Empty array if none.

  'action_items': array of objects, each with 'owner' (the speaker's name if stated, otherwise exactly 'unknown') and 'task' (short description). Empty array if none were stated.

  'open_questions': array of strings describing unresolved questions or undecided points. Empty array if none.

  Use only information explicitly stated in the transcript. Do not invent, assume, or infer facts, names, or outcomes."
]

The output:

Jedi council meeting

The gist: Anakin gets named Palpatine's personal rep on the Jedi Council — a big honor that turns sour fast, because the Council won't grant him the rank of Master alongside it. Things get tense with Obi-Wan, and Anakin finds out the real assignment is to quietly report back on what Palpatine's up to, all while the war keeps grinding on.

Topics covered

Anakin's appointment to the Council
Being denied the rank of Master
Friction between Anakin and Obi-Wan
The Council's ask to spy on Palpatine
War business: reinforcing Kashyyyk + hunting General Grievous

Decisions made

Anakin's seat on the Council is confirmed — minus the Master title
He's tasked with reporting on Palpatine's dealings
Reinforcements head to Kashyyyk for the Wookiees
Outlying systems get swept for General Grievous

Left unresolved

How Anakin squares Jedi loyalty with spying on his mentor
Whether he'll actually let the Master-rank snub go

What the output shows is that the structure you define in the prompt is faithfully reflected in the response. The gist, topics, decisions, and unresolved points correspond to the 'summary', 'key_topics', 'decisions', and 'open_questions' keys the prompt asked for.
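Before you pass a response downstream, it can help to check it against the keys the prompt asked for. The sketch below is one way to do that in Python, assuming the model's answer reaches you as a single text string. The sample string is hand-written from the demo output above for illustration and is not a captured API response.

```python
import json

EXPECTED_KEYS = {"summary", "key_topics", "decisions", "action_items", "open_questions"}
LIST_KEYS = EXPECTED_KEYS - {"summary"}


def validate_summary(raw: str) -> dict:
    """Parse the model's text response and check it against the prompt's schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    keys = set(data)
    if keys != EXPECTED_KEYS:
        raise ValueError(
            f"missing: {sorted(EXPECTED_KEYS - keys)}, unexpected: {sorted(keys - EXPECTED_KEYS)}"
        )
    for key in LIST_KEYS:
        if not isinstance(data[key], list):
            raise ValueError(f"{key} must be an array")
    if len(data["key_topics"]) > 5:
        raise ValueError("key_topics has more than 5 entries")
    for item in data["action_items"]:
        if not isinstance(item, dict) or set(item) != {"owner", "task"}:
            raise ValueError("each action item needs exactly 'owner' and 'task'")
    return data


# Hand-written sample for illustration only.
sample_raw = json.dumps({
    "summary": "Anakin is appointed to the Jedi Council but is denied the rank of Master.",
    "key_topics": [
        "Anakin's appointment to the Council",
        "Being denied the rank of Master",
        "Friction between Anakin and Obi-Wan",
        "The Council's ask to spy on Palpatine",
        "Reinforcing Kashyyyk and hunting General Grievous",
    ],
    "decisions": ["Anakin's seat on the Council is confirmed without the Master title"],
    "action_items": [],
    "open_questions": ["How Anakin squares Jedi loyalty with spying on his mentor"],
})
summary = validate_summary(sample_raw)
print(summary["key_topics"])
```

Failing loudly on a missing or unexpected key is a deliberate choice here: a malformed summary is easier to retry than to debug after it has been written to another tool.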

What to build with this

For a general note-taker, this is your baseline feature. Every meeting gets a summary. The prompt you use here should be generic enough to handle a standup, a client call, a board meeting, and a hiring interview without breaking. A few things to consider when designing your prompt for this use case:

Ask for a summary in a fixed number of sentences or bullet points so the output length is predictable
Request action items as a separate field with owner and deadline when specified in the meeting
Add a field for key decisions made, distinct from action items, which captures what was resolved, not just what was assigned

Use case 2: Sales call recap

A general summary is useful. A summary formatted exactly for the person reading it is what actually gets used.

For sales teams, the relevant output isn't a prose summary of what was discussed. It's the information they need to update their CRM and prepare for the next conversation. That means specific fields, not free text.

For this example, Emma uses a real cold call sourced from YouTube. The model is again GPT-5.4 Nano, but the prompt is completely different.

The prompt:

"prompts": [
  "You are preparing a sales follow-up from this call transcript. Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'crm_fields': object with these fields (use null for anything not explicitly stated — never guess):
  - contact_name: string or null
  - company: string or null
  - phone: string or null
  - email: string or null
  - role: string or null
  - deal_stage: one of 'discovery', 'qualified', 'proposal', 'closed-won', 'closed-lost', or null if it can't be determined
  - product_or_service_discussed: string or null
  - pain_points: array of strings (empty array if none)
  - budget_or_price_discussed: string or null
  - competitors_mentioned: array of strings (empty array if none)
  - next_step: string or null
  - follow_up_date: string or null
  - sentiment: one of 'positive', 'neutral', 'negative', 'mixed'
  - call_outcome: one of 'resolved', 'upsell', 'churn risk', 'no action', or null if none fit

  'sales_summary': array of exactly 3–5 strings — what happened, what was sold or offered, objections, and resolution.

  'nice_details_to_remember': array of strings — personal/rapport details (family, hobbies, travel, life events) ONLY if explicitly mentioned by name. Empty array if none.

  'objections_and_responses': array of objects with 'objection' and 'how_it_was_handled'. Empty array if none raised.

  'upsell_or_cross_sell_opportunities': array of strings, only ones explicitly raised or clearly implied by stated needs. Empty array if none.

  Attribute statements using diarized speaker labels. Do not invent contact details, personal facts, prices, or outcomes not explicitly stated."
]

The output:

Nissan map update call

The gist: John Smith called about updating the nav map on his 2009 Altima, hesitated on price, and the rep closed the deal with a well-timed discount.

Snapshot

Customer: John Smith
Product: Map v7.7 (released March 2012)
Price: $99 → discounted to $49 (+ shipping & tax)
Sentiment: Positive
Outcome: Closed-won, paid by Visa, same-day shipping

Objection → Response

"It's too expensive" → Rep pointed out it'd been 3 years since his last update, then sweetened it with a limited-time $50 discount

Personal Note: He prefers after hours for meeting time.

No upsell attempts, no competitors mentioned, no personal/rapport details came up on this one: straightforward, efficient close.

The value here isn't just summarization. It's that the output maps directly to fields in a CRM without any reformatting. When Emma shows the output, the structure is immediately recognizable to anyone who's filled in a Salesforce or HubSpot record.

Connecting to your CRM

This is where Audio-to-LLM becomes genuinely powerful for a product builder. Once you have structured output that matches your CRM's data model, you can automate the entire update flow. Two approaches worth considering:

Via an agent: Prompt the model to return JSON that maps to your CRM's field names, then have an agent write those fields after every call. No human in the loop.

Via MCP: If you're building with MCP, you can connect the output directly to a CRM integration and populate records without writing custom API logic for each CRM you want to support.
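Whichever approach you pick, something has to translate the prompt's crm_fields object into your CRM's own property names. Here is a minimal Python sketch of that step. The enum values come from the prompt above, while the CRM property names (Name, Company, Stage, NextStep, PriceNotes) are hypothetical and would need to match your actual CRM schema. The sample input is hand-written from the demo output.

```python
DEAL_STAGES = {"discovery", "qualified", "proposal", "closed-won", "closed-lost"}
SENTIMENTS = {"positive", "neutral", "negative", "mixed"}

# Hypothetical mapping from prompt keys to CRM property names.
CRM_FIELD_MAP = {
    "contact_name": "Name",
    "company": "Company",
    "deal_stage": "Stage",
    "next_step": "NextStep",
    "budget_or_price_discussed": "PriceNotes",
}


def to_crm_record(crm_fields: dict) -> dict:
    """Validate enum fields, then keep only non-null values under CRM names."""
    stage = crm_fields.get("deal_stage")
    if stage is not None and stage not in DEAL_STAGES:
        raise ValueError(f"unexpected deal_stage: {stage!r}")
    sentiment = crm_fields.get("sentiment")
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    record = {}
    for source_key, crm_key in CRM_FIELD_MAP.items():
        value = crm_fields.get(source_key)
        if value is not None:  # null means "not stated": leave the CRM field alone
            record[crm_key] = value
    return record


# Hand-written sample for illustration only.
sample_fields = {
    "contact_name": "John Smith",
    "company": None,
    "deal_stage": "closed-won",
    "next_step": None,
    "budget_or_price_discussed": "$99 discounted to $49 plus shipping and tax",
    "sentiment": "positive",
}
record = to_crm_record(sample_fields)
print(record)
```

Skipping null values rather than writing them avoids overwriting a field a rep filled in by hand with an empty value from a call where the detail was never mentioned.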

The personal notes field is worth calling out separately. Small contextual details (a prospect mentioning they're a big soccer fan, that they're traveling next week, that they just switched roles) are exactly the things that get lost between calls and exactly the things that make a follow-up feel human. Prompting explicitly for this field and storing it alongside the structured CRM data is a small addition that delivers outsized value to sales users.

Use case 3: Developer meeting notes

The third use case shifts the audience entirely. Developer teams have their own workflows, their own tools, and their own expectations for what "useful output" means after a meeting. For them, the relevant destination isn't a CRM, it's a ticket in Linear, Jira, or whatever issue tracker the team uses.

For this example, Emma uses a GitLab team's weekly sync, sourced from GitLab Unfiltered. She also switches models here, moving from GPT-5.4 Nano to Mistral Medium 3.5, to show that model selection is per-request and flexible.

The prompt:

"prompts": [
  "You are a technical PM extracting work items from this meeting transcript for a ticketing system (Jira/Linear/GitHub Issues). Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'meeting_context': object with:
  - 'purpose': one sentence on why the meeting happened
  - 'participants': array of speaker names or roles from diarization labels, if identifiable; empty array if not
  - 'systems_or_components_discussed': array of strings (e.g. API, mobile app, database)

  'tickets': array of ticket objects, each with:
  - 'title': concise ticket title (max 80 chars)
  - 'type': exactly one of 'bug', 'feature', 'task', 'tech-debt', 'spike'
  - 'priority': exactly one of 'critical', 'high', 'medium', 'low' — infer from urgency language; default to 'medium' if urgency isn't discussed
  - 'description': 2–4 sentences with context from the call
  - 'acceptance_criteria': array of testable criteria; empty array for bugs if only symptoms were given
  - 'labels': array of suggested tags (e.g. backend, frontend, infra, security)
  - 'assignee_suggestion': speaker name or role if explicitly mentioned, else null
  - 'blocked_by': array of blockers explicitly mentioned, empty array if none
  - 'related_ticket_hints': array of references to existing work (ticket IDs, PRs, epics), empty array if none
  - 'source_quote': a short EXACT substring copied from the transcript that justifies this ticket. Use null if no single exact quote captures it — never paraphrase into this field.

  Only create tickets for concrete bugs, features, tasks, or tech debt actually discussed. Do not invent scope. Empty 'tickets' array if none are warranted.

  'technical_decisions': array of objects with 'decision' and 'rationale' — include ONLY decisions explicitly agreed upon or resolved. Points that were merely raised, debated, or left open do NOT count — list those under 'open_technical_questions' instead. Empty array if nothing was finalized.

  'open_technical_questions': array of unresolved engineering questions, including anything discussed but not resolved with a clear decision."
]

The output:

Code review workshop

The gist: Team reviews a merge request swapping the issue analytics feature from REST to GraphQL — no new functionality, just refactoring. The discussion centers on whether the refactor needs new tests to cover gaps that already existed.

Systems touched: Issue analytics • REST API • GraphQL • Projects/groups queries

Ticket raised

Add test coverage for GraphQL query variables (tech-debt, medium priority) — reviewers noticed there's no test confirming the correct variables get passed for project-type queries. Risk of silent bugs in the refactor.

Still open

Should pure refactors be on the hook for fixing pre-existing test gaps?
Are the right GraphQL variables actually being passed for project-type queries?

Connecting to your ticketing tool

The same logic that applies to CRM integration applies here. If your prompt returns a JSON object with fields like title, description, assignee, and labels, you can write those directly to Linear's API (or Jira, or any other ticketing tool) after every engineering meeting.
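Before writing a ticket anywhere, it is worth vetting it against the constraints the prompt set: the 80-character title limit, the allowed type and priority values, and the rule that source_quote must be an exact substring of the transcript. The Python sketch below shows one way to do this. The output payload shape is hypothetical rather than any specific tracker's API, and the transcript line is invented for the example.

```python
TICKET_TYPES = {"bug", "feature", "task", "tech-debt", "spike"}
PRIORITIES = {"critical", "high", "medium", "low"}


def to_issue_payload(ticket: dict, transcript: str) -> dict:
    """Check a ticket against the prompt's rules and return a tracker-neutral payload."""
    title = ticket["title"]
    if len(title) > 80:
        raise ValueError("title exceeds 80 characters")
    if ticket["type"] not in TICKET_TYPES:
        raise ValueError(f"unknown type: {ticket['type']!r}")
    if ticket["priority"] not in PRIORITIES:
        raise ValueError(f"unknown priority: {ticket['priority']!r}")
    quote = ticket.get("source_quote")
    # A quote that is not in the transcript was paraphrased or invented.
    if quote is not None and quote not in transcript:
        raise ValueError("source_quote is not an exact substring of the transcript")
    return {
        "title": title,
        "description": ticket["description"],
        "labels": [ticket["type"], *ticket.get("labels", [])],
        "priority": ticket["priority"],
        "assignee": ticket.get("assignee_suggestion"),
    }


# Invented transcript line and hand-written ticket, for illustration only.
transcript = "I don't see a test that checks the variables for project queries."
ticket = {
    "title": "Add test coverage for GraphQL query variables",
    "type": "tech-debt",
    "priority": "medium",
    "description": "No test confirms the correct variables are passed for project-type queries.",
    "labels": ["testing"],
    "assignee_suggestion": None,
    "source_quote": "I don't see a test that checks the variables",
}
issue = to_issue_payload(ticket, transcript)
print(issue["title"], issue["labels"])
```

The substring check is cheap and gives you a simple signal for tickets that should go to a human for review instead of being created automatically.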

For a note-taker targeting developer teams specifically, this is the killer feature. Meetings end and tickets exist. No one has to remember to write them up, no one has to argue about what was decided, and the context that always gets lost in translation from conversation to ticket is captured while it's still fresh.

The model switch to Mistral Medium 3.5 is also worth noting practically: different models have different strengths for structured output tasks. If you're building a product where output quality for technical content matters, the ability to route different meeting types to different models without changing your pipeline is a meaningful degree of freedom.

Prompt design: The variable that changes everything

Across all three use cases, the transcription pipeline is identical. The audio goes in, Gladia transcribes it, and the transcript hits the model. What changes the output completely is the prompt.

This is the main design decision you're making as a product builder: what structure do you want for each persona or use case your product serves? A few principles that apply across all three:

Be explicit about output format: If you want JSON, say so and define the schema. If you want markdown with specific headers, define them. The model will follow a clear structure more reliably than it will invent one.

Separate fields by purpose: Don't ask for "a summary" and expect the model to know you also want action items, decisions, and next steps. Ask for each separately.

Tailor the field set to the reader: The sales rep needs next steps and personal notes. The developer needs ticket-ready descriptions. The executive needs decisions and risks. The prompt is where you make the output useful for the specific person reading it — don't use a generic prompt and expect a persona-specific output.

Test with real audio: The Jedi Council scene is a fun demo, but your prompts should be tested on real recordings from the meetings your users actually have. Edge cases — crosstalk, agenda changes mid-meeting, meetings where no decisions are made — surface quickly with real audio.
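One way to keep these principles manageable in code is a small routing table that pairs each meeting type with its own model and prompt, while the rest of the pipeline stays the same. The Python sketch below assumes three meeting types. The prompts are shortened stand-ins for the full prompts shown earlier, and the Mistral model identifier is a placeholder, since this article does not give the exact string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    prompt: str


ROUTES = {
    "general": Route(
        "openai/gpt-5.4-nano",
        "Summarize this conversation for a general audience. Return ONLY valid JSON.",
    ),
    "sales": Route(
        "openai/gpt-5.4-nano",
        "You are preparing a sales follow-up from this call. Return ONLY valid JSON.",
    ),
    "engineering": Route(
        # Placeholder: replace with the real identifier for Mistral Medium 3.5.
        "<mistral-medium-3.5-model-id>",
        "You are a technical PM extracting work items. Return ONLY valid JSON.",
    ),
}


def llm_config_for(meeting_type: str) -> dict:
    """Return the Audio-to-LLM part of the request for a given meeting type."""
    route = ROUTES.get(meeting_type, ROUTES["general"])  # unknown types fall back
    return {
        "audio_to_llm": True,
        "audio_to_llm_config": {"model": route.model, "prompts": [route.prompt]},
    }


config = llm_config_for("engineering")
print(config["audio_to_llm_config"]["model"])
```

Because the route only changes two values in the request, adding a new persona means adding one entry to the table and testing its prompt on real recordings.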

What to build next

This is the foundation. Once you have summaries and action items working across different personas, the next layer is distribution: getting the output into the right place automatically. That means integrations (CRM, ticketing, Slack, email), built by connecting your Audio-to-LLM outputs to downstream tools via agents or MCP.

In the meantime, refer to our AI note-taker guide to learn how to build one from scratch.

FAQs

What is Audio-to-LLM?

Audio-to-LLM is a Gladia feature that combines transcription and LLM inference into a single API call. Instead of managing a separate transcription service and LLM pipeline, you submit audio with a prompt and get structured output back in one request.

Can I run multiple prompts against the same recording?

Yes. The prompts array in audio_to_llm_config accepts multiple prompts in one request. A single sales call can simultaneously generate a CRM update, a Slack digest, and Linear tickets without making multiple API calls.

How do I structure the request?

Set audio_to_llm to true, then provide audio_to_llm_config with a model and a prompts array. Define your output schema directly in the prompt — if you want JSON with specific fields, specify them explicitly and instruct the model to return nothing outside the JSON object.
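Even with that instruction in the prompt, a model may occasionally wrap its answer in a code fence or add a line of commentary, so a defensive decode step is a reasonable safeguard. The Python sketch below assumes the model's answer arrives as one text string and simply takes everything from the first opening brace to the last closing brace. The wrapped sample is made up for illustration.

```python
import json


def extract_json_object(text: str) -> dict:
    """Pull the JSON object out of a response that may carry fences or commentary."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    # json.loads raises json.JSONDecodeError (a ValueError) if the slice is invalid.
    return json.loads(text[start:end + 1])


# Made-up response wrapped in a markdown fence despite the instruction.
fence = "`" * 3
wrapped = fence + 'json\n{"summary": "ok", "decisions": []}\n' + fence
parsed = extract_json_object(wrapped)
print(parsed)
```

This does not repair invalid JSON. It only removes wrapping, so a response that still fails to decode is best treated as a failed call and retried.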

Which models are available?

400+ models, selectable per request via the model field. You can route different meeting types to different models — a fast model for general summaries, a more capable one for technical syncs — without changing your pipeline architecture.

Does it require domain-specific tuning?

No. Audio-to-LLM works on any structured dialogue without fine-tuning. Adaptability is controlled entirely through prompt design. Test your prompts on real recordings from the meeting types your users actually have — edge cases surface quickly.

How do I tailor output for sales vs. developer meetings?

The transcription pipeline is identical. Only the prompt changes. For sales, prompt for CRM-ready fields: contact name, deal stage, pain points, next step, objections. For developer meetings, prompt for ticket-ready output: title, type, priority, description, acceptance criteria, assignee. The schema you define is what makes output directly usable without reformatting.

Can I connect the output directly to a CRM or ticketing tool?

Yes. Because the output structure is entirely prompt-defined, you can match your prompt schema to your target tool's data model. Prompt for fields that map to your Salesforce record or Linear ticket schema, then write them via API after each meeting: no human in the loop, no reformatting step.
