How to generate meeting summaries and action items with Audio-to-LLM

Published on Jun 16
By Emma Genthon

If you're building a note-taker or meeting assistant product in 2026, generating structured summaries and action items is the first feature you need to get right. It's the core of the value proposition — the reason users open the app after every call.

The challenge isn't transcription. Transcription is a solved problem. The challenge is what happens after: taking a raw transcript and turning it into something a salesperson can paste into their CRM, a developer can turn into Linear tickets, or a manager can act on without reading a wall of text. Traditionally, this means running transcription with one service, then piping the output into a separate LLM call, managing two APIs, handling errors across both, and stitching the results back together.

Our Audio-to-LLM feature collapses that into a single API call. In this article you will learn how to use the Audio-to-LLM feature to turn different meetings into structured summaries and action items.

TL;DR

  • Audio-to-LLM lets you transcribe audio and run LLM prompts against the transcript in a single API call, eliminating the need to manage a separate transcription and LLM pipeline
  • You can include multiple prompts in one request and retrieve each result separately, generating a CRM update, Slack digest, and Linear tickets from the same meeting recording simultaneously
  • The output structure is entirely prompt-defined, meaning you can tailor summaries to specific personas: a sales rep gets CRM-ready fields and personal notes, a developer gets ticket-ready descriptions, etc.
  • With 400+ models available per request, you can route different meeting types to different models without changing your pipeline architecture

How Audio-to-LLM works

The mechanic is straightforward. In your Gladia transcription request, you set audio_to_llm to true and pass a config object that specifies two things: the model you want to use, and the prompt (or prompts) you want to run against the transcript.

"audio_to_llm": true,
"audio_to_llm_config": {
  "model": "openai/gpt-5.4-nano",
  "prompts": [
    <prompt1>, <prompt2>
  ]
}

A few things worth highlighting for anyone building on top of this:

Multiple prompts, one call: You can include several prompts in the same request and retrieve each result separately. This means you can generate a CRM summary, a Slack digest, and a set of Linear tickets from a single meeting recording without making multiple API calls. For a product with multiple output destinations, this is significant.

400+ models to choose from: You're not locked into a single model. You specify the model per request, which means you can optimize for cost, latency, or output quality depending on the use case — and switch without changing your pipeline architecture.

The output structure follows your prompt: Whatever structure you define in your prompt is what you get back. If your prompt asks for JSON with specific fields, you get JSON with those fields. This makes it easy to feed the output directly into downstream tools without a parsing layer.
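To make the request shape concrete, here is a minimal Python sketch that assembles the body of a multi-prompt request. The audio_to_llm, audio_to_llm_config, model, and prompts fields come from the snippet above. The audio_url field, the function name, and the example URL are assumptions made for illustration, so check the API reference before relying on them.

```python
import json


def build_audio_to_llm_request(audio_url, model, prompts):
    """Assemble a request body that runs several prompts on one recording."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {
        # Assumption: the recording is referenced by a field named "audio_url".
        "audio_url": audio_url,
        "audio_to_llm": True,
        "audio_to_llm_config": {
            "model": model,
            # One entry per output destination (CRM, Slack, tickets, ...).
            "prompts": list(prompts),
        },
    }


if __name__ == "__main__":
    body = build_audio_to_llm_request(
        "https://example.com/meeting.mp3",  # hypothetical recording URL
        "openai/gpt-5.4-nano",
        [
            "Summarize this call as CRM-ready JSON.",
            "Write a three-bullet Slack digest of this call.",
        ],
    )
    print(json.dumps(body, indent=2))
```

Keeping the builder separate from the HTTP call makes it easy to unit-test the payload and to swap the model or prompt list per meeting type.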

Use case 1: Basic meeting summary

The simplest application — and the right place to start — is a general-purpose meeting summary. This is what you'd build first: a feature that works on any meeting, any topic, any team.

For this demo, Emma uses a Star Wars Jedi Council scene as the audio input. It's a deliberately unusual choice that makes a useful point: the feature doesn't need domain-specific tuning to produce a clean, structured output. As long as there's structured dialogue, it works.

The model used here is GPT-5.4 Nano.

The prompt:

"prompts": [
  "Summarize this conversation for a general audience. Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'summary': exactly 2–3 sentences covering what happened and the outcome.

  'key_topics': array of up to 5 strings, the main topics discussed, ordered by how much discussion they received (most-discussed first). Empty array if none.

  'decisions': array of strings describing decisions that were explicitly made or agreed upon. Do NOT count points that were merely discussed, proposed, or left open — those belong in 'open_questions' instead. Empty array if none.

  'action_items': array of objects, each with 'owner' (the speaker's name if stated, otherwise exactly 'unknown') and 'task' (short description). Empty array if none were stated.

  'open_questions': array of strings describing unresolved questions or undecided points. Empty array if none.

  Use only information explicitly stated in the transcript. Do not invent, assume, or infer facts, names, or outcomes."
]

The output:

Jedi council meeting

The gist: Anakin gets named Palpatine's personal rep on the Jedi Council — a big honor that turns sour fast, because the Council won't grant him the rank of Master alongside it. Things get tense with Obi-Wan, and Anakin finds out the real assignment is to quietly report back on what Palpatine's up to, all while the war keeps grinding on.

Topics covered

  • Anakin's appointment to the Council
  • Being denied the rank of Master
  • Friction between Anakin and Obi-Wan
  • The Council's ask to spy on Palpatine
  • War business: reinforcing Kashyyyk + hunting General Grievous

Decisions made

  • Anakin's seat on the Council is confirmed — minus the Master title
  • He's tasked with reporting on Palpatine's dealings
  • Reinforcements head to Kashyyyk for the Wookiees
  • Outlying systems get swept for General Grievous

Left unresolved

  • How Anakin squares Jedi loyalty with spying on his mentor
  • Whether he'll actually let the Master-rank snub go

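The output shows that the structure you define in the prompt carries through to the response: the gist, topics, decisions, and unresolved points each correspond to a key the prompt requested (summary, key_topics, decisions, open_questions). The output shown lists no separate action items; the assignments it captured appear under decisions.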
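Because the reply is model-generated, a product would typically check it before storing it. The following is a minimal sketch, assuming the reply text is available as a string. The function name and the sample reply are illustrative and not part of the API; the keys and limits are the ones the prompt above asks for.

```python
import json

# Keys requested by the summary prompt, with the JSON type each should hold.
REQUIRED_KEYS = {
    "summary": str,
    "key_topics": list,
    "decisions": list,
    "action_items": list,
    "open_questions": list,
}


def parse_meeting_summary(raw):
    """Parse the model's reply and check it against the keys the prompt asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"{key} should be a {expected_type.__name__}")
    if len(data["key_topics"]) > 5:
        raise ValueError("key_topics should hold at most 5 entries")
    for item in data["action_items"]:
        if not isinstance(item, dict) or not {"owner", "task"}.issubset(item):
            raise ValueError("each action item needs 'owner' and 'task'")
    return data


if __name__ == "__main__":
    # Made-up reply used only to exercise the validator.
    sample = json.dumps({
        "summary": "The team reviewed the release plan. The date was confirmed.",
        "key_topics": ["Release plan"],
        "decisions": ["Release date confirmed"],
        "action_items": [{"owner": "unknown", "task": "Draft the release notes"}],
        "open_questions": [],
    })
    print(parse_meeting_summary(sample)["decisions"])
```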

What to build with this

For a general note-taker, this is your baseline feature. Every meeting gets a summary. The prompt you use here should be generic enough to handle a standup, a client call, a board meeting, and a hiring interview without breaking. A few things to consider when designing your prompt for this use case:

  • Ask for a summary in a fixed number of sentences or bullet points so the output length is predictable
  • Request action items as a separate field with owner and deadline when specified in the meeting
  • Add a field for key decisions, distinct from action items: it captures what was resolved, not just what was assigned

Use case 2: Sales call recap

A general summary is useful. A summary formatted exactly for the person reading it is what actually gets used.

For sales teams, the relevant output isn't a prose summary of what was discussed — it's the information they need to update their CRM and prepare for the next conversation. That means specific fields, not free text.

For this example, Emma uses a real cold call sourced from YouTube. The model is again GPT-5.4 Nano, but the prompt is completely different.

The prompt:

"prompts": [
  "You are preparing a sales follow-up from this call transcript. Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'crm_fields': object with these fields (use null for anything not explicitly stated — never guess):
  - contact_name: string or null
  - company: string or null
  - phone: string or null
  - email: string or null
  - role: string or null
  - deal_stage: one of 'discovery', 'qualified', 'proposal', 'closed-won', 'closed-lost', or null if it can't be determined
  - product_or_service_discussed: string or null
  - pain_points: array of strings (empty array if none)
  - budget_or_price_discussed: string or null
  - competitors_mentioned: array of strings (empty array if none)
  - next_step: string or null
  - follow_up_date: string or null
  - sentiment: one of 'positive', 'neutral', 'negative', 'mixed'
  - call_outcome: one of 'resolved', 'upsell', 'churn risk', 'no action', or null if none fit

  'sales_summary': array of exactly 3–5 strings — what happened, what was sold or offered, objections, and resolution.

  'nice_details_to_remember': array of strings — personal/rapport details (family, hobbies, travel, life events) ONLY if explicitly mentioned by name. Empty array if none.

  'objections_and_responses': array of objects with 'objection' and 'how_it_was_handled'. Empty array if none raised.

  'upsell_or_cross_sell_opportunities': array of strings, only ones explicitly raised or clearly implied by stated needs. Empty array if none.

  Attribute statements using diarized speaker labels. Do not invent contact details, personal facts, prices, or outcomes not explicitly stated."
]

The output:

Nissan map update call

The gist: John Smith called about updating the nav map on his 2009 Altima, hesitated on price, and the rep closed the deal with a well-timed discount.

Snapshot

  • Customer: John Smith
  • Product: Map v7.7 (released March 2012)
  • Price: $99 → discounted to $49 (+ shipping & tax)
  • Sentiment: Positive
  • Outcome: Closed-won, paid by Visa, same-day shipping

Objection → Response

  • "It's too expensive" → Rep pointed out it'd been 3 years since his last update, then sweetened it with a limited-time $50 discount

Personal Note: He prefers after hours for meeting time.

No upsell attempts, no competitors mentioned, no personal/rapport details came up on this one — straightforward, efficient close.

The value here isn't just summarization — it's that the output maps directly to fields in a CRM without any reformatting. When Emma shows the output, the structure is immediately recognizable to anyone who's filled in a Salesforce or HubSpot record.

Connecting to your CRM

This is where Audio-to-LLM becomes genuinely powerful for a product builder. Once you have structured output that matches your CRM's data model, you can automate the entire update flow. Two approaches worth considering:

Via an agent: Prompt the model to return JSON that maps to your CRM's field names, then have an agent write those fields after every call. No human in the loop.

Via MCP: If you're building with MCP, you can connect the output directly to a CRM integration and populate records without writing custom API logic for each CRM you want to support.

The personal notes field is worth calling out separately. Small contextual details — a prospect mentioning they're a big soccer fan, that they're traveling next week, that they just switched roles — are exactly the things that get lost between calls and exactly the things that make a follow-up feel human. Prompting explicitly for this field and storing it alongside the structured CRM data is a small addition that delivers outsized value to sales users.
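As a rough illustration of the agent path, the sketch below turns a reply shaped like the prompt's schema into a flat set of CRM properties. The property names on the right-hand side of the mapping, the personal_notes key, and the function name are hypothetical; a real integration would use your CRM's own field names and client library.

```python
import json

# Hypothetical mapping from the prompt's 'crm_fields' keys to the property
# names of a CRM record. Replace the right-hand side with your CRM's schema.
CRM_PROPERTY_MAP = {
    "contact_name": "name",
    "company": "company",
    "deal_stage": "stage",
    "next_step": "next_step",
    "follow_up_date": "follow_up_on",
}

# The values the prompt allows for deal_stage.
ALLOWED_DEAL_STAGES = {
    "discovery", "qualified", "proposal", "closed-won", "closed-lost",
}


def to_crm_update(raw_reply):
    """Turn the model's JSON reply into a dict of CRM properties to write."""
    reply = json.loads(raw_reply)
    fields = reply.get("crm_fields", {})
    stage = fields.get("deal_stage")
    if stage is not None and stage not in ALLOWED_DEAL_STAGES:
        raise ValueError(f"unexpected deal_stage: {stage!r}")
    update = {}
    for source_key, crm_key in CRM_PROPERTY_MAP.items():
        value = fields.get(source_key)
        if value is not None:  # the prompt uses null for anything not stated
            update[crm_key] = value
    # Keep rapport details as a free-text note alongside the structured fields.
    notes = reply.get("nice_details_to_remember", [])
    if notes:
        update["personal_notes"] = "; ".join(notes)
    return update
```

Skipping null values means a call that never mentions a company or follow-up date does not overwrite what is already stored on the record.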

Use case 3: Developer meeting notes

The third use case shifts the audience entirely. Developer teams have their own workflows, their own tools, and their own expectations for what "useful output" means after a meeting. For them, the relevant destination isn't a CRM — it's a ticket in Linear, Jira, or whatever issue tracker the team uses.

For this example, Emma uses a GitLab team's weekly sync, sourced from GitLab Unfiltered. She also switches models here — moving from GPT-5.4 Nano to Mistral Medium 3.5 — to show that model selection is per-request and flexible.

The prompt:

"prompts": [
  "You are a technical PM extracting work items from this meeting transcript for a ticketing system (Jira/Linear/GitHub Issues). Return ONLY a valid JSON object — no markdown code fences, no commentary, no text before or after it. Use exactly these keys:

  'meeting_context': object with:
  - 'purpose': one sentence on why the meeting happened
  - 'participants': array of speaker names or roles from diarization labels, if identifiable; empty array if not
  - 'systems_or_components_discussed': array of strings (e.g. API, mobile app, database)

  'tickets': array of ticket objects, each with:
  - 'title': concise ticket title (max 80 chars)
  - 'type': exactly one of 'bug', 'feature', 'task', 'tech-debt', 'spike'
  - 'priority': exactly one of 'critical', 'high', 'medium', 'low' — infer from urgency language; default to 'medium' if urgency isn't discussed
  - 'description': 2–4 sentences with context from the call
  - 'acceptance_criteria': array of testable criteria; empty array for bugs if only symptoms were given
  - 'labels': array of suggested tags (e.g. backend, frontend, infra, security)
  - 'assignee_suggestion': speaker name or role if explicitly mentioned, else null
  - 'blocked_by': array of blockers explicitly mentioned, empty array if none
  - 'related_ticket_hints': array of references to existing work (ticket IDs, PRs, epics), empty array if none
  - 'source_quote': a short EXACT substring copied from the transcript that justifies this ticket. Use null if no single exact quote captures it — never paraphrase into this field.

  Only create tickets for concrete bugs, features, tasks, or tech debt actually discussed. Do not invent scope. Empty 'tickets' array if none are warranted.

  'technical_decisions': array of objects with 'decision' and 'rationale' — include ONLY decisions explicitly agreed upon or resolved. Points that were merely raised, debated, or left open do NOT count — list those under 'open_technical_questions' instead. Empty array if nothing was finalized.

  'open_technical_questions': array of unresolved engineering questions, including anything discussed but not resolved with a clear decision."
]

The output:

Code review workshop

The gist: Team reviews a merge request swapping the issue analytics feature from REST to GraphQL — no new functionality, just refactoring. The discussion centers on whether the refactor needs new tests to cover gaps that already existed.

Systems touched: Issue analytics • REST API • GraphQL • Projects/groups queries

Ticket raised

  • Add test coverage for GraphQL query variables (tech-debt, medium priority) — reviewers noticed there's no test confirming the correct variables get passed for project-type queries. Risk of silent bugs in the refactor.

Still open

  • Should pure refactors be on the hook for fixing pre-existing test gaps?
  • Are the right GraphQL variables actually being passed for project-type queries?

Connecting to your ticketing tool

The same logic that applies to CRM integration applies here. If your prompt returns a JSON object with fields like title, description, assignee, and labels, you can write those directly to Linear's API (or Jira, or any other ticketing tool) after every engineering meeting.

For a note-taker targeting developer teams specifically, this is the killer feature. Meetings end and tickets exist. No one has to remember to write them up, no one has to argue about what was decided, and the context that always gets lost in translation from conversation to ticket is captured while it's still fresh.
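One way to sketch that hand-off in Python is shown below. It checks each ticket against the enums the prompt defines, drops tickets whose source_quote is not an exact substring of the transcript, and emits generic issue payloads. The payload field names and the function name are hypothetical; they are not the schema of Linear, Jira, or any specific tracker.

```python
# Enum values defined by the developer-meeting prompt.
ALLOWED_TYPES = {"bug", "feature", "task", "tech-debt", "spike"}
ALLOWED_PRIORITIES = {"critical", "high", "medium", "low"}


def to_issue_payloads(reply, transcript):
    """Convert the reply's 'tickets' array into generic issue payloads.

    Tickets whose 'source_quote' is not an exact substring of the transcript
    are skipped, since the prompt requires a verbatim quote or null.
    """
    payloads = []
    for ticket in reply.get("tickets", []):
        if ticket["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected ticket type: {ticket['type']!r}")
        if ticket["priority"] not in ALLOWED_PRIORITIES:
            raise ValueError(f"unexpected priority: {ticket['priority']!r}")
        quote = ticket.get("source_quote")
        if quote is not None and quote not in transcript:
            continue  # quote was paraphrased or invented, so do not file it
        payloads.append({
            # Hypothetical issue fields; adapt to your tracker's API.
            "title": ticket["title"][:80],
            "description": ticket["description"],
            "labels": ticket.get("labels", []) + [ticket["type"]],
            "priority": ticket["priority"],
            "assignee": ticket.get("assignee_suggestion"),
        })
    return payloads
```

The substring check is a cheap guard against invented scope: a ticket that cannot point to words actually spoken in the meeting is held back for review instead of being filed automatically.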

The switch to Mistral Medium 3.5 also makes a practical point: different models have different strengths for structured output tasks. If you're building a product where output quality for technical content matters, being able to route different meeting types to different models without changing your pipeline gives you a useful degree of freedom.
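Per-request routing can be as small as a lookup table. The sketch below is an illustration only: the meeting-type labels and the function name are hypothetical, and the engineering entry is a placeholder to be replaced with the real model identifier, which this article does not spell out.

```python
DEFAULT_MODEL = "openai/gpt-5.4-nano"

# Hypothetical routing table keyed by meeting type.
MODEL_BY_MEETING_TYPE = {
    "general": "openai/gpt-5.4-nano",
    "sales": "openai/gpt-5.4-nano",
    # Placeholder: substitute the identifier of the model you want for
    # technical syncs (for example, the Mistral Medium 3.5 identifier).
    "engineering": "<mistral-medium-3.5-model-id>",
}


def llm_config_for(meeting_type, prompts):
    """Build the audio_to_llm_config object for a given meeting type."""
    model = MODEL_BY_MEETING_TYPE.get(meeting_type, DEFAULT_MODEL)
    return {"model": model, "prompts": list(prompts)}
```

Only the config object changes between meeting types; the rest of the request and the downstream handling stay the same.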

Prompt design: The variable that changes everything

Across all three use cases, the transcription pipeline is identical. The audio goes in, Gladia transcribes it, and the transcript hits the model. What changes the output completely is the prompt.

This is the main design decision you're making as a product builder: what structure do you want for each persona or use case your product serves? A few principles that apply across all three:

Be explicit about output format: If you want JSON, say so and define the schema. If you want markdown with specific headers, define them. The model will follow a clear structure more reliably than it will invent one.

Separate fields by purpose: Don't ask for "a summary" and expect the model to know you also want action items, decisions, and next steps. Ask for each separately.

Tailor the field set to the reader: The sales rep needs next steps and personal notes. The developer needs ticket-ready descriptions. The executive needs decisions and risks. The prompt is where you make the output useful for the specific person reading it — don't use a generic prompt and expect a persona-specific output.

Test with real audio: The Jedi Council scene is a fun demo, but your prompts should be tested on real recordings from the meetings your users actually have. Edge cases — crosstalk, agenda changes mid-meeting, meetings where no decisions are made — surface quickly with real audio.
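The first two principles can be applied mechanically by generating prompts from a field list instead of writing each one by hand. This is a minimal sketch: the function name and the way the text is joined are choices made for illustration, modeled on the wording of the prompts shown earlier.

```python
def build_json_prompt(task, fields, rules=()):
    """Compose a prompt that pins the reply to a fixed set of JSON keys.

    task:   one sentence describing the job.
    fields: (key, description) pairs, one per output field.
    rules:  optional closing instructions, e.g. "do not invent facts".
    """
    lines = [
        f"{task} Return ONLY a valid JSON object, with no markdown code fences, "
        "no commentary, and no text before or after it. Use exactly these keys:",
    ]
    for key, description in fields:
        lines.append(f"'{key}': {description}")
    lines.extend(rules)
    return "\n\n".join(lines)


if __name__ == "__main__":
    print(build_json_prompt(
        "Summarize this conversation for a general audience.",
        [
            ("summary", "exactly 2-3 sentences covering what happened."),
            ("decisions", "array of strings. Empty array if none."),
        ],
        ["Use only information explicitly stated in the transcript."],
    ))
```

Keeping the field list as data also gives you one place to derive a validator from, so the prompt and the checks applied to its output stay in sync.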

What to build next

This is the foundation. Once you have summaries and action items working across different personas, the next layer is distribution: getting the output into the right place automatically. That means integrations — CRM, ticketing, Slack, email — and that means connecting your Audio-to-LLM outputs to downstream tools via agents or MCP.

In the meantime, refer to our AI note-taker guide to learn how to build one from scratch.

FAQs

What is Audio-to-LLM?

Audio-to-LLM is a Gladia feature that combines transcription and LLM inference into a single API call. Instead of managing a separate transcription service and LLM pipeline, you submit audio with a prompt and get structured output back in one request.

Can I run multiple prompts against the same recording?

Yes. The prompts array in audio_to_llm_config accepts multiple prompts in one request. A single sales call can simultaneously generate a CRM update, a Slack digest, and Linear tickets without making multiple API calls.

How do I structure the request?

Set audio_to_llm to true, then provide audio_to_llm_config with a model and a prompts array. Define your output schema directly in the prompt — if you want JSON with specific fields, specify them explicitly and instruct the model to return nothing outside the JSON object.

Which models are available?

400+ models, selectable per request via the model field. You can route different meeting types to different models — a fast model for general summaries, a more capable one for technical syncs — without changing your pipeline architecture.

Does it require domain-specific tuning?

No. Audio-to-LLM works on any structured dialogue without fine-tuning. Adaptability is controlled entirely through prompt design. Test your prompts on real recordings from the meeting types your users actually have — edge cases surface quickly.

How do I tailor output for sales vs. developer meetings?

The transcription pipeline is identical — only the prompt changes. For sales, prompt for CRM-ready fields: contact name, deal stage, pain points, next step, objections. For developer meetings, prompt for ticket-ready output: title, type, priority, description, acceptance criteria, assignee. The schema you define is what makes output directly usable without reformatting.

Can I connect the output directly to a CRM or ticketing tool?

Yes. Because the output structure is entirely prompt-defined, you can match your prompt schema to your target tool's data model. Prompt for fields that map to your Salesforce record or Linear ticket schema, then write them via API after each meeting — no human in the loop, no reformatting step.
