What is summarization?

Published on Mar 2024

Summarization in speech-to-text (STT) AI is a popular feature that streamlines the extraction of essential information from spoken content. It condenses lengthy audio recordings or live conversations into concise summaries, which helps end users understand the content faster and make decisions sooner.

The feature combines automatic speech recognition (ASR) systems with large language models (LLMs), which are neural networks trained on vast text datasets. Together they produce summaries tailored to specific use cases, such as medical consultations, online meetings, and sales calls.

At Gladia, we provide summarization (currently in alpha) as part of our Audio Intelligence API, which is included in the core product offer. You can sign up for free or keep reading to learn more about the feature and how to deploy it.

How summarization works

Summarization in STT operates through a multi-step process. 

First, an ASR system like Gladia's Whisper-Zero transcribes the spoken content, converting audio signals into words. Then, an LLM such as OpenAI's GPT-3.5 or Mistral 7B analyzes this text to identify key phrases, extract important information, and generate a summary based on predetermined criteria.

The summarization process involves linguistic analysis, machine learning algorithms, and natural language processing techniques to ensure the accuracy and coherence of the summaries.
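
The two-step flow described above can be sketched in a few lines of Python. This is a minimal sketch, not Gladia's implementation. `transcribe` and `complete` are hypothetical callables that stand in for any ASR engine and any LLM client. The default criteria string and the prompt layout are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: plug in any ASR engine and any LLM client.
Transcriber = Callable[[str], str]  # audio file path -> transcript text
Completer = Callable[[str], str]  # prompt -> model completion


@dataclass
class SummaryResult:
    transcript: str
    summary: str


def build_prompt(transcript: str, criteria: str) -> str:
    """Place the summarization criteria before the transcript in a single prompt."""
    return f"{criteria}\n\nTranscript:\n{transcript}\n\nSummary:"


def summarize_audio(
    audio_path: str,
    transcribe: Transcriber,
    complete: Completer,
    criteria: str = "Summarize the key points of this conversation.",
) -> SummaryResult:
    # Step 1: ASR converts the audio signal into text.
    transcript = transcribe(audio_path)
    # Step 2: the LLM condenses the transcript according to the criteria.
    summary = complete(build_prompt(transcript, criteria))
    return SummaryResult(transcript=transcript, summary=summary.strip())


if __name__ == "__main__":
    # Stub functions stand in for real ASR and LLM calls.
    def fake_asr(path: str) -> str:
        return "Alice: Let's ship on Friday. Bob: Agreed."

    def fake_llm(prompt: str) -> str:
        return " The team agreed to ship on Friday. "

    print(summarize_audio("meeting.wav", fake_asr, fake_llm).summary)
```

Keeping the two steps behind plain function parameters lets you swap the ASR engine or the LLM independently, which matters when you later compare models.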

Different types of summarization

There are various types of summarization techniques, each suited to different product requirements and preferences. The beautiful thing about summarization is just how customizable the feature is thanks to: a) an increasing variety of LLMs to pick from; b) the infinite magic of prompt engineering, enabling every company to find the perfect combination of prompts to produce desired results. 

Companies can choose to deploy and tweak LLMs themselves or go with all-batteries-included audio intelligence APIs like ours. In the latter case, summarization is seamlessly integrated with the transcription service.

Currently, Gladia's API gives you access to the three most common industry-agnostic types of summaries, each catering to specific needs. Here's what they look like in practice:

1. General summary

The general summary provides a comprehensive overview of the transcription, capturing the main points and key details. It serves as a detailed reference for in-depth analysis or review.

2. Concise summary

For quick reference, the concise summary offers a condensed version of the transcription, highlighting only key takeaways. Its goal: efficient information consumption and decision-making.

3. Bullet points

The bullet points summary presents key insights and actionable points in a concise, easily digestible format. It organizes information into bullet-pointed lists, making it ideal for quick reference and strategic planning.
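
If you build summaries yourself instead of using an API, the three types above mostly differ in the instruction you give the LLM. The sketch below maps each type to a prompt. The instruction wording is hypothetical and is not the prompt Gladia uses internally.

```python
from enum import Enum


class SummaryType(Enum):
    GENERAL = "general"
    CONCISE = "concise"
    BULLET_POINTS = "bullet_points"


# Hypothetical instructions; these are NOT Gladia's internal prompts.
INSTRUCTIONS = {
    SummaryType.GENERAL: (
        "Write a comprehensive summary of the transcript covering all main "
        "points and key details."
    ),
    SummaryType.CONCISE: (
        "Summarize the transcript in at most three sentences, keeping only "
        "the key takeaways."
    ),
    SummaryType.BULLET_POINTS: (
        "List the key insights and action items from the transcript as "
        "bullet points, one per line, each starting with '- '."
    ),
}


def summary_prompt(transcript: str, kind: SummaryType) -> str:
    """Build an LLM prompt for the requested summary type."""
    return f"{INSTRUCTIONS[kind]}\n\nTranscript:\n{transcript}"


if __name__ == "__main__":
    print(summary_prompt("Alice: We ship on Friday.", SummaryType.BULLET_POINTS))
```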

With just a few lines of code, you can embed the most common types of summarization into your application. For more information on setting up and using our API, consult our documentation.
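
For illustration, a request body that enables summarization might be built as shown below. This is a sketch under stated assumptions. The endpoint URL and the `summarization` / `summarization_config.type` field names are based on the v2 pre-recorded transcription API and should be checked against the current documentation. The audio URL is a placeholder.

```python
import json

# Assumed endpoint and field names; verify against the current API documentation.
GLADIA_URL = "https://api.gladia.io/v2/transcription"
SUMMARY_TYPES = {"general", "concise", "bullet_points"}


def build_summarization_request(audio_url: str, summary_type: str = "general") -> dict:
    """Build a JSON-serializable request body that turns on summarization."""
    if summary_type not in SUMMARY_TYPES:
        raise ValueError(f"unknown summary type: {summary_type!r}")
    return {
        "audio_url": audio_url,
        "summarization": True,
        "summarization_config": {"type": summary_type},
    }


if __name__ == "__main__":
    body = build_summarization_request("https://example.com/call.mp3", "bullet_points")
    print(json.dumps(body, indent=2))
```

You would then POST this body to the endpoint with your API key, as described in the authentication section of the documentation, and retrieve the summary alongside the transcript.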

If you prefer to build your own summarization from scratch using open-source Whisper and GPT-3.5, see our dedicated tutorial.

Maximizing the quality of summaries with prompt engineering

As noted previously, the quality and relevance of summaries depend largely on the prompt provided to the LLM. If you want full control over the summarization input parameters, here are some factors to consider.

Prompt engineering involves crafting prompts tailored to specific use cases to optimize the relevance and accuracy of the generated summaries. While high-quality prompt engineering usually requires some specialized expertise, businesses can maximize the quality of summaries by following these practical steps:

Understand use case requirements

Identify the specific objectives and priorities for summarization within your business context. Whether it's capturing meeting minutes, extracting key insights from customer interactions, or summarizing research findings, align the prompt with the desired outcomes.

Pick the right LLM

Selecting a suitable LLM is crucial for ensuring the quality and relevance of summaries. When choosing a model for your summarization needs, consider factors such as language proficiency, domain expertise, model capabilities, and price. Most commercial LLMs are billed per input and output token.
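
To compare prices across models, you can estimate the cost of a single summary from the transcript length. The sketch below uses a common rule of thumb of roughly 0.75 English words per token, along with placeholder prices per million tokens. Real tokenizers and price lists vary by model, and the prompt's own instruction tokens are ignored, so treat the results as rough estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    name: str
    input_per_million: float  # USD per 1M input tokens (placeholder)
    output_per_million: float  # USD per 1M output tokens (placeholder)


def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token count; ~0.75 English words per token is a rule of thumb."""
    return round(word_count / words_per_token)


def summary_cost(pricing: ModelPricing, transcript_words: int, summary_words: int) -> float:
    """Estimated USD cost of one summary: transcript as input, summary as output."""
    input_tokens = estimate_tokens(transcript_words)
    output_tokens = estimate_tokens(summary_words)
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical models and prices, for comparison only.
    models = [
        ModelPricing("model-a", input_per_million=0.50, output_per_million=1.50),
        ModelPricing("model-b", input_per_million=10.00, output_per_million=30.00),
    ]
    # A one-hour meeting at an assumed ~150 words per minute: 9,000 words.
    for model in models:
        cost = summary_cost(model, transcript_words=9_000, summary_words=300)
        print(f"{model.name}: ${cost:.4f} per summary")
```

With these placeholder numbers, the script prints $0.0066 for model-a and $0.1320 for model-b. In both cases, the 12,000 input tokens account for most of the cost, not the 400 output tokens.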

Then, evaluate the candidate models on performance metrics, preferably metrics from your own internal tests, to assess how well each one fits your use case.
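
One way to make such internal tests repeatable is to score each model's summaries against human-written reference summaries for the same transcripts. The sketch below implements a simplified ROUGE-1 F1, which measures unigram overlap after lowercasing and applies no stemming. ROUGE-1 is one common automatic metric, chosen here for illustration. Because it only measures word overlap, it should be paired with human review.

```python
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming."""
    cand, ref = Counter(_tokens(candidate)), Counter(_tokens(reference))
    overlap = sum((cand & ref).values())  # each word counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_rouge1(outputs: list[str], references: list[str]) -> float:
    """Average ROUGE-1 F1 of one model's summaries over a test set."""
    if not outputs or len(outputs) != len(references):
        raise ValueError("need one reference per output, and at least one pair")
    return sum(rouge1_f1(o, r) for o, r in zip(outputs, references)) / len(outputs)


if __name__ == "__main__":
    print(rouge1_f1("The team ships Friday", "The team will ship on Friday"))
```

The example scores 0.6. Three of the four candidate words match (precision 0.75), and three of the six reference words are covered (recall 0.5). "ships" and "ship" do not match because the sketch applies no stemming.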

Customize prompts accordingly

Tailor prompts to suit the linguistic style, vocabulary, and domain-specific terminology relevant to your industry or organization. By incorporating relevant keywords and context cues, you can enhance the summarization process and ensure the output aligns with your expectations. 

To illustrate this, if you’re looking to summarize online meetings, check out our guide with specific examples of prompts for this use case. 
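
As a sketch of what such customization can look like, the function below assembles a prompt from three parts: a domain label, a glossary of terms the model should use as defined, and a list of topics the summary must cover. The function name, the sales-call glossary, and the focus topics are all hypothetical. Replace them with your organization's own vocabulary.

```python
def build_domain_prompt(
    transcript: str,
    domain: str,
    glossary: dict[str, str],
    focus: list[str],
) -> str:
    """Assemble a summarization prompt with domain context, terminology, and focus cues."""
    lines = [f"You are summarizing a {domain} transcript."]
    if glossary:
        lines.append("Use these terms as defined:")
        lines += [f"- {term}: {meaning}" for term, meaning in sorted(glossary.items())]
    if focus:
        lines.append("Make sure the summary covers:")
        lines += [f"- {topic}" for topic in focus]
    lines += ["", "Transcript:", transcript]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_domain_prompt(
            transcript="Customer: Our current tool misses action items.",
            domain="sales call",
            glossary={"ARR": "annual recurring revenue", "POC": "proof of concept"},
            focus=["customer pain points", "pricing objections", "agreed next steps"],
        )
    )
```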

Iterate and refine

It’s normal for early attempts at prompt engineering to not yield the desired results. Continuously evaluate the effectiveness of prompts and summaries based on user feedback and performance metrics. Iterate on prompt variations, adjusting parameters and refining language patterns to improve summarization quality over time.
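
This evaluate-and-iterate loop can be made systematic by running every prompt variant over the same test set and comparing average scores. In this sketch, `rank_prompts`, `summarize`, and `score` are hypothetical names. `summarize` wraps whatever LLM you use. `score` can be any metric you trust, from an automatic overlap score to averaged user ratings.

```python
from statistics import mean
from typing import Callable


def rank_prompts(
    variants: dict[str, str],
    test_set: list[tuple[str, str]],
    summarize: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> list[tuple[str, float]]:
    """Return (variant name, mean score) pairs, best first.

    variants: name -> prompt text
    test_set: (transcript, reference summary) pairs
    summarize(prompt, transcript) -> summary
    score(summary, reference) -> higher is better
    """
    averages = {
        name: mean(score(summarize(prompt, t), ref) for t, ref in test_set)
        for name, prompt in variants.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-ins: prompts mentioning "short" truncate; scoring is exact match.
    def toy_summarize(prompt: str, transcript: str) -> str:
        return transcript[:10] if "short" in prompt else transcript

    def exact_match(summary: str, reference: str) -> float:
        return 1.0 if summary == reference else 0.0

    ranking = rank_prompts(
        {"v1": "Summarize everything.", "v2": "Keep it short."},
        [("abcdefghijklmn", "abcdefghij")],
        toy_summarize,
        exact_match,
    )
    print(ranking)
```

Averages are only comparable when they are computed on the same test set. If you change the test set, re-score every variant.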


Summarization is a highly popular feature among end users. Today, product builders can choose from an array of open-source and commercial tools for both transcription and summarization to deliver the best possible summarization experience in their products.

We’d be happy to accompany you on this mission with one of the fastest and most accurate speech-to-text APIs on the market, including a plug-and-play summarization add-on.
