
Build a customer interview library with Gladia, Airtable & Make.com

Published on May 15, 2026
by Ani Ghazaryan

TL;DR: Most product teams lose qualitative insights to scattered audio and transcripts that misattribute quotes. A reliable interview library needs accurate async diarization, automated routing, and a searchable database. Gladia's Solaria-1 sets the accuracy floor (29% lower WER, 3x lower DER on conversational speech), and Make.com routes its structured JSON into Airtable automatically, turning raw recordings into a searchable, theme-tagged customer content library.

Most teams building a customer interview library obsess over database schemas and automation logic while their qualitative research sits locked in raw audio files. For researchers, UX teams, and product organizations, the bottleneck is rarely the Airtable base or the automation logic. It's the transcription layer.

A transcript that hallucinates a feature name or misattributes a customer quote to the interviewer corrupts every downstream record. This guide covers how to connect Gladia's diarized async transcription to Airtable via Make.com. The result: raw interview recordings become a searchable, theme-tagged, sentiment-scored Customer Content Library your entire product organization can query in seconds.

Why product teams need a searchable interview library

Product teams rely on qualitative research (customer interviews, usability sessions, discovery calls) to decide what to build and why. When that research lives in scattered Zoom recordings, personal Notion notes, and Google Docs shared by a single researcher, the organization can't act on it at scale.

You're not building this for storage alone. You're making it possible for a product manager to run a query on Tuesday morning and pull every quote about "onboarding friction" from enterprise customers in the past 90 days, without asking anyone. That's what a Customer Content Library actually delivers, and it requires four things to be useful:

  • Searchable: Every quote retrievable by keyword, theme, customer segment, or date range.
  • Accurately attributed: Speaker labels tied to the customer, not the interviewer. Wrong attribution makes quotes unusable for research.
  • Automatically populated: New interview recordings flow into the database without manual entry. Human-in-the-loop is reserved for validation, not data entry.
  • Structured for downstream use: Themes, sentiment scores, and named entities exist as discrete fields, not buried in free-text transcripts.

Manual transcription and tagging fails all four criteria. By the time a transcript is ready, the interview is already a week old. Accuracy problems compound this further.

Activating your Gladia & Airtable pipeline

We kept the architecture deliberately simple. Three tools, one linear data flow, no custom backend required.

Integration flow: Gladia, Make.com, Airtable

The pipeline moves in four stages. An audio file lands in cloud storage. Make.com detects it and sends the file URL to Gladia's async transcription API with diarization enabled. Gladia processes the audio and returns a structured JSON payload. Make.com parses that JSON and writes discrete records to Airtable.

Make.com integrates over 3,000 apps and includes 13 dedicated Airtable modules covering record creation, updates, upserts, and search. No custom code bridges the two systems, and all data mapping happens inside Make.com's visual scenario builder. Make.com populates Airtable records automatically. Airtable does not transcribe audio files natively.

If your team prefers a no-code alternative, Gladia's documentation covers integration patterns that follow similar logical flows through various automation platforms.

Diarization: ensuring accurate customer quotes

Diarization identifies who spoke each segment of an audio recording. Without it, a transcript is a wall of text with no speaker attribution, and every quote carries attribution risk. For qualitative research, that ambiguity destroys the data's usefulness.

Gladia's async diarization is powered by pyannoteAI's Precision-2 model. Because the job runs asynchronously, speaker attribution happens in post-processing with the full recording available, which is more accurate than assigning speakers on a live stream. On the async benchmark, this yields on average 3x lower DER compared to alternatives, which directly reduces attribution risk in production workflows.

The JSON output structures each utterance with a speaker label, timestamps, and transcribed text. Here's what the output looks like for a two-speaker interview:

// Speaker 0 = Interviewer, Speaker 1 = Customer
{
  "utterances": [
    {
      "text": "Can you walk me through the moment you decided to look for an alternative?",
      "speaker": 0,
      "start": 12.4,
      "end": 17.8,
      "language": "en"
    },
    {
      "text": "It was after the third time our integration broke during a live demo.",
      "speaker": 1,
      "start": 18.1,
      "end": 23.6,
      "language": "en"
    }
  ]
}

Make.com filters on speaker: 1 to write only customer utterances to Airtable, keeping every interviewer question out of the quote library entirely.
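That filter is easy to prototype outside Make.com before wiring the scenario. A minimal Python sketch, using the utterance shape from the example above:

```python
# Keep only customer utterances from Gladia's diarized output,
# mirroring the Make.com filter on speaker == 1.

def customer_utterances(utterances, customer_speaker=1):
    """Return the utterances attributed to the customer speaker."""
    return [u for u in utterances if u["speaker"] == customer_speaker]

# Sample shaped like the two-speaker payload above.
sample = [
    {"text": "Can you walk me through the moment you decided to look "
             "for an alternative?", "speaker": 0, "start": 12.4, "end": 17.8},
    {"text": "It was after the third time our integration broke during "
             "a live demo.", "speaker": 1, "start": 18.1, "end": 23.6},
]

quotes = customer_utterances(sample)  # only the speaker-1 utterance remains
```

The same predicate, expressed as a Make.com filter condition, keeps interviewer turns out of Airtable entirely.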

Auto-tagging interview quotes

Gladia's audio-to-LLM pipeline produces structured outputs from the transcript. All of the following are included in the base rate on Starter and Growth plans, with no add-on fees required. The full audio intelligence layer covers:

  • Named entity recognition (NER): Extracts product names, competitors, pricing figures, and role titles from the transcript automatically.
  • Text-based sentiment analysis: Classifies each utterance as positive, negative, or neutral based on transcript content.
  • Summarization: Produces a session-level summary useful as the record description in Airtable, though quality is ceiling-bounded by transcript accuracy.
  • Chapterization: Segments the interview into labeled topics that map directly to Airtable theme tags.

Your transcript accuracy sets the ceiling for every downstream system. A quote pulled from a transcript with high WER is not a reliable source for a roadmap decision, which is why the Solaria-1 accuracy advantage matters at the infrastructure level before data reaches Airtable.

Connect Gladia & Airtable using Make.com

1. Configure recording upload trigger

Set the first module to watch for new files in your recording storage:

  • Google Drive > Watch Files in a Folder for teams using Drive-based recording workflows.
  • Dropbox > Watch Files for teams using Dropbox as a recording destination.
  • HTTP > Custom Webhook for meeting platforms that POST recording URLs on session completion.

The trigger fires once per new file and passes the file URL downstream to the next module.

2. Configure Gladia for diarized transcripts

The second module sends a POST request to Gladia's async transcription endpoint. Use Make.com's HTTP > Make a Request module with the following configuration:

  • URL: https://api.gladia.io/v2/transcription
  • Method: POST
  • Headers: x-gladia-key: YOUR_API_KEY
  • Body (JSON):
{
  "audio_url": "{{1.url}}",
  "diarization": true,
  "sentiment_analysis": true,
  "named_entity_recognition": true,
  "summarization": true
}

Gladia processes audio files up to 135 minutes and 1000MB in a single request, covering the vast majority of customer interview recordings. The API accepts WAV, M4A, FLAC, AAC, and other common formats. For async workflows, Gladia returns a job ID, and a second HTTP module polls the results endpoint or receives the completed transcript via a webhook callback URL.
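For teams who want to validate the request outside Make.com first, here is a sketch of the same submit-and-poll flow in Python. The `result_url` and `status` fields follow Gladia's async API pattern described above; verify the exact response shape against the current API docs before relying on it.

```python
import json
import time
import urllib.request

GLADIA_URL = "https://api.gladia.io/v2/transcription"

def build_transcription_request(audio_url, api_key):
    """Mirror the Make.com HTTP module configuration from step 2."""
    body = {
        "audio_url": audio_url,
        "diarization": True,
        "sentiment_analysis": True,
        "named_entity_recognition": True,
        "summarization": True,
    }
    return urllib.request.Request(
        GLADIA_URL,
        data=json.dumps(body).encode(),
        headers={"x-gladia-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

def transcribe(audio_url, api_key, poll_interval=5):
    """Submit an async job, then poll until it completes.
    In production a webhook callback avoids polling entirely."""
    with urllib.request.urlopen(build_transcription_request(audio_url, api_key)) as resp:
        job = json.load(resp)  # response carries a job id and a result URL
    while True:
        poll = urllib.request.Request(job["result_url"],
                                      headers={"x-gladia-key": api_key})
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result["status"] == "done":
            return result
        time.sleep(poll_interval)
```

In the Make.com scenario, the polling loop is simply a second HTTP module behind a sleep/repeat router, or it disappears entirely if you register a webhook callback URL with the initial request.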

3. Analyze interview themes & sentiment

Once Make.com receives Gladia's completed JSON response, a JSON > Parse JSON module extracts the structured data. Map these key fields at this stage:

  • result.transcription.utterances for speaker-attributed quotes
  • result.audio_intelligence.sentiment_analysis for utterance-level sentiment
  • result.audio_intelligence.named_entities for extracted entities
  • result.audio_intelligence.summarization for the session summary

An Iterator module loops through the utterances array, processing each customer utterance (where speaker = 1) as a separate Airtable record. This makes each quote individually searchable rather than buried in a long-form transcript field.

4. Build interview library in Airtable

The final module uses Make.com's Airtable modules to write each parsed utterance as a new record. The Airtable > Create a Record action handles the initial write for each utterance. Map the fields from Gladia's JSON to Airtable as follows:

Gladia JSON field | Suggested Airtable field | Suggested field type
utterance.text | Quote text | Long text
utterance.speaker | Speaker label | Single select
utterance.start | Timestamp (seconds) | Number
utterance.sentiment | Sentiment | Single select
Named entities | Extracted entities | Long text
Session summary | Interview summary | Long text
Interview date (from trigger) | Interview date | Date
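For illustration, the same mapping and write can be scripted directly against Airtable's REST API. The base ID, table name, and token below are placeholders, the field names follow the suggested schema above, and sentiment is simplified to a per-utterance key (in the real payload it lives under result.audio_intelligence.sentiment_analysis):

```python
import json
import urllib.request

AIRTABLE_API = "https://api.airtable.com/v0/{base_id}/{table}"

def utterance_to_fields(u):
    """Map one Gladia utterance onto the suggested Airtable schema."""
    return {
        "Quote text": u["text"],
        "Speaker label": f"Speaker {u['speaker']}",
        "Timestamp (seconds)": u["start"],
        # Simplification: sentiment attached per utterance for this sketch.
        "Sentiment": u.get("sentiment", "neutral"),
    }

def create_record(base_id, table, token, fields):
    """POST one record via Airtable's REST API (placeholder credentials)."""
    req = urllib.request.Request(
        AIRTABLE_API.format(base_id=base_id, table=table),
        data=json.dumps({"fields": fields}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Inside Make.com, the Airtable > Create a Record module performs the same POST with the same field mapping, just configured visually instead of in code.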

Configuring Airtable for interview insights

A flat table of quotes is searchable but not navigable. A linked-record architecture makes it possible to move from a quote to the interview it came from, to the customer segment it represents, to every other quote from that segment in one click.

A suggested two-table structure:

Table 1: Interviews (suggested schema)

Field | Suggested type
Session ID | Auto number
Interview date | Date
Customer segment | Single select
Recording URL | URL
Interviewer | User
Interview summary | Long text

Table 2: Quotes (suggested schema)

Field | Suggested type
Quote text | Long text
Speaker | Single select
Sentiment | Single select
Themes | Multiple select
Timestamp (seconds) | Number
Named entities | Long text
Interview | Link to Interviews

Linking records in Airtable connects each quote back to its parent interview. With that relationship in place, a product manager can pull every quote tagged "pricing friction" from enterprise customers in the past 90 days, find all negative-sentiment utterances from a specific segment in a single filter, or count how many times a specific competitor appeared in named entity extractions over 90 days. That filtering precision depends entirely on quote attribution being correct, which is why the diarization accuracy layer is the reliability foundation for everything else in this architecture.

Find key insights from customer interviews

Once the pipeline runs, your library becomes a research asset the entire product organization can query, not just the researcher who conducted the interview. Three workflows cover most product research needs:

  • Segmented quote extraction: Filter Quotes where Segment = "Enterprise" and Theme = "onboarding" and Sentiment = "Negative" to surface every relevant complaint from that cohort without reading a raw transcript. The filtering precision depends directly on Gladia's named entity and sentiment outputs, both included at no additional cost on Starter and Growth plans.
  • Time-series theme tracking: Group quotes by theme and sort by interview date to see whether a concern is growing, shrinking, or stable across research cycles. This is particularly useful for tracking sentiment changes after a product release or pricing update.
  • Roadmap evidence queries: When a VP of Engineering asks why onboarding is on the Q3 roadmap, pull the filtered view in 30 seconds and show tagged quotes from enterprise customers, sorted by sentiment and date, extracted across multiple interviews over the past quarter. That's evidence, not instinct.
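These views can also be pulled programmatically through Airtable's filterByFormula parameter. A sketch, assuming the field names from the suggested schema above and a {Segment} lookup field pulled through the link to the Interviews table:

```python
import urllib.parse

def quotes_filter_url(base_id, theme, sentiment, segment):
    """Build an Airtable list-records URL that filters the Quotes table.
    {Segment} is assumed to be a lookup field sourced from the linked
    Interviews table's Customer segment."""
    formula = (
        f"AND({{Themes}}='{theme}', "
        f"{{Sentiment}}='{sentiment}', "
        f"{{Segment}}='{segment}')"
    )
    query = urllib.parse.urlencode({"filterByFormula": formula})
    return f"https://api.airtable.com/v0/{base_id}/Quotes?{query}"
```

A GET to the returned URL (with a Bearer token) yields exactly the enterprise onboarding complaints described in the first workflow, ready for a dashboard or a Slack digest.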

Preventing cost surprises as you scale

Per-hour pricing based on audio duration is how Gladia structures its plans, and diarization, sentiment analysis, NER, and summarization are all included in the base rate on Starter and Growth plans with no add-on fees at any volume tier.

Compare that to the add-on model used by alternatives. According to publicly available pricing from external sources, AssemblyAI's entity detection ($0.08/hr), topic detection ($0.15/hr), summarization ($0.03/hr), and speaker diarization ($0.02/hr) stack separately on top of the base Universal-2 rate of $0.15/hr, bringing the total to $0.43/hr with those intelligence features enabled, a 187% increase before you've processed a single interview. Deepgram structures speaker diarization as a separate add-on billed above the base transcription rate.

The table below models Gladia's all-inclusive pricing against a typical add-on model at three research volumes, using rates from Gladia's pricing page.

Monthly volume | Gladia Starter (all-in, $0.61/hr) | Typical add-on model (~$0.43/hr) | Gladia Growth (as low as $0.20/hr, with commitment)
100 hours/month | $61 | $43 | From $20
1,000 hours/month | $610 | $430 | From $200
10,000 hours/month | $6,100 | $4,300 | From $2,000

Add-on model calculated from AssemblyAI's public pricing page: $0.15/hr base (Universal-2) + $0.08/hr entity detection + $0.15/hr topic detection + $0.03/hr summarization + $0.02/hr speaker diarization = $0.43/hr with intelligence features enabled. The Growth plan requires an upfront volume commitment. Figures based on public pricing pages.
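The comparison reduces to per-hour arithmetic, and the table's figures can be sanity-checked directly from the cited rates:

```python
# Reproduce the cost comparison from the public per-hour rates above.
GLADIA_STARTER = 0.61  # all-inclusive $/hr
ADDON_MODEL = 0.15 + 0.08 + 0.15 + 0.03 + 0.02  # base + four add-ons = 0.43
GLADIA_GROWTH = 0.20   # $/hr floor, requires a volume commitment

def monthly_cost(rate_per_hour, hours):
    return round(rate_per_hour * hours, 2)

costs = {hours: (monthly_cost(GLADIA_STARTER, hours),
                 monthly_cost(ADDON_MODEL, hours),
                 monthly_cost(GLADIA_GROWTH, hours))
         for hours in (100, 1000, 10000)}
```

The add-on total also confirms the 187% figure: $0.43/hr against the $0.15/hr base rate is a 187% increase before any audio is processed.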

On Growth and Enterprise plans, customer data is never used for model training and no opt-out is required. For product teams processing interview audio containing unreleased product details or sensitive customer feedback, that's a material compliance detail.

Automating interview libraries: key questions

Does Gladia handle real-world interview audio conditions?

Gladia's diarization is tested on real-world audio including noisy recordings, overlapping speech, and calls with variable microphone quality. Gladia's async benchmark covers 74+ hours of diverse audio across 7 datasets with conversational speech that reflects actual interview conditions. You don't need studio-quality recordings to get reliable speaker attribution from this pipeline.

How fast does async transcription process an interview recording?

Gladia processes approximately one hour of audio in under 60 seconds on the async API, per the AI note-taker architecture guide. A 45-minute customer interview returns a complete diarized transcript in under a minute after the recording lands in cloud storage, so your Airtable record is ready before the post-interview debrief starts.

How do you backfill historical interview recordings?

Build a batch scenario in Make.com using a Google Sheet as the trigger source. Add one row per recording URL, use Google Sheets > Watch Rows as the trigger, and run the same Gladia + Airtable modules on each row sequentially. This decouples the backfill job from the live pipeline without duplicating scenario logic, letting you process years of existing recordings without manual re-transcription.

Can the pipeline handle interviews with non-English speakers?

Gladia's supported languages cover 100+ languages, including 42 that no other API-level speech-to-text provider supports. Code-switching detection handles mid-conversation language changes automatically. When an interview starts in Spanish and shifts to English, Gladia maintains speaker attribution and language tagging across the transition.

Start with 10 free hours. Most integrations are live in under a day. Run diarization and multilingual transcription on your own interview recordings to verify speaker attribution and language handling before committing to a plan.

FAQs

What file size and duration limits apply to Gladia's async API?

The async API accepts audio files up to 1000MB and 135 minutes per request, per our format specifications.

Does Gladia include diarization in the base price or charge it as an add-on?

Diarization is included in the base rate on the Starter plan ($0.61/hr async) with no add-on fee. It's available in async workflows, powered by pyannoteAI's Precision-2 model.

How many languages does Gladia support for customer interview transcription?

Solaria-1 supports 100+ languages, including 42 not available from any other API-level STT provider, with native mid-conversation code-switching detection across all supported languages.

Does customer audio get used to retrain Gladia's models?

On Growth and Enterprise plans, customer data is never used for model training and no opt-out is required. On the Starter plan, data can be used for training by default.

Key terms glossary

Diarization: The process of segmenting an audio recording by speaker identity, assigning a label (Speaker 0, Speaker 1, etc.) to each utterance. Gladia's async diarization is powered by pyannoteAI's Precision-2 model.

Word error rate (WER): A standard metric for transcription accuracy, typically calculated as the percentage of words in a transcript that differ from the ground-truth reference.

Diarization error rate (DER): A standard metric for speaker attribution accuracy, typically measuring the percentage of audio time incorrectly assigned to a speaker. Our async diarization achieves on average 3x lower DER than alternatives.

Code-switching: When a speaker changes language mid-conversation, for example, shifting from English to Spanish within a single turn. Gladia handles this automatically across 100+ languages without requiring language pre-specification in the API call.
