The bottleneck in automated follow-up emails is not Claude. It's the transcript you feed it. A wrong name, a dropped action item, or a misattributed speaker creates an email that embarrasses your team or, at scale, silently corrupts your CRM. The fix is not a better prompt. It's a better audio foundation.
This guide walks through building a production-ready pipeline that takes a meeting recording, runs it through Gladia's async transcription API for accurate, diarized output, passes that structured JSON into Claude for personalized email generation, and delivers the result via the Gmail API or Lemlist. Teams report going from first API call to production in under a day.
Why automate follow-up emails from meeting recordings?
Sales teams spend a meaningful share of each week on post-meeting documentation and follow-up drafting, and manual follow-up is inconsistent: one rep sends three sentences, another sends a detailed action item list, and neither captures the nuance of a multilingual call where half the conversation happened in Spanish or French. When follow-up emails are generated from accurate, diarized transcripts rather than personal memory, every customer interaction gets the same quality of documentation, a consistent brand voice enforced through your Claude system prompt, and a searchable record that syncs to your CRM.
Tools like Otter.ai and Read.ai offer transcription and summary features, but they carry tradeoffs that compound at scale. Both have historically priced on seats rather than usage, though pricing structures may have changed, so check current pricing pages before modelling costs. Either way, the cost structure penalizes growth rather than rewarding it.
On data privacy, Otter.ai's terms permit using customer content for testing, tuning, and improving its ML models. Manual review of specific recordings requires explicit customer permission. Read.ai's privacy policy states it uses data to train and improve its models for service purposes by default. It offers an opt-in Customer Experience Program for additional data sharing, and restricts using Google Workspace data for general AI model training. Building on Gladia's Growth or Enterprise plan eliminates this ambiguity entirely: on those tiers, customer audio is never used for model training and no opt-out action is required.
Automated follow-up system architecture
Before writing a line of code, the architectural decision matters. Off-the-shelf meeting assistants give you a fixed pipeline with per-seat pricing and opaque data handling. A custom pipeline built on Gladia and Claude gives you predictable per-hour billing with all intelligence features bundled, full control over prompt design and email format, and data governance that matches your compliance requirements.
Follow-up email pipeline design
The pipeline operates in discrete stages after a meeting ends:
- Audio input: A meeting recording (WAV, M4A, FLAC, or AAC, up to 135 minutes and 1,000MB) is submitted to the Gladia async transcription API.
- Gladia processing: Solaria-1 transcribes the audio with word-level timestamps, speaker diarization powered by pyannoteAI's Precision-2 model, and automatic language detection across 100+ languages.
- JSON output: Gladia returns a structured JSON object with per-utterance speaker labels, timestamps, and transcript text.
- Claude prompt: The formatted transcript is passed to Claude with a system prompt that instructs it to extract action items, identify key decisions, and draft a personalized follow-up email.
- Email delivery: The generated email is sent via the Gmail API or a Lemlist sequence.
AI-powered audio to email drafts
Both APIs use standard REST patterns. Gladia's async transcription API accepts POST requests with your key in the x-gladia-key header, and Claude uses the Anthropic messages endpoint. Integration patterns for this handoff are documented in the Gladia audio-to-LLM documentation, with broader meeting assistant architecture decisions covered separately.
Step 1: Get accurate meeting transcripts fast
Setting up your Gladia API key
Create an account via the Gladia getting started guide to access 10 free hours per month on the Starter plan. Your API key is available in the dashboard immediately after signup. Store it as an environment variable:
import os
GLADIA_API_KEY = os.environ.get("GLADIA_API_KEY")
Gladia audio upload workflow
The async API accepts either a URL pointing to your audio file or a direct file upload. Passing a URL from your storage provider is the cleanest pattern for most meeting recording pipelines:
import requests
def submit_transcription(audio_url: str, api_key: str) -> dict:
    endpoint = "https://api.gladia.io/v2/pre-recorded"
    headers = {
        "x-gladia-key": api_key,
        "Content-Type": "application/json"
    }
    payload = {
        "audio_url": audio_url,
        "diarization": True,
        "language_config": {
            "code_switching": True
        }
    }
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Transcription request failed: {e}")
        raise
result = submit_transcription("https://your-storage.com/meeting.wav", GLADIA_API_KEY)
# Returns: {"id": "45463597-...", "result_url": "https://api.gladia.io/v2/pre-recorded/45463597-..."}
Poll the result_url until the status returns done, then retrieve the full transcript JSON. Gladia processes audio at approximately 60x real-time speed, so a 60-minute meeting transcribes in about one minute.
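The polling step can be sketched as follows. The status values ("done", "error", and the in-progress states) and the helper names here are assumptions based on typical async job APIs, so verify them against the Gladia API reference before relying on them:

```python
import time
import requests

def is_terminal(status: str) -> bool:
    # "done" and "error" end the job; any other status means "keep polling".
    return status in ("done", "error")

def poll_for_result(result_url: str, api_key: str,
                    interval: float = 2.0, timeout: float = 300.0) -> dict:
    """Poll the Gladia result_url until the job reaches a terminal status."""
    headers = {"x-gladia-key": api_key}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.get(result_url, headers=headers, timeout=30)
        response.raise_for_status()
        body = response.json()
        status = body.get("status", "")
        if is_terminal(status):
            if status == "error":
                raise RuntimeError(f"Transcription failed: {body}")
            return body
        time.sleep(interval)
    raise TimeoutError("Transcription did not complete before the timeout")
```

Given the ~60x real-time processing speed, a short `interval` with a generous `timeout` keeps end-to-end latency low without hammering the endpoint.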
Speaker diarization for meeting notes
Diarization in Gladia is powered by pyannoteAI's Precision-2 model, a production-grade speaker segmentation engine trained on diverse multi-speaker audio.
Async-only feature: Diarization is enabled by setting "diarization": true in your payload, as shown above. It segments audio into per-speaker turns before Solaria-1 transcribes each segment, producing an utterance-level JSON response that attributes every sentence to a specific speaker. This integration delivers on average 3x lower DER than alternative vendors, which translates directly to more reliable action item attribution in your follow-up emails.
Transcribing multilingual calls
Setting "code_switching": true tells Solaria-1 to detect mid-conversation language changes automatically, which matters on any call involving bilingual speakers. Gladia covers 100+ supported languages, including 42 that no other API-level STT competitor supports, such as Tagalog, Bengali, Punjabi, and Marathi. Solaria-1 achieves on average 29% lower WER than alternatives on conversational speech, based on a published methodology covering 8 providers across 7 datasets, a margin that compounds for teams handling multilingual European or Southeast Asian language calls.
Step 2: Structure transcripts for Claude processing
Accurate diarization and timestamps for Claude
The raw Gladia JSON response contains an array of utterances, each with a speaker label (e.g., "Speaker 1"), start and end timestamps in seconds, the transcribed text, and the detected language. A minimal parsing function converts this into a format Claude can reason about cleanly:
def format_transcript_for_llm(gladia_response: dict) -> str:
    utterances = gladia_response["result"]["transcription"]["utterances"]
    lines = []
    for utt in utterances:
        speaker = utt.get("speaker", "Unknown")
        text = utt.get("text", "")
        lines.append(f"Speaker {speaker}: {text}")
    return "\n".join(lines)
This produces output in the form:
Speaker 1: Here is the summary from last quarter.
Speaker 2: Revenue increased by 12 percent.
Speaker 1: That is better than expected. I will set up a follow-on call.
AI for meeting action items
Because each line is attributed to a specific speaker, Claude can assign action items to the correct owner without guessing. This is where diarization accuracy directly determines email quality. Gladia's async diarization achieves on average 3x lower DER than alternative vendors, meaning fewer mis-attributed turns and fewer emails that say "you agreed to send the contract" when the customer actually said it.
Tailoring transcripts for AI personalization
Before passing the transcript to Claude, append a short metadata block that includes the meeting date, attendee names mapped to speaker labels, and the meeting type (discovery, demo, QBR). This metadata lets Claude address each attendee by name rather than "Speaker 1" in the generated email. If you use a scheduling tool, pull attendee names programmatically and build this mapping automatically, following the enrichment patterns in the AI note-taker architecture guide.
If your scheduling tool (Calendly, Zoom, Google Calendar) provides participant information via API or webhook, map each participant's name to the corresponding speaker label based on join order or voice characteristics detected during diarization. For manual workflows, review the first few minutes of the transcript to identify each speaker, then create a simple dictionary mapping speaker labels to attendee names:
speaker_map = {
    "1": "Sarah Chen",
    "2": "Michael Rodriguez",
    "3": "Alex Thompson"
}
Include this mapping in your metadata block so Claude can use real names instead of generic speaker labels in the generated email.
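A minimal helper for assembling that metadata block might look like this; the field labels are illustrative, so adjust them to whatever your prompt template expects:

```python
def build_metadata_block(meeting_date: str, meeting_type: str,
                         speaker_map: dict) -> str:
    # Render the mapping as plain text that precedes the transcript in the prompt.
    lines = [
        f"Meeting date: {meeting_date}",
        f"Meeting type: {meeting_type}",
        "Attendees:",
    ]
    for label, name in speaker_map.items():
        lines.append(f"  Speaker {label}: {name}")
    return "\n".join(lines)
```

Prepend the returned string to the formatted transcript before building the Claude prompt, and the model can substitute real names for speaker labels.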
Step 3: Craft personalized outreach with AI
Prompt patterns for follow-up emails
Claude performs best on structured meeting transcripts when you use XML tags to separate the transcript from the instructions. Based on Anthropic's prompt engineering guidelines, wrapping inputs in named tags reduces prompt injection risk and gives the model clear parsing boundaries:
def build_claude_prompt(transcript: str, attendees: list, meeting_type: str) -> tuple:
    attendee_list = "\n".join(attendees)
    system_prompt = (
        "You are an expert executive assistant who writes professional, "
        "concise follow-up emails based on meeting transcripts. "
        "Ground every statement in the transcript. Do not add information "
        "that is not present in the transcript."
    )
    user_prompt = f"""
<transcript>
{transcript}
</transcript>

<attendees>
{attendee_list}
</attendees>

<meeting_type>
{meeting_type}
</meeting_type>

<instructions>
Write a professional follow-up email that includes:
1. A 2-3 sentence executive summary of the meeting.
2. Key discussion points as 3-5 bullet items.
3. Action items with owner name and agreed deadline where stated.
4. A clear next step with a proposed date if one was mentioned.
Keep the tone professional and direct. Use attendee names, not speaker labels.
</instructions>
"""
    return system_prompt, user_prompt
Use the system prompt to enforce your brand voice explicitly. If your team writes in a specific style, such as consultative and concise or friendly and direct, describe it in the system role with one or two examples. Claude follows stylistic constraints more reliably when they appear in the system role rather than buried in the user message. For multi-stakeholder meetings, generate a separate email for each attendee by passing the same transcript with a different <recipient> tag and instructing Claude to frame action items from that recipient's perspective, a pattern that sales teams use to drive higher response rates on post-meeting outreach.
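Wiring the prompt pair into the Anthropic messages endpoint can be sketched with plain `requests`; the model name below is an assumption, so substitute whichever Claude model your account targets:

```python
import requests

ANTHROPIC_ENDPOINT = "https://api.anthropic.com/v1/messages"

def build_request_body(system_prompt: str, user_prompt: str,
                       model: str = "claude-sonnet-4-5",  # assumed model name
                       max_tokens: int = 1024) -> dict:
    # Assemble the Messages API request body from the prompt pair.
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_prompt}],
    }

def generate_followup_email(system_prompt: str, user_prompt: str, api_key: str) -> str:
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = build_request_body(system_prompt, user_prompt)
    response = requests.post(ANTHROPIC_ENDPOINT, headers=headers, json=body, timeout=60)
    response.raise_for_status()
    # The generated email text is in the first content block of the response.
    return response.json()["content"][0]["text"]
```

Keeping the body construction separate from the HTTP call makes it easy to unit-test prompt assembly without hitting the API.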
Mitigating data gaps for email accuracy
To check confidence scores in the Gladia response, iterate through the word-level array and flag any segment where the average confidence drops below 0.7:
def annotate_low_confidence(gladia_response: dict, threshold: float = 0.7) -> str:
    utterances = gladia_response["result"]["transcription"]["utterances"]
    lines = []
    for utt in utterances:
        words = utt.get("words", [])
        if words:
            avg_confidence = sum(w.get("confidence", 1.0) for w in words) / len(words)
            prefix = "[LOW CONFIDENCE] " if avg_confidence < threshold else ""
        else:
            prefix = ""
        speaker = utt.get("speaker", "Unknown")
        text = utt.get("text", "")
        lines.append(f"{prefix}Speaker {speaker}: {text}")
    return "\n".join(lines)
Then instruct Claude in the system prompt to flag any action item derived from a low-confidence segment as needing human verification before sending.
Step 4: Send emails via Gmail API or Lemlist
Initializing Gmail API for follow-up emails
The Gmail API uses OAuth2 authentication with the Gmail send scope. Download your credentials.json from the Google Cloud Console, install google-auth-oauthlib and google-auth-httplib2, and authenticate on first run to generate a token.json. Subsequent runs use the stored token without user interaction:
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build
import base64
from email.mime.text import MIMEText
SCOPES = ["https://www.googleapis.com/auth/gmail.send"]
def send_via_gmail(to: str, subject: str, body: str):
    try:
        creds = Credentials.from_authorized_user_file("token.json", SCOPES)
        service = build("gmail", "v1", credentials=creds)
        message = MIMEText(body)
        message["to"] = to
        message["subject"] = subject
        raw = base64.urlsafe_b64encode(message.as_bytes()).decode()
        service.users().messages().send(userId="me", body={"raw": raw}).execute()
    except Exception as e:
        print(f"Email send failed: {e}")
        raise
Setting up Lemlist for outreach
For teams running outbound sequences rather than transactional emails, Lemlist accepts email content via its API and slots it into an existing campaign template. Use a webhook triggered by the Claude output step to POST the generated email body to your Lemlist sequence. If your team prefers a no-code orchestration layer, Gladia's Zapier integration connects Gladia's output to downstream tools without requiring direct API calls.
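A minimal relay sketch, assuming you have a webhook or API endpoint URL configured for your campaign; the payload field names below are illustrative placeholders, not Lemlist's actual schema, so map them to whatever fields your campaign expects:

```python
import requests

def build_sequence_payload(recipient: str, subject: str, body: str) -> dict:
    # Illustrative field names; rename to match your campaign's expected schema.
    return {"email": recipient, "subject": subject, "body": body}

def push_to_sequence(webhook_url: str, recipient: str, subject: str, body: str) -> None:
    """POST the generated email into the outreach sequence endpoint."""
    payload = build_sequence_payload(recipient, subject, body)
    response = requests.post(webhook_url, json=payload, timeout=30)
    response.raise_for_status()
```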
Optimize automated follow-up timing
For best results, send follow-up emails within 24-48 hours of the meeting ending, while context is still fresh for both parties. Schedule the pipeline to trigger automatically when a new recording lands in your storage bucket using a cloud function or a cron job polling for new files.
Pre-deployment email delivery testing
Before pushing to production, run the pipeline against a representative sample of past meeting recordings you can validate manually. Confirm that:
- Speaker labels map correctly to attendee names
- Action items reflect what was actually said in the transcript
- No email references a [LOW CONFIDENCE] segment without flagging it for review
- Generated emails pass deliverability checks using tools like MailTester or SpamAssassin before going to external addresses
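Part of that checklist can be automated with a pre-send gate; the required-section keywords below are illustrative and should match whatever structure your prompt actually enforces:

```python
def ready_to_send(email_body: str,
                  required_sections=("summary", "action item")) -> bool:
    # Block anything that leaked a low-confidence marker into the final draft.
    if "[LOW CONFIDENCE]" in email_body:
        return False
    lowered = email_body.lower()
    # Require each expected section keyword to appear somewhere in the body.
    return all(section in lowered for section in required_sections)
```

Emails that fail the gate go to a human reviewer instead of being sent automatically.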
Safeguard email quality and accuracy
Optimizing transcription costs at scale
The table below models Gladia's per-hour pricing against per-seat AI note-taker pricing at three usage levels, with diarization, translation, and NER features required.
| Monthly audio volume | Gladia Starter ($0.61/hr) | Gladia Growth (volume-based pricing) | Per-seat tool (illustrative comparison) |
| --- | --- | --- | --- |
| 100 hours | $61 | from $20 | varies by provider |
| 1,000 hours | $610 | from $200 | varies by provider |
| 10,000 hours | $6,100 | from $2,000 | varies by provider |
Gladia's per-hour pricing scales directly with actual audio processed, with diarization, translation, sentiment analysis, and NER all included in the base rate on Starter and Growth plans and no add-on fees on either tier. Growth plan pricing decreases with volume commitments, providing predictable unit economics as usage scales.
Measuring AI transcript accuracy
The primary metric to track is word error rate on your own meeting recordings. Run a sample of past calls through the pipeline and compare the transcript against a human-reviewed reference. Pay particular attention to named entities: product names, company names, and numerical figures, since errors in these fields flow directly into your CRM.
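WER is a standard word-level edit distance; a minimal reference implementation for spot-checking your own calls:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

In practice, normalize casing and punctuation before comparison, and consider computing WER separately on named-entity spans, since those errors hurt the most downstream.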
Recovering from automation failures
Build a dead-letter queue for transcription jobs that fail or return below a confidence threshold and route those to a human reviewer rather than sending a low-quality email automatically. Gladia maintains 99.9%+ uptime so full pipeline failures are rare, but partial accuracy issues on heavily noisy audio are a legitimate edge case worth planning for.
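The routing decision can be sketched as a pure function over the Gladia response, assuming the word-level confidence scores used earlier in this guide:

```python
def route_job(gladia_response: dict, min_confidence: float = 0.7) -> str:
    """Return 'send' for confident transcripts, 'review' for everything else."""
    try:
        utterances = gladia_response["result"]["transcription"]["utterances"]
    except (KeyError, TypeError):
        return "review"  # malformed response goes to the dead-letter queue
    words = [w for utt in utterances for w in utt.get("words", [])]
    if not words:
        return "review"
    avg = sum(w.get("confidence", 0.0) for w in words) / len(words)
    return "send" if avg >= min_confidence else "review"
```

Jobs routed to "review" land in the dead-letter queue for human inspection rather than triggering an automatic send.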
Data residency and compliance for calls
On Gladia's Growth and Enterprise plans, customer audio is never used to retrain models and no opt-out is required. On the Starter plan, data can be used for model training by default. If you're processing calls covered by GDPR, HIPAA, or SOC 2 Type II requirements, use Growth or Enterprise and review the Gladia compliance hub for documentation on data residency options, SOC 2 compliance, and on-premises deployment options.
Start testing the pipeline with 10 free hours on Gladia and build your first automated follow-up workflow.
FAQs
How accurate is Gladia on noisy meeting audio?
Solaria-1 is benchmarked against 8 providers across 7 datasets and 74+ hours of audio, achieving on average 29% lower WER than competing APIs on conversational speech.
What's the total latency from meeting end to email send?
Gladia processes audio at approximately 60x real-time speed, so a 60-minute meeting transcribes in about one minute. The Anthropic API response time varies by model and load, and adding Gmail API send time means most pipelines complete within a few minutes of meeting end, though exact latency depends on your infrastructure and Claude model selection.
Can I customize email templates per meeting type?
Yes. Pass a <meeting_type> tag in your Claude prompt (such as "discovery call", "demo", or "QBR") and maintain a separate system prompt template for each type. Store these in a dictionary keyed by meeting type and load the correct one at runtime.
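A sketch of that lookup; the prompt texts are abbreviated and the fallback choice is a design decision, not a fixed rule:

```python
SYSTEM_PROMPTS = {
    "discovery call": "You write follow-ups that recap pain points and propose a next meeting.",
    "demo": "You write follow-ups that recap features shown and link requested resources.",
    "qbr": "You write follow-ups that recap metrics reviewed and renewal discussion points.",
}

def get_system_prompt(meeting_type: str) -> str:
    # Normalize the key and fall back to the discovery template when unknown.
    return SYSTEM_PROMPTS.get(meeting_type.lower(), SYSTEM_PROMPTS["discovery call"])
```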
What does this pipeline cost at 100 meetings per month?
At an average of 60 minutes per meeting, 100 meetings equal 100 hours of audio. On Gladia's Starter plan that costs $61 per month and on the Growth plan as low as $20 per month, per the Gladia pricing page. Claude API costs depend on token volume and which model you use. Refer to Anthropic's published pricing for current rates based on your expected token consumption per meeting.
Key terms glossary
Diarization: The process of segmenting an audio recording by speaker identity to answer "who spoke when." In Gladia's async pipeline, diarization is powered by pyannoteAI's Precision-2 model and produces speaker labels for each utterance, enabling accurate action item attribution in follow-up emails. It is available in async mode only.
Code-switching: In speech recognition, code-switching refers to the use of more than one language within a single conversation, where a speaker shifts between languages mid-sentence or mid-turn. Gladia's Solaria-1 detects these shifts automatically across 100+ languages, which is critical for accurately transcribing multilingual sales calls.
Word error rate (WER): A standard metric for measuring transcription accuracy, calculated as the percentage of words in a transcript that differ from the correct reference text through substitutions, deletions, and insertions. Lower WER means more accurate transcripts and more reliable downstream email content.
Async transcription: A workflow where audio is submitted to an API and results are retrieved after processing completes, rather than streamed in real time. Async mode enables full-context accuracy, speaker diarization, and multilingual consistency, and Gladia processes audio at approximately 60x real-time speed in this mode, or about one minute per hour of audio.