Transcriber (speech-to-text)

The Transcriber tab controls how your voice agent turns caller audio into text using Deepgram — including which model runs, how quickly the agent decides you've finished talking, and how it handles noise, jargon, and language.

Speech-to-text (STT) is the first step in every voice call: the caller's audio is transcribed by Deepgram, and that text is what the agent's LLM actually reads before it replies. Getting these settings right makes your agent feel faster and more accurate on the phone. These settings apply to voice agents; chat agents run over text and don't use a transcriber.

Where to find it

Open your voice agent in the agent builder and select the Transcriber tab. All of the settings below live there.

Model

Choose the Deepgram model your agent uses to transcribe calls.

The default is Flux General English (flux-general-en). Other selectable Deepgram models include Nova-3, Nova-2 phone (nova-2-phonecall, tuned for phone audio), Nova-2, and Enhanced.

Tip: Nova-2 phone is optimized for the compressed, narrowband audio of a real phone call, which makes it a good fit when most of your callers dial in over the phone network.
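If you manage agent settings in your own tooling, it can help to pin the model to one of the IDs above rather than passing free-form strings. The sketch below is hypothetical: the enum and function names are illustrative. It uses `flux-general-en` and `nova-2-phonecall` from this page and assumes the other options map to Deepgram's public model names `nova-3`, `nova-2`, and `enhanced`.

```python
from enum import Enum
from typing import Optional


class TranscriberModel(str, Enum):
    """Hypothetical enum of the Deepgram models offered on the Transcriber tab."""
    FLUX_GENERAL_EN = "flux-general-en"    # default model
    NOVA_3 = "nova-3"                      # assumed Deepgram ID
    NOVA_2_PHONECALL = "nova-2-phonecall"  # tuned for narrowband phone audio
    NOVA_2 = "nova-2"                      # assumed Deepgram ID
    ENHANCED = "enhanced"                  # assumed Deepgram ID


DEFAULT_MODEL = TranscriberModel.FLUX_GENERAL_EN


def parse_model(model_id: Optional[str]) -> TranscriberModel:
    """Return the matching model, falling back to the default when unset."""
    if not model_id:
        return DEFAULT_MODEL
    # Enum lookup by value raises ValueError for IDs the tab does not offer.
    return TranscriberModel(model_id)
```

Rejecting unknown IDs up front catches typos such as `nova2-phonecall` before they reach a live call.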

Turn detection (end of turn)

A few controls decide when the agent thinks the caller has finished talking and should reply:

End of Turn Timeout — how long to wait after speech stops before the agent responds, from 500ms to 10s (default 5s). Lower is snappier; higher gives callers more room to pause mid-sentence.
End of Turn Confidence — how confident the model must be that the turn actually ended (0.5–0.9).
Smart Endpointing — the turn-detection strategy: Off, Built-in, or LiveKit.

Tip: If the agent keeps interrupting callers who pause to think, raise the End of Turn Timeout; if replies feel slow, lower it.
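The two numeric controls are easier to reason about with a simplified rule in mind. This is a sketch, not the platform's implementation. It assumes the timeout and confidence combine the way Deepgram's Flux end-of-turn parameters do: a confident end-of-turn prediction lets the agent reply early, and the timeout acts as a backstop once the caller has been silent that long. The class name, field names, and the 0.7 confidence default are assumptions. The tab itself states only the ranges.

```python
from dataclasses import dataclass


@dataclass
class TurnDetection:
    """Hypothetical container for the End of Turn settings."""
    eot_timeout_ms: int = 5000   # End of Turn Timeout: 500 ms to 10 s, default 5 s
    eot_confidence: float = 0.7  # End of Turn Confidence: 0.5 to 0.9 (default assumed)

    def __post_init__(self) -> None:
        # Enforce the ranges shown on the Transcriber tab.
        if not 500 <= self.eot_timeout_ms <= 10_000:
            raise ValueError("eot_timeout_ms must be between 500 and 10000")
        if not 0.5 <= self.eot_confidence <= 0.9:
            raise ValueError("eot_confidence must be between 0.5 and 0.9")


def turn_has_ended(cfg: TurnDetection, silence_ms: int, eot_probability: float) -> bool:
    """Assumed rule: reply once the model is confident enough that the turn
    ended, or once silence reaches the timeout, whichever happens first."""
    return eot_probability >= cfg.eot_confidence or silence_ms >= cfg.eot_timeout_ms
```

Under this model, a caller who pauses mid-sentence (low end-of-turn probability) keeps the floor until the timeout expires. Lowering the timeout shortens that backstop, which makes replies snappier. Raising the confidence makes the agent wait for stronger evidence before it cuts in.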

Keyterm boosting

Keyterm boosting tells the transcriber which words to expect, improving recognition of terms Deepgram might otherwise mishear. Add the vocabulary specific to your business — product names, industry jargon, and proper nouns — so they're transcribed correctly on the call.
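Deepgram's API exposes keyterm prompting through a repeatable `keyterm` query parameter (supported on Nova-3). The sketch below assumes the platform forwards your list that way. The function name is hypothetical. It shows one sensible cleanup step: drop blank entries and case-insensitive duplicates so each term is sent once.

```python
from typing import Iterable
from urllib.parse import urlencode


def keyterm_query(model: str, keyterms: Iterable[str]) -> str:
    """Build Deepgram-style query parameters with one keyterm=... pair per term."""
    params = [("model", model)]
    seen = set()
    for term in keyterms:
        term = term.strip()
        # Skip blanks and terms already added (compared case-insensitively).
        if term and term.lower() not in seen:
            seen.add(term.lower())
            params.append(("keyterm", term))
    # urlencode escapes spaces as "+", the standard query-string form.
    return urlencode(params)


print(keyterm_query("nova-3", ["Acme Widget", " acme widget ", "", "HIPAA"]))
```

This prints `model=nova-3&keyterm=Acme+Widget&keyterm=HIPAA`. Keep the list to terms the model genuinely mishears. Boosting everyday words adds little.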

Background denoising

Enable background denoising to filter ambient noise out of the caller's audio before transcription. This helps keep transcripts clean when callers are in noisy environments.
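This page does not say which denoiser the platform uses. Production denoisers are typically learned models, not simple filters. The sketch below stands in a crude energy-based noise gate instead, purely to show where denoising sits in the pipeline: it runs on raw caller audio before anything is sent to Deepgram. The function names, frame size, and threshold are all hypothetical.

```python
import math
from typing import List


def noise_gate(samples: List[float], frame_size: int = 160,
               threshold_rms: float = 0.01) -> List[float]:
    """Silence whole frames whose RMS energy falls below threshold_rms.

    160 samples is 20 ms at 8 kHz telephone audio. This is a stand-in for
    real denoising, not the platform's method.
    """
    out: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        out.extend(frame if rms >= threshold_rms else [0.0] * len(frame))
    return out


def prepare_audio(samples: List[float], denoise_enabled: bool) -> List[float]:
    """Optional pre-transcription stage: denoise only when the toggle is on."""
    return noise_gate(samples) if denoise_enabled else samples
```

A gate like this removes quiet hiss between words but cannot separate a loud TV from the caller's voice. That limitation is why dedicated denoisers exist.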

Confidence threshold

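The confidence threshold sets the minimum confidence Deepgram must report for a transcription before it is accepted. Raising it discards more uncertain transcriptions, so fewer misheard words reach the agent, but quiet or unclear speech may be dropped. Lowering it keeps more of what the caller said, at the cost of occasionally passing garbled text to the agent.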
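A minimal sketch of the accept-or-discard step follows. It assumes the shape of Deepgram's v1 streaming results, where the top alternative is `channel.alternatives[0]` with `transcript` and `confidence` fields. Other models or endpoints may report confidence differently. The function name is hypothetical.

```python
from typing import Any, Dict, Optional


def accept_transcript(result: Dict[str, Any], threshold: float) -> Optional[str]:
    """Return the top transcript if its confidence meets the threshold, else None."""
    alternatives = result.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    best = alternatives[0]
    text = best.get("transcript", "").strip()
    # Discard empty results and anything below the configured threshold.
    if not text or best.get("confidence", 0.0) < threshold:
        return None
    return text


result = {"channel": {"alternatives": [{"transcript": "book a table", "confidence": 0.62}]}}
print(accept_transcript(result, 0.5))  # kept
print(accept_transcript(result, 0.8))  # discarded
```

With a threshold of 0.5 the transcript `book a table` is kept. At 0.8 the same audio is discarded, so the agent hears nothing for that utterance.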

Language

Set the language the transcriber should expect for the caller's speech. Match this to the language your callers speak so audio is transcribed accurately.
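Deepgram's API takes a `language` parameter with BCP-47 style codes such as `en-US` or `es`. The sketch below pairs it with the model and rejects one obvious mismatch. It rests on an assumption drawn only from the model's name: an ID ending in `-en`, such as `flux-general-en`, transcribes English only. The function name is hypothetical.

```python
from typing import Dict


def language_params(model_id: str, language: str) -> Dict[str, str]:
    """Build model/language settings, rejecting obvious mismatches.

    Assumption: a model ID ending in "-en" only transcribes English, so
    pairing it with another language is treated as an error.
    """
    lang = language.strip()
    if not lang:
        raise ValueError("language must not be empty")
    if model_id.endswith("-en") and not lang.lower().startswith("en"):
        raise ValueError(f"{model_id} expects English audio, got {lang!r}")
    return {"model": model_id, "language": lang}
```

For example, `language_params("nova-3", "es")` returns `{"model": "nova-3", "language": "es"}`, while pairing `flux-general-en` with `es` raises an error instead of silently producing poor transcripts.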

Next steps

Voice (text-to-speech) — pick the voice, speed, and language your agent speaks back with.
Agents — configure your agent's identity, business info, and system prompt.
Knowledge Base — ground your agent's answers in your own content.
Integrations — connect other STT providers such as AssemblyAI and Whisper.
