How Callable works

This page explains what happens under the hood when a Callable agent takes a call or a chat, and where you configure each piece.

The pipeline in plain terms

Every Callable agent runs the same loop: listen, think, speak. On a voice call, that means turning speech into text, generating a reply, and turning the reply back into speech.

Here's the loop, one turn at a time:

  1. Speech-to-text (Deepgram) — the caller's audio is transcribed into text in real time.
  2. The LLM (your chosen model) — the transcript goes to the agent's language model, which generates a reply grounded in the agent's system prompt, business info, knowledge base, tools, and conversational flow.
  3. Text-to-speech (Cartesia or ElevenLabs) — the reply text is synthesized into audio and streamed back to the caller.

Chat agents run the same LLM pipeline over text — there's no STT or TTS step, since the input and output are already text.
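The loop above can be sketched in a few lines. This is only an illustration, not Callable's implementation: the `voice_turn` and `chat_turn` functions and the three stage signatures are hypothetical, and the `fake_*` stages stand in for real STT, LLM, and TTS providers.

```python
from typing import Callable

# Hypothetical stage signatures. Real providers expose their own streaming
# APIs, which are not shown here.
Transcribe = Callable[[bytes], str]
Generate = Callable[[str], str]
Synthesize = Callable[[str], bytes]


def voice_turn(audio: bytes, stt: Transcribe, llm: Generate, tts: Synthesize) -> bytes:
    """One voice turn: listen (STT), think (LLM), speak (TTS)."""
    transcript = stt(audio)
    reply_text = llm(transcript)
    return tts(reply_text)


def chat_turn(message: str, llm: Generate) -> str:
    """One chat turn: the same LLM step with no STT or TTS around it."""
    return llm(message)


# Stand-in stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `voice_turn(b"hello", fake_stt, fake_llm, fake_tts)` runs all three stages, while `chat_turn("hello", fake_llm)` runs only the middle one.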

Streaming and barge-in

Callable doesn't wait for the whole reply before it starts talking. Replies stream sentence-by-sentence, so the caller hears the first sentence while later ones are still being generated — that's how the agent keeps latency to roughly one second.
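To make sentence-by-sentence streaming concrete, here is a minimal sketch that buffers streamed LLM text and releases each sentence as soon as it is complete, so it could be handed to TTS early. The `stream_sentences` helper and its naive punctuation rule are assumptions for illustration; how Callable actually segments replies is not described here.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: whitespace that follows ".", "!" or "?".
# Abbreviations such as "Dr." would be split too; a real system needs more care.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences while later text is still arriving."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the model stops generating.
    if buffer.strip():
        yield buffer.strip()
```

With this shape, the first sentence can be synthesized and played while the model is still producing the second one.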

The caller can also barge in: if they talk over the agent mid-sentence, the agent stops and listens. This keeps calls feeling like a real back-and-forth conversation rather than a rigid prompt-and-response exchange.
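Barge-in can be pictured as a playback loop that checks for caller speech before each audio chunk. This is a simplified sketch under assumed names: `speak_until_interrupted`, `caller_is_speaking`, and `play` are hypothetical, and a real system would rely on voice-activity detection on the live audio stream.

```python
from typing import Callable, Iterable


def speak_until_interrupted(
    chunks: Iterable[bytes],
    caller_is_speaking: Callable[[], bool],
    play: Callable[[bytes], None],
) -> bool:
    """Play audio chunks in order; return False if the caller interrupted."""
    for chunk in chunks:
        if caller_is_speaking():
            # Stop talking right away and hand the turn back to the caller.
            return False
        play(chunk)
    return True
```

The return value tells the caller of this function whether the reply finished or was cut short, which is when the agent would go back to listening.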

Grounding: where the agent's answers come from

The LLM doesn't answer from thin air. Each reply is grounded in four things you control:

  • The system prompt + business info — the agent's identity, plus your business name, industry, hours, services, FAQs, and greeting.
  • The knowledge base — website URLs, uploaded files, and pasted text the agent draws on for its answers.
  • Tools — functions the agent can call mid-conversation (transfer-to-human, send SMS, end-call, custom tools, and MCP servers).
  • The conversational flow — a node-graph state machine that scripts structured conversations one step per turn.

This same grounding applies everywhere the agent runs: on the phone, on the web chat widget, and in the dashboard.
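One way to picture grounding is as a single context assembled from those four sources before each LLM call. The sketch below is illustrative only: the `Grounding` class, its field names, and the text layout are assumptions, not Callable's actual prompt format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Grounding:
    """Hypothetical container for the four grounding sources."""
    system_prompt: str
    business_info: Dict[str, str] = field(default_factory=dict)
    knowledge_snippets: List[str] = field(default_factory=list)
    tool_names: List[str] = field(default_factory=list)
    flow_step: Optional[str] = None


def build_context(g: Grounding) -> str:
    """Join the non-empty grounding sources into one block of context text."""
    sections = [g.system_prompt]
    if g.business_info:
        facts = "\n".join(f"- {key}: {value}" for key, value in g.business_info.items())
        sections.append("Business info:\n" + facts)
    if g.knowledge_snippets:
        snippets = "\n".join(f"- {s}" for s in g.knowledge_snippets)
        sections.append("Knowledge base:\n" + snippets)
    if g.tool_names:
        sections.append("Available tools: " + ", ".join(g.tool_names))
    if g.flow_step:
        sections.append("Current flow step: " + g.flow_step)
    return "\n\n".join(sections)
```

Because the same context is built regardless of channel, a phone call and a web chat with the same agent would draw on the same information.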

Inbound vs. outbound

The pipeline is identical for inbound and outbound — the difference is who starts the call and how the call is set up.

Inbound (someone calls you)

An inbound call arrives on one of your phone numbers and routes to the agent (or agent team) attached to that number. The agent opens with its greeting and runs the loop above for the rest of the call.

Outbound (Callable calls them)

Outbound calls are driven by campaigns. You create a campaign, pick an agent, choose a caller-ID (one of your numbers to dial from), and add contacts by CSV or manually. You set concurrency and per-minute pacing, then schedule it or start it now. The dialer places each call, and once it connects, the agent handles the conversation with the same STT → LLM → TTS loop.
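As a rough illustration of how concurrency and per-minute pacing interact, the sketch below gates each dial on both limits. The `Pacer` class and its method names are hypothetical, and the sliding 60-second window is an assumption; Callable's dialer may enforce pacing differently.

```python
from collections import deque
from typing import Deque


class Pacer:
    """Allow a dial only if both the concurrency and per-minute limits permit it."""

    def __init__(self, max_concurrent: int, max_per_minute: int) -> None:
        self.max_concurrent = max_concurrent
        self.max_per_minute = max_per_minute
        self.active = 0                       # calls currently in progress
        self._recent: Deque[float] = deque()  # dial timestamps, in seconds

    def can_dial(self, now: float) -> bool:
        # Forget dials that are at least 60 seconds old.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        return self.active < self.max_concurrent and len(self._recent) < self.max_per_minute

    def record_dial(self, now: float) -> None:
        self._recent.append(now)
        self.active += 1

    def record_finished(self) -> None:
        self.active = max(0, self.active - 1)
```

A dialer loop would call `can_dial` before placing each call, `record_dial` when it does, and `record_finished` when the call ends.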

Per-contact status moves through: queued → calling → connected → completed / no_answer / busy / failed.
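The per-contact statuses can be modeled as a small state machine. The status names come from the line above, but the transition table is an assumption made for illustration (for example, that `no_answer` and `busy` follow `calling`, and that `failed` can follow either `calling` or `connected`); the exact rules are not specified here.

```python
from enum import Enum
from typing import Dict, Set


class ContactStatus(str, Enum):
    QUEUED = "queued"
    CALLING = "calling"
    CONNECTED = "connected"
    COMPLETED = "completed"
    NO_ANSWER = "no_answer"
    BUSY = "busy"
    FAILED = "failed"


# Assumed transitions. Statuses missing from this table are treated as terminal.
ALLOWED: Dict[ContactStatus, Set[ContactStatus]] = {
    ContactStatus.QUEUED: {ContactStatus.CALLING},
    ContactStatus.CALLING: {
        ContactStatus.CONNECTED,
        ContactStatus.NO_ANSWER,
        ContactStatus.BUSY,
        ContactStatus.FAILED,
    },
    ContactStatus.CONNECTED: {ContactStatus.COMPLETED, ContactStatus.FAILED},
}


def advance(current: ContactStatus, new: ContactStatus) -> ContactStatus:
    """Return the new status, or raise if the move is not in the assumed table."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

Modeling the statuses this way makes it easy to reject impossible updates, such as a completed contact going back to calling.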

Where each piece is configured

Each stage of the pipeline maps to a tab in the agent builder or a section of the dashboard.

| Piece | Where you configure it |
| --- | --- |
| Speech-to-text (Deepgram) | Transcriber tab: model (nova-2-phonecall for phone), endpointing, keyterm boosting, denoising, confidence, language |
| The LLM / model | Chosen on the Agent tab (a latency/cost preset) and fine-tuned under the Settings tab; connect providers and keys in Integrations |
| Text-to-speech | Voice tab: Cartesia or ElevenLabs, voice, speed (0.5–2.0), language, fallback voices |
| Identity, business info, system prompt, greeting | Agent tab |
| Grounding content | Knowledge Base tab (and the workspace-level Knowledge Base) |
| Tools | Tools tab on the agent; create and test them in the Tools section |
| Structured flow | Conversational Flow tab |
| Phone numbers | Phone numbers page: buy on Telnyx, or bring your own Twilio / SignalWire; then attach an agent or team |
| Outbound calling | Campaigns |

Model precedence: a per-agent model override wins; otherwise the owner's default model is used. Provider dropdowns only show providers that are actually available or connected.
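The precedence rule reads naturally as a one-line fallback. In this sketch the `resolve_model` function and the model names are hypothetical placeholders, used only to show the order of preference.

```python
from typing import Optional


def resolve_model(agent_override: Optional[str], owner_default: str) -> str:
    """A per-agent override wins; otherwise fall back to the owner's default."""
    return agent_override if agent_override else owner_default
```

For example, `resolve_model(None, "owner-default-model")` falls back to the owner's default, while `resolve_model("agent-model", "owner-default-model")` keeps the agent's own choice.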

In Integrations, you can bring your own LLM keys (Anthropic, OpenAI, Google Gemini, Groq, Mistral, and more), as well as your own TTS and STT providers. You pick the model when you connect.

Next steps