Rime provides industry-leading text-to-speech (TTS) AI models built for real-time conversational experiences at scale. Our latest flagship model, Coda, pairs a sophisticated LLM backbone with a dedicated speech inference engine. It's trained on Rime's massive proprietary dataset of full-duplex conversational speech between real people (not voice actors, audiobook narrators, or YouTube influencers), which makes it well suited to production voice AI agents, whether you're building intelligent IVRs, multilingual voice agents, or anything in between.
Hello from Coda in 30 seconds
Replace YOUR_API_KEY with a key from the API Tokens page.
If you'd rather not write the request yourself, the Rime CLI wraps the same TTS endpoint in a rime tts command.
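For readers who want to write the request themselves, here is a minimal Python sketch using only the standard library. It assumes the cloud endpoint is `https://users.rime.ai/v1/rime-tts`, that it accepts a JSON body with `speaker`, `text`, and `modelId` fields plus a bearer token, and that sending `Accept: audio/mp3` returns MP3 bytes. Treat all of these, along with the placeholder speaker name and the helper names `build_tts_request` and `synthesize`, as assumptions to verify against the API reference.

```python
import json
import urllib.request

# ASSUMPTION: the endpoint URL, header values, and JSON field names below
# are illustrative; verify them against Rime's API reference before use.
RIME_TTS_URL = "https://users.rime.ai/v1/rime-tts"


def build_tts_request(text: str, api_key: str, speaker: str,
                      model_id: str = "coda") -> urllib.request.Request:
    """Build (but do not send) a POST request asking for MP3 audio."""
    body = json.dumps({
        "speaker": speaker,   # hypothetical placeholder; choose a real voice
        "text": text,
        "modelId": model_id,  # "coda" is the modelId listed on this page
    }).encode("utf-8")
    return urllib.request.Request(
        RIME_TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "audio/mp3",  # assumed to select MP3 output
        },
    )


def synthesize(text: str, api_key: str, speaker: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = build_tts_request(text, api_key, speaker)
    with urllib.request.urlopen(req, timeout=30) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

Calling `synthesize("Hello from Coda!", api_key="YOUR_API_KEY", speaker="YOUR_SPEAKER", out_path="hello.mp3")` would write the audio to `hello.mp3`, provided the assumed request shape matches the live API. Keeping request construction separate from sending makes the payload easy to inspect or test without network access.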
What Makes Coda Different
Real-time conversational performance
Sub-100 ms model latency, fast enough for mid-utterance control and barge-in without awkward silences.
Top-rated voice quality
In human-led evaluations, Coda surpasses both prior Rime models and competitor TTS offerings on naturalness, prosody, and freedom from audio artifacts.
Multilingual support
One model speaks English, Spanish, French, Portuguese, German, and Japanese using a shared expressive voice lineup.
Word-level timestamps
Per-word timing metadata enables text-audio alignment, real-time highlighting, more precise interruption handling (knowing exactly which words were heard before a barge-in), and smarter orchestration.
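To illustrate how word-level timestamps help with interruption handling, here is a minimal Python sketch. It assumes timestamps arrive as a word plus start and end times in seconds from the start of playback; the actual shape of Rime's timestamp payload is not described here, so the `WordTiming` type and `spoken_before` helper are hypothetical. Given the playback position at which the user barged in, it returns the words the listener actually heard, which an agent could store in its conversation history instead of the full planned reply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTiming:
    # Hypothetical shape: Rime's real timestamp payload may differ.
    word: str
    start: float  # seconds from start of playback
    end: float    # seconds from start of playback


def spoken_before(timings: List[WordTiming], interrupt_at: float) -> str:
    """Return the words fully played before the interruption time.

    A word counts as heard only if its end time is at or before the
    interruption; partially played words are treated as unheard.
    """
    heard = [t.word for t in timings if t.end <= interrupt_at]
    return " ".join(heard)
```

For example, with timings for "Your" (0.0 to 0.2 s), "balance" (0.2 to 0.6 s), and "is" (0.6 to 0.7 s), an interruption at 0.65 s yields "Your balance". Counting only fully completed words is a deliberate, conservative choice; an agent that prefers to credit partially spoken words could compare against `start` instead.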
Rime’s TTS Models
Rime now offers a suite of models tailored for different production needs:
- Coda (modelId: coda)
  - Our flagship TTS model: an LLM backbone paired with a dedicated speech inference engine, trained on full-duplex conversational data.
  - Surpasses prior Rime models and competitor offerings in human-led voice-quality evaluations. See the announcement.
  - Sub-100 ms model latency on the GPU engine when self-hosted or running on-prem.
  - Via the cloud API, expect roughly 25–50 ms of additional network round-trip time from most of the continental US when you use the closest regional endpoint.
  - Native multilingual support across English, Spanish, French, Portuguese, German, and Japanese.
  - Word-level timestamps for fine-grained text-audio alignment and interruption handling.
- Arcana v3 (modelId: arcana)
  - The previous-generation flagship: ultra-realistic, expressive voices with low latency (about 120 ms time to first byte, or TTFB, out of the engine) and native multilingual code-switching across more than 10 languages.
  - Coda is the recommended successor for all Arcana traffic.
- Arcana v2 (modelId: arcanav2)
  - Ultra-realistic, expressive voices (including laughter and whispering) with low latency (about 250 ms TTFB out of the engine).
  - Built for high-volume conversational applications.
- Mist v3 (modelId: mistv3)
  - A major update to the Mist engine: time to first audio (TTFA) of around 37 ms (P50) on the GPU engine, significantly faster than Coda or Arcana while preserving Mist's pronunciation control and predictability.
- Mist v2 (modelId: mistv2)
  - Previous-generation Mist model. For new projects, prefer Mist v3.
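The latency figures in the model list can be combined into a rough budget check. This is a minimal Python sketch that adds a network round-trip to each model's headline engine latency. It assumes the 25–50 ms cloud overhead quoted for Coda applies similarly to the other models, which the list does not state. The headline figures are also different metrics (an upper bound for Coda, approximate TTFB for Arcana, P50 TTFA for Mist v3), so any comparison is only indicative, and Mist v2 is omitted because no figure is given. The `ENGINE_LATENCY_MS` table and both helper functions are hypothetical.

```python
from typing import Dict, List

# Headline engine latencies (ms) taken from the model list above.
# They mix metrics (upper bound, approximate TTFB, P50 TTFA), so
# treat any comparison as a rough guide only.
ENGINE_LATENCY_MS: Dict[str, float] = {
    "coda": 100.0,      # "sub-100 ms" model latency, used as an upper bound
    "arcana": 120.0,    # about 120 ms TTFB
    "arcanav2": 250.0,  # about 250 ms TTFB
    "mistv3": 37.0,     # about 37 ms TTFA (P50)
}


def estimated_cloud_latency_ms(model_id: str, network_rtt_ms: float) -> float:
    """Engine latency plus the extra network round-trip of the cloud API."""
    return ENGINE_LATENCY_MS[model_id] + network_rtt_ms


def models_within_budget(budget_ms: float, network_rtt_ms: float) -> List[str]:
    """Return modelIds whose estimated latency fits the budget, fastest first."""
    fits = [
        m for m in ENGINE_LATENCY_MS
        if estimated_cloud_latency_ms(m, network_rtt_ms) <= budget_ms
    ]
    return sorted(fits, key=lambda m: estimated_cloud_latency_ms(m, network_rtt_ms))
```

With the pessimistic 50 ms round-trip and a 150 ms budget, `models_within_budget(150, 50)` returns `["mistv3", "coda"]`: Coda lands exactly at 150 ms, while Arcana v3 (170 ms) and Arcana v2 (300 ms) fall outside the budget.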

