Rime provides industry-leading text-to-speech (TTS) AI models built for real-time conversational experiences at scale. Our latest flagship model, Coda, pairs a sophisticated LLM backbone with a dedicated speech inference engine. It's trained on Rime's massive proprietary dataset of full-duplex conversational speech between real people, not voice actors, audiobook narrators, or YouTube influencers. That makes it well suited to production voice AI agents, whether you're building intelligent IVRs, multilingual voice agents, or anything in between.

Hello from Coda in 30 seconds

curl --request POST \
  --url https://users.rime.ai/v1/rime-tts \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --header 'Accept: audio/mp3' \
  --output hello.mp3 \
  --data '{
    "speaker": "astra",
    "text": "Hello from Coda.",
    "modelId": "coda",
    "language": "en"
  }'
Replace YOUR_API_KEY with a key from the API Tokens page. If you'd rather not write the request yourself, the Rime CLI wraps this same endpoint behind a rime tts command.
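If you call the endpoint from a script, one way to vary the voice, text, model, or language without editing the JSON literal is to generate the body with a small function. This is a sketch, not part of Rime's tooling: the function name rime_body and the RIME_API_KEY environment variable are hypothetical, the field names are copied from the request above, and the function does no JSON escaping, so text containing double quotes, backslashes, or newlines would need a JSON-aware tool such as jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Rime tool): print a TTS request body using the
# same fields as the curl example: speaker, text, modelId, language.
# Assumption: arguments contain no characters that need JSON escaping.
rime_body() {
  local speaker="$1" text="$2" model_id="$3" language="$4"
  printf '{"speaker": "%s", "text": "%s", "modelId": "%s", "language": "%s"}\n' \
    "$speaker" "$text" "$model_id" "$language"
}

# Example use with the request above (RIME_API_KEY is a hypothetical env var):
#   curl --request POST \
#     --url https://users.rime.ai/v1/rime-tts \
#     --header "Authorization: Bearer $RIME_API_KEY" \
#     --header 'Content-Type: application/json' \
#     --header 'Accept: audio/mp3' \
#     --output hello.mp3 \
#     --data "$(rime_body astra 'Hello from Coda.' coda en)"
```

Keeping the body in one function means switching models later is a change to a single argument.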

What Makes Coda Different

Real-time conversational performance

Sub-100ms model latency, fast enough for mid-utterance control and barge-in without awkward silences.

Top-rated voice quality

In human-led evaluations, Coda surpasses both prior Rime models and competitor TTS offerings on naturalness, prosody, and artifact-free output.

Multilingual support

One model speaks English, Spanish, French, Portuguese, German, and Japanese using a shared expressive voice lineup.

Word-level timestamps

Word-level timing metadata enables text-audio alignment, real-time highlighting, better interruption handling, and smarter orchestration.

Rime’s TTS Models

Rime now offers a suite of models tailored for different production needs:
  • Coda
    • modelId: coda
    • Our flagship TTS model. LLM backbone with a dedicated speech inference engine, trained on conversational full-duplex data.
    • Surpasses prior Rime models and competitor offerings in human-led voice-quality evaluations. See the announcement.
    • Sub-100ms model latency on the GPU engine when self-hosted or running on-prem.
      • Via the cloud API, expect roughly 25–50ms additional network round-trip from most of the continental US when you pick the closest regional endpoint.
    • Native multilingual support across English, Spanish, French, Portuguese, German, and Japanese.
    • Word-level timestamps for fine-grained text-audio alignment and interruption handling.
  • Arcana v3
    • modelId: arcana
    • The previous-generation flagship: ultra-realistic, expressive voices with low latency (~120 ms TTFB out of the engine) and native multilingual code-switching across more than 10 languages.
    • Coda is the recommended successor for all Arcana traffic.
  • Arcana v2
    • modelId: arcanav2
    • Ultra-realistic and expressive voices (including laughter and whispering) with low latency (~250 ms TTFB out of the engine).
    • Built for high-volume conversational applications.
  • Mist v3
    • modelId: mistv3
    • Major update to the Mist engine — TTFA around 37 ms (P50) on the GPU engine, significantly faster than Coda or Arcana while preserving Mist’s pronunciation control and predictability.
  • Mist v2
    • modelId: mistv2
    • Previous-generation Mist model. For new projects, prefer Mist v3.
For full details on each model, including latency benchmarks, voice counts, and a feature matrix, see Models.
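For a rough end-to-end budget when calling Coda through the cloud API, the two figures in the Coda entry of the list above can be added together. This assumes model latency and network round-trip simply sum, and it ignores client-side buffering and playback:

```latex
\begin{aligned}
t_{\text{cloud}} &\approx t_{\text{model}} + t_{\text{net}},
  \qquad t_{\text{model}} < 100\,\text{ms},\quad t_{\text{net}} \in [25, 50]\,\text{ms} \\
\Rightarrow\; t_{\text{cloud}} &< 100 + 50 = 150\,\text{ms}
  \quad(\text{or } < 125\,\text{ms when } t_{\text{net}} = 25\,\text{ms})
\end{aligned}
```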

Migrating from Arcana

Coda is meaningfully better than Arcana across naturalness, prosody, and artifact-free output. We recommend migrating all existing Arcana traffic to Coda. The API contract is the same — just swap modelId: arcana for modelId: coda.
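If your Arcana request bodies live in JSON files or templates, the swap can be scripted. A minimal sketch, assuming the bodies use the same "modelId" field as the curl example; the function name migrate_to_coda is hypothetical, and because sed performs plain text substitution, a JSON-aware tool such as jq is safer for complex payloads. The pattern matches only the exact value "arcana", so arcanav2 bodies are left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "modelId": "arcana" to "modelId": "coda" in
# JSON request bodies. Reads the files given as arguments (or stdin if none)
# and writes the result to stdout; the input files are not modified.
# The closing quote in the pattern keeps IDs such as "arcanav2" unchanged.
migrate_to_coda() {
  sed -E 's/"modelId"[[:space:]]*:[[:space:]]*"arcana"/"modelId": "coda"/g' "$@"
}

# Example (file names are illustrative):
#   migrate_to_coda request.json > request.coda.json
```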

Language & Voice Support

Coda supports global voice experiences across English, Spanish, French, Portuguese, German, and Japanese, with a shared voice identity across languages. Rime exposes a rich set of demographically diverse voices you can select via API to match your brand, audience, and use case.

Flexible Deployment

Rime supports flexible infrastructure options — from the cloud API and virtual private cloud to on-premises deployments — without artificial concurrency limits. Whether your application must run close to users for real-time responsiveness or within secure enterprise environments, Rime fits your architecture.

Ready to Get Started?

Follow the quickstart guide to begin generating text-to-speech with Rime’s models — including Coda — in under five minutes.