Rime provides industry-leading text-to-speech (TTS) AI models built for real-time conversational experiences at scale. Our latest flagship model, Arcana v3, delivers authentic, natural, and expressive speech with the speed and reliability required for production voice AI, whether you’re building intelligent IVRs, multilingual voice agents, or anything in between.

Rime’s TTS Models

Rime now offers a suite of models tailored for different production needs:
  • Arcana v3: our flagship TTS model, combining ultra-realistic, expressive voices with low latency (~120 ms TTFB on-prem) and native multilingual code-switching across more than 10 languages. It provides word-level timestamps, supports natural elements such as laughter and breathing, and offers enterprise-grade ergonomics for high-volume, real-time deployments.
  • Mistv2: optimized for speed and fine-grained control, with accurate pronunciation and high concurrency for use cases that demand quick synthesis and customization.
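
Time to first byte (TTFB) is easy to check in your own environment. The sketch below is a minimal, generic helper and not part of Rime's SDK: `measure_ttfb` and `fake_stream` are hypothetical names, and the helper assumes only that your client exposes the audio as a lazy iterable of byte chunks, so that the request is actually sent when iteration begins.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttfb(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, Iterator[bytes]]:
    """Return (seconds until the first non-empty chunk, iterator over the audio).

    Timing starts when this function is called, so pass in a lazy stream
    whose request is issued on first iteration (an assumption of this sketch).
    """
    start = clock()
    iterator = iter(chunks)
    for first in iterator:
        if first:  # ignore empty keep-alive chunks
            elapsed = clock() - start
            break
    else:
        raise ValueError("stream ended before any audio arrived")

    def replay() -> Iterator[bytes]:
        # Hand back the chunk we consumed, then the rest of the stream.
        yield first
        yield from iterator

    return elapsed, replay()


def fake_stream() -> Iterator[bytes]:
    """Stand-in for a real streaming TTS response (hypothetical)."""
    time.sleep(0.05)       # simulated network and synthesis delay
    yield b"\x00" * 320    # first audio chunk
    yield b"\x00" * 320


if __name__ == "__main__":
    ttfb, audio = measure_ttfb(fake_stream())
    total = sum(len(chunk) for chunk in audio)
    print(f"TTFB: {ttfb * 1000:.0f} ms, received {total} bytes")
```

Because the helper returns the remaining audio as an iterator, it can sit between a client and a playback pipeline without dropping the first chunk.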

What Makes Arcana v3 Different

  • Real-Time Conversational Performance: Arcana v3 delivers TTS with industry-leading latency (under 120 ms on-prem and 200 ms via the cloud API). That is fast enough for natural back-and-forth interaction, mid-utterance control, and barge-in without awkward silences.
  • Multilingual & Code-Switching: A single model supports more than 10 languages (including English, Spanish, Hindi, Arabic, French, Portuguese, German, Japanese, Hebrew, and Tamil) and can switch between them mid-utterance without losing prosody or voice identity.
  • Word-Level Timestamps: Structural metadata enables text-audio alignment, real-time highlighting, better interruption handling, and smarter orchestration in voice applications.
  • Enterprise-Grade Deployment: Arcana v3 scales with high concurrency per machine, ORCA headers for seamless auto-scaling, and a robust suite of TTS-specific observability metrics.
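
To illustrate how word-level timestamps support highlighting and interruption handling, here is a minimal sketch. The `WordTiming` shape (a word with start and end times in seconds) is an assumption made for illustration, not Rime's documented response format, and `word_at` and `spoken_before` are hypothetical helper names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordTiming:
    """Assumed shape of one timestamp entry (hypothetical)."""
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def word_at(timings: List[WordTiming], t: float) -> Optional[WordTiming]:
    """Return the word being spoken at playback time t, or None during a gap.

    Assumes timings are sorted by start time and do not overlap.
    """
    starts = [w.start for w in timings]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i < 0:
        return None
    candidate = timings[i]
    return candidate if t < candidate.end else None


def spoken_before(timings: List[WordTiming], t: float) -> str:
    """Return the words fully spoken by time t, e.g. at the moment of barge-in."""
    return " ".join(w.word for w in timings if w.end <= t)


timings = [
    WordTiming("Hello", 0.00, 0.40),
    WordTiming("from", 0.45, 0.70),
    WordTiming("Rime", 0.70, 1.10),
]

current = word_at(timings, 0.5)        # the word to highlight at 0.5 s
heard = spoken_before(timings, 0.8)    # what the listener heard before interrupting at 0.8 s
```

A voice agent could use `spoken_before` to record only the text the caller actually heard before interrupting, rather than the full response it intended to speak.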

Language & Voice Support

Arcana v3 is built for global voice experiences: it supports English, Spanish, Hindi, French, German, Arabic, Japanese, Hebrew, Portuguese, Tamil, and more, and switches fluidly between them. Rime also exposes a rich set of demographically diverse voices that you can select via the API to match your brand, audience, and use case.

Flexible Deployment

Rime supports flexible infrastructure options — from cloud API and virtual private cloud to on-premises deployments — without artificial concurrency limits. Whether your application must run close to users for real-time responsiveness or within secure enterprise environments, Rime fits your architecture.

Ready to Get Started?

Follow the Python quickstart to begin generating text-to-speech with Rime’s models — including Arcana v3 — in under five minutes.