TTS in five minutes

This guide shows you how to generate your first audio clip with Rime’s text-to-speech (TTS) API and experiment with different voices and speech customizations.

Prerequisites

To follow this guide, you need:

- A Rime API key: Create a free Rime account and copy your API key from the API Tokens page.
- A language runtime:
  - Python 3.10 or later, or
  - Node.js 18 or later

Code blocks in this guide are tabbed. Pick Python or JavaScript in each block to follow your preferred language.

Create your script

If you’d rather paste a working file and read along, grab the full script below. Otherwise, continue step-by-step.

Full script

Create a file called rime_hello_world.py or rime_hello_world.js and paste the full script:

import json
import urllib.request

RIME_API_KEY = "your_api_key_here"

headers = {
    "Accept": "audio/mp3",
    "Authorization": f"Bearer {RIME_API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "modelId": "arcana"
}

data = json.dumps(payload).encode("utf-8")

request = urllib.request.Request(
    "https://users.rime.ai/v1/rime-tts",
    data=data,
    headers=headers,
    method="POST"
)

with urllib.request.urlopen(request) as response:
    with open("output.mp3", "wb") as f:
        while chunk := response.read(4096):
            f.write(chunk)

print("Audio saved to output.mp3")

Step-by-step code

Create a file called rime_hello_world.py or rime_hello_world.js and import the required library modules:

import json
import urllib.request

Next, we’ll create a request to the Rime API. Set the request headers, specifying the Rime API key that you copied:

RIME_API_KEY = "your_api_key_here"

headers = {
    "Accept": "audio/mp3",
    "Authorization": f"Bearer {RIME_API_KEY}",
    "Content-Type": "application/json"
}
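Hardcoding the key is fine for a quick test, but a common alternative is to read it from an environment variable so it stays out of source control. The following is a minimal sketch, assuming you have exported the key under the name RIME_API_KEY. The variable name and the build_headers and headers_from_env helpers are illustrative choices, not something the Rime API requires.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the same request headers used in this guide for the given key."""
    if not api_key:
        raise ValueError("Missing API key: set the RIME_API_KEY environment variable")
    return {
        "Accept": "audio/mp3",
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def headers_from_env(var_name: str = "RIME_API_KEY") -> dict:
    """Read the key from the environment (assumed variable name) and build headers."""
    return build_headers(os.environ.get(var_name, ""))
```

With these helpers in place, headers = headers_from_env() can stand in for the hardcoded headers dictionary.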

Configure a payload specifying the details of the request:

payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "modelId": "arcana"
}

This payload includes the three required parameters:

- text: the text that the model converts to speech.
- speaker: the voice used for the speech (view your options on our Voices page).
- modelId: the model that generates the speech. Use arcana for the most realistic voices, or mistv2 for faster synthesis.
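Because all three parameters are required, it can help to check the payload locally before sending it. The sketch below is one way to do that; the REQUIRED_PARAMS constant and the missing_params helper are hypothetical names introduced for illustration, and the check only looks for absent or empty values.

```python
REQUIRED_PARAMS = ("text", "speaker", "modelId")


def missing_params(payload: dict) -> list:
    """Return the required parameter names that are absent or empty in payload."""
    return [name for name in REQUIRED_PARAMS if not payload.get(name)]


payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "celeste",
    "modelId": "arcana",
}

# Fail early with a clear message instead of waiting for the API to reject the request.
missing = missing_params(payload)
if missing:
    raise ValueError(f"Payload is missing required parameters: {missing}")
```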

The Arcana API reference details the many optional parameters you can add to request payloads.

Now that you’ve created the headers and payload, make a POST request to the Rime API and write the streamed audio response to a file:

data = json.dumps(payload).encode("utf-8")

request = urllib.request.Request(
    "https://users.rime.ai/v1/rime-tts",
    data=data,
    headers=headers,
    method="POST"
)

with urllib.request.urlopen(request) as response:
    with open("output.mp3", "wb") as f:
        while chunk := response.read(4096):
            f.write(chunk)

print("Audio saved to output.mp3")

Streaming allows audio playback to begin before generation completes. This enables conversational flow, as the user can start listening to responses before the entire audio clip has been generated. Although streaming is less important in this example, because we’re writing to a file, it’s vital when using models in real-time conversations. If this sounds interesting, follow the LiveKit quickstart to create your first conversational agent.
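To make the streaming idea concrete, the sketch below hands each chunk to a callback as soon as it is read, instead of waiting for the full clip. It works with any file-like object, including the response returned by urllib.request.urlopen. The stream_chunks helper and the on_chunk callback are hypothetical names, and an in-memory buffer stands in for the network response so the example runs offline; in a real-time application, the callback would feed an audio player rather than a list.

```python
import io
from typing import BinaryIO, Callable


def stream_chunks(source: BinaryIO, on_chunk: Callable[[bytes], None],
                  chunk_size: int = 4096) -> int:
    """Read source in chunk_size pieces, passing each to on_chunk as it arrives.

    Returns the total number of bytes read.
    """
    total = 0
    while chunk := source.read(chunk_size):
        on_chunk(chunk)
        total += len(chunk)
    return total


# Stand-in for a network response so the example runs without an API call.
fake_response = io.BytesIO(b"\x00" * 10000)
received = []
total_bytes = stream_chunks(fake_response, received.append)
print(f"Received {total_bytes} bytes in {len(received)} chunks")
# Prints: Received 10000 bytes in 3 chunks
```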

Test the script

Run your script from the terminal:

python rime_hello_world.py

On a successful run, the terminal displays a confirmation that your audio file has been saved:

Audio saved to output.mp3
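If the run fails instead, urllib raises urllib.error.HTTPError when the server responds with an error status (for example, if the API key is rejected) and urllib.error.URLError when the server can't be reached. The sketch below wraps the request so that the failure is reported clearly. The synthesize helper, its return convention, and the 30-second timeout are illustrative choices, and this guide doesn't cover the exact status codes or error bodies that Rime returns.

```python
import json
import urllib.error
import urllib.request

RIME_URL = "https://users.rime.ai/v1/rime-tts"


def synthesize(payload: dict, headers: dict, out_path: str, url: str = RIME_URL) -> bool:
    """POST payload to url and write the audio response to out_path.

    Returns True on success and False after printing a description of the failure.
    """
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            with open(out_path, "wb") as f:
                while chunk := response.read(4096):
                    f.write(chunk)
    except urllib.error.HTTPError as err:
        # The server answered with an error status; its body often explains why.
        detail = err.read().decode("utf-8", errors="replace")
        print(f"Request failed with HTTP {err.code}: {detail}")
        return False
    except urllib.error.URLError as err:
        # No HTTP response at all (for example, DNS failure or refused connection).
        print(f"Could not reach the server: {err.reason}")
        return False
    return True
```

Calling synthesize(payload, headers, "output.mp3") with the payload and headers from this guide then returns True only when the audio file has been written.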

Choose a voice

Rime offers a range of voices with different personalities. To change the voice, update the speaker parameter in your request:

payload = {
    "text": "Hello! This is Rime speaking.",
    "speaker": "orion",  # Try different voices here
    "modelId": "arcana"
}

Browse all available voices on the Voices page.
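To compare voices side by side, one option is to build one payload per voice and save each clip under its own filename. The sketch below only prepares the payloads and filenames, so it runs without network access. The payloads_by_file helper and the filename pattern are hypothetical, and the voice names are the two used with arcana earlier in this guide.

```python
def payloads_by_file(text: str, voices: list, model_id: str = "arcana") -> dict:
    """Map an output filename to the request payload for each voice."""
    return {
        f"output_{voice}.mp3": {"text": text, "speaker": voice, "modelId": model_id}
        for voice in voices
    }


jobs = payloads_by_file("Hello! This is Rime speaking.", ["celeste", "orion"])
for filename, payload in jobs.items():
    print(filename, "->", payload["speaker"])
```

Each payload can then be sent with the same POST request used in your script, writing the response to the matching filename.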

Custom pronunciation

The mistv2 model lets you specify the pronunciation of brand names or uncommon words using Rime’s phonetic alphabet. Add the custom pronunciation in curly brackets and set phonemizeBetweenBrackets to true:

payload = {
    "text": "Welcome to {r1Ym} labs.",
    "speaker": "peak",
    "modelId": "mistv2",
    "phonemizeBetweenBrackets": True
}

Use the Pronunciation tool on the dashboard to generate phonetic strings for any word.
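If the same words need custom pronunciation in many requests, a small helper can insert the bracketed phonetic strings for you. The sketch below is one way to do this; the with_pronunciations helper is a hypothetical name, matching is case-sensitive and limited to whole words, and the mapping from "Rime" to r1Ym is inferred from the example text above rather than taken from the Pronunciation tool.

```python
import re


def with_pronunciations(text: str, phonetic: dict) -> str:
    """Replace each whole word found in phonetic with its {bracketed} phonetic string."""
    if not phonetic:
        return text
    # One pass over the text, so inserted phonetic strings are never re-matched.
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in phonetic) + r")\b")
    return pattern.sub(lambda m: "{" + phonetic[m.group(1)] + "}", text)


payload = {
    "text": with_pronunciations("Welcome to Rime labs.", {"Rime": "r1Ym"}),
    "speaker": "peak",
    "modelId": "mistv2",
    "phonemizeBetweenBrackets": True,
}
print(payload["text"])
# Prints: Welcome to {r1Ym} labs.
```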

Next steps

Now that you can generate TTS audio, try following the LiveKit quickstart guide to learn how you can set up a real-time conversation with an agent. Check out these resources to get more familiar with Rime:

- Models: Compare Arcana (realistic) and Mistv2 (fast)
- Voices: Browse all available voice options
- Latency: Optimize for real-time performance
- Arcana Streaming API: Stream audio with our most realistic model
