Prerequisites
To follow this guide, you need:
- OpenClaw
- FFmpeg
- Python 3
- A Rime API key
- A Telegram account
Step 1: Create a Telegram bot and connect it to OpenClaw
This guide uses Telegram as the primary interface with OpenClaw, but you could easily adapt it to use your preferred messaging service. First, create a new bot using Telegram’s BotFather:
- Open Telegram and search for @BotFather.
- Send /newbot and follow the prompts to choose a name and username.
- BotFather replies with your bot token, which looks like 123456789:ABCdefGHIjklMNOpqrsTUVwxyz.
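Add the bot token to ~/.openclaw/.env:
Then configure Telegram in your ~/.openclaw/openclaw.json file:
When you message the bot for the first time, it replies with an access not configured message with an access code. Copy the access code and run the following command in your terminal to pair the bot with OpenClaw: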
Step 2: Add your Rime API key
OpenClaw reads environment variables from the ~/.openclaw/.env file. Add your Rime API key to it:
Step 3: Disable OpenClaw’s built-in TTS
OpenClaw has a built-in TTS system that the assistant uses by default. We need to disable it so that OpenClaw uses the new Rime skill instead. Update your openclaw.json file so that it does the following:
- Turns off auto-TTS so the built-in pipeline doesn’t generate audio automatically
- Disables Edge TTS so it can’t be used as a fallback
- Denies the built-in tts tool so the LLM can’t call it directly
Step 4: Install the rime-reader skill
The rime-reader skill reads documents aloud in three modes:
- In verbatim mode, it reads the document aloud, word for word, in your chosen voice.
- In summary mode, it summarizes the document’s content in your chosen voice.
- In podcast mode, two AI hosts, each with a different voice, summarize and discuss the content.
Install the rime-reader skill by cloning it from the following repository into your ~/.openclaw/skills/ directory:
The skill contains a Python script (rime.py) that handles all three modes and a SKILL.md that teaches the LLM how to use it.
How rime.py works
The script has three modes, driven by the following command-line arguments:
- A file path for document reading
- --text for a single utterance
- --segments for podcast
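The argument handling described above could be sketched with argparse as follows. This is an illustration, not the actual rime.py source: the function names `build_parser` and `pick_input` are hypothetical, and the format of the --segments value is not assumed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one argument per input kind (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a rime.py-style CLI")
    # Optional positional: a path to the document to read aloud.
    parser.add_argument("file", nargs="?", help="path to a document")
    parser.add_argument("--text", help="a single utterance to synthesize")
    parser.add_argument("--segments", help="podcast segments (format not assumed)")
    return parser


def pick_input(args: argparse.Namespace) -> tuple[str, str]:
    """Return (argument name, value), requiring exactly one input to be given."""
    given = [
        (name, getattr(args, name))
        for name in ("file", "text", "segments")
        if getattr(args, name) is not None
    ]
    if len(given) != 1:
        raise ValueError("pass exactly one of: a file path, --text, or --segments")
    return given[0]
```

For example, `pick_input(build_parser().parse_args(["--text", "hello"]))` selects the single-utterance input, while passing both a file path and --text raises an error.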
Chunking
In verbatim and summary mode, rime.py uses chunking to break long text into sentence-aligned chunks of roughly 400 characters each. This ensures that no single API call is too large.
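A minimal sketch of sentence-aligned chunking, assuming a simple greedy strategy: split on sentence-ending punctuation, then pack whole sentences into a chunk until adding the next one would exceed the limit. The function name `chunk_text` and the splitting regex are illustrative and may differ from what rime.py does.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of about max_chars characters."""
    # Split after ., ! or ? when followed by whitespace (a rough heuristic).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part.strip() for part in parts if part.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # The next sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Because sentences are never split, a single sentence longer than `max_chars` becomes its own oversized chunk, which is why the limit is only approximate.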
Synthesis
The script then sends the chunks to the Rime API, which synthesizes them and returns raw audio bytes.
Stitching
The script concatenates the bytes from each chunk and generates silences between them, stitching them all into a single bytearray.
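The stitching step can be sketched as follows, assuming the raw audio is 16-bit mono PCM at 24 kHz, in which case silence is simply a run of zero bytes. The sample rate, sample width, gap length, and function names are all assumptions for illustration.

```python
SAMPLE_RATE = 24_000  # assumed; use the rate the API actually returns
BYTES_PER_SAMPLE = 2  # assumed 16-bit mono PCM


def silence(ms: int) -> bytes:
    """Return `ms` milliseconds of silence as all-zero PCM samples."""
    samples = SAMPLE_RATE * ms // 1000
    return bytes(samples * BYTES_PER_SAMPLE)


def stitch(chunks: list[bytes], gap_ms: int = 300) -> bytearray:
    """Concatenate PCM chunks, inserting a gap between (not after) them."""
    audio = bytearray()
    for index, chunk in enumerate(chunks):
        if index > 0:
            audio += silence(gap_ms)
        audio += chunk
    return audio
```

Keeping everything in one bytearray means the result can be handed to the encoder in a single call.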
You can specify a voice for each segment in podcast mode:
Encoding
Then, rime.py encodes the bytearray by making an ffmpeg call that converts the raw audio buffer to OGG Opus, the format that Telegram expects:
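A minimal sketch of such a call, assuming the buffer holds 16-bit little-endian mono PCM at 24 kHz and that ffmpeg was built with libopus. The helper names `opus_command` and `encode_to_ogg` are hypothetical, and the exact flags rime.py uses may differ.

```python
import subprocess


def opus_command(out_path: str, sample_rate: int = 24_000) -> list[str]:
    """Build an ffmpeg command that reads raw PCM on stdin and writes OGG Opus."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-f", "s16le",             # input format: raw 16-bit little-endian PCM (assumed)
        "-ar", str(sample_rate),   # input sample rate (assumed)
        "-ac", "1",                # mono input (assumed)
        "-i", "pipe:0",            # read the audio buffer from stdin
        "-c:a", "libopus",         # encode with the Opus codec
        out_path,                  # .ogg extension selects the OGG container
    ]


def encode_to_ogg(pcm: bytes, out_path: str) -> str:
    """Pipe the PCM buffer through ffmpeg and return the output path."""
    subprocess.run(opus_command(out_path), input=pcm, check=True, capture_output=True)
    return out_path
```

Raw PCM carries no header, so the input format, sample rate, and channel count must be stated before `-i`; if they do not match the real audio, the output plays at the wrong speed or as noise.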
Finally, the script prints the .ogg path to stdout. The LLM reads this path and uses it in a MEDIA: directive with [[audio_as_voice]] to deliver the audio as a Telegram voice note bubble.
Step 5: Register the skill and configure the personality
Enable the skill in ~/.openclaw/openclaw.json:
Personality (SOUL.md)
The ~/.openclaw/workspace/SOUL.md file configures OpenClaw’s agent personality. The LLM reads the file at the start of every session.
Add the following Document Reading section to your SOUL.md file. Without it, the bot skips the rime-reader skill and tries to generate audio using whichever TTS model it finds first. Since we’ve disabled the default TTS model, it would fail to generate any audio and fall back to replying in text.
Step 6: Test the flow
Restart the gateway so that you can test the document reading flow:
Send /new to your bot.
Then send a document or paste text in the chat and ask the bot to read it.
The bot should ask you to choose from the three delivery modes: verbatim, summary, or podcast.
Choose the verbatim mode.
Next, it should prompt you to pick a voice.
Once you’ve selected a voice, you should receive a voice note of your text.
Tuning
The bot’s behavior is driven by SOUL.md, which means you can reshape it. Just edit the file or tell the bot, directly in your Telegram chat, to update it for you.
Consider how you can tweak various aspects of your OpenClaw assistant:
Voice
You can select a default voice for your OpenClaw assistant by editing SOUL.md or by sending a Telegram message telling the bot to “Use Transom next time.”
You can use any of the available Arcana voices: atrium, lyra, transom, parapet, fern, thalassa, truss, sirius, eliphas, lintel, or one of the many others listed on Rime’s Voices page.
Podcast personality
The LLM writes the podcast script before synthesizing it, so you can steer the tone. Try adding a line such as the following to the Document Reading section of your SOUL.md:
Alternatively, skip editing SOUL.md entirely and just tell the bot to “Make the podcast hosts argue like an old married couple.” The LLM will adapt the script on the fly.
Skip the prompts
If you always want the same voice and delivery mode, you can hardcode them in SOUL.md to skip the bot prompts.
For example, you could replace the first two steps with the following instruction:
Because the LLM reads SOUL.md afresh every session, your changes take effect immediately after you send /new in Telegram.
