Coda JSON WebSocket (/ws3): structured events with base64 audio chunks and word-level timestamps.
., ?, !. This is most pertinent for the initial messages
sent to the API, as synthesis won’t begin until there are sufficient
tokens to generate audio with natural prosody. After the first synthesis
of any given utterance, typically enough time has elapsed that subsequent
audio contains multiple clauses, and the buffering becomes largely invisible.
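Since the passage above ties buffering to the terminal punctuation marks ., ?, and !, a client may prefer to send text that already ends on one of them. The following is a minimal client-side sketch, not part of the API: the helper name `split_complete_sentences` is hypothetical, and the rule that a terminator only counts when followed by whitespace or the end of the buffer is an assumption made here to avoid splitting decimals such as 3.14.

```python
import re

def split_complete_sentences(buffer: str) -> tuple[list[str], str]:
    """Split `buffer` into complete sentences plus the unfinished remainder.

    Hypothetical helper: a sentence is assumed to end at a run of '.', '?'
    or '!' that is followed by whitespace or the end of the buffer.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!]+(?=\s|$)", buffer):
        sentence = buffer[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    # Whatever follows the last terminator is held back for the next call.
    return sentences, buffer[start:].lstrip()
```

A caller would send each returned sentence and keep the remainder in its own buffer until more text arrives.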
contextId: null,
and the audio for the second will be tagged with its UUID.
word_timestamps are the same length and index-aligned: for a given index i, words[i] is spoken from start[i] to end[i]. Times are in seconds, measured from the beginning of the audio for the current synthesis. If a context id was attached to the text that produced this audio, it is included on the event.
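The index alignment described above can be illustrated with a short sketch that zips the three arrays into per-word records. It assumes the timestamps arrive as an object with `words`, `start`, and `end` keys, as the description suggests; the helper name `align_word_timestamps`, the `WordTiming` type, and the sample values in the usage note are hypothetical.

```python
from typing import NamedTuple

class WordTiming(NamedTuple):
    word: str
    start: float  # seconds from the beginning of the current synthesis
    end: float    # seconds from the beginning of the current synthesis

def align_word_timestamps(word_timestamps: dict) -> list[WordTiming]:
    """Pair words[i] with start[i] and end[i] (key names are assumed)."""
    words = word_timestamps["words"]
    starts = word_timestamps["start"]
    ends = word_timestamps["end"]
    # The arrays are documented as equal-length; fail loudly if they are not.
    if not (len(words) == len(starts) == len(ends)):
        raise ValueError("words, start, and end must have the same length")
    return [WordTiming(w, s, e) for w, s, e in zip(words, starts, ends)]
```

For instance, `align_word_timestamps({"words": ["hello", "world"], "start": [0.0, 0.4], "end": [0.35, 0.9]})` yields one `WordTiming` per word, which is convenient for driving captions or highlighting.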
Example payload:
done event. This signals that the current synthesis is fully complete. If the client sends more text and triggers further synthesis, another done will follow.
done fires depends on the segment setting. See Segmentation & Behavior Settings for full details.
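One way a client might use the done event is to accumulate decoded audio until it arrives. The sketch below works on an iterable of raw JSON strings so that it stays independent of any particular WebSocket library. The `type` field, the `"chunk"` event name, and the `data` field carrying base64 audio are assumptions made for illustration and should be checked against the actual event schema.

```python
import base64
import json

def collect_until_done(messages) -> tuple[bytes, bool]:
    """Decode base64 audio from JSON messages until a done event is seen.

    Returns (audio_bytes, saw_done). Field and event names are assumed.
    """
    audio = bytearray()
    for raw in messages:
        event = json.loads(raw)
        kind = event.get("type")          # assumed discriminator field
        if kind == "chunk":               # assumed audio event name
            audio.extend(base64.b64decode(event["data"]))  # assumed field
        elif kind == "done":
            # The current synthesis is complete; stop reading here.
            return bytes(audio), True
    # The stream ended without a done event.
    return bytes(audio), False
```

Because another done follows each later synthesis, a long-lived client would call such a function once per synthesis rather than once per connection.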
coda.coda else the websockets server will default to mistv2 for speech synthesis.
mp3, mulaw, or pcm

| 639-1 | 639-2/3 | Language |
|---|---|---|
| en | eng | English |
| es | spa | Spanish |
| fr | fra | French |
| pt | por | Portuguese |
| de | deu | German |
| ja | jpn | Japanese |
immediate=true in query params is equivalent to segment=immediate. If a null value is provided, it will default to “bySentence”.
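The equivalence and default described above can be sketched as a small resolver over a query string. This is an illustration, not the server's implementation: the helper name `effective_segment` is hypothetical, and two behaviors are assumptions, namely that `immediate=true` wins when both parameters are present, and that a missing, empty, or literal `null` value counts as null.

```python
from urllib.parse import parse_qs

def effective_segment(query: str) -> str:
    """Resolve the segment setting implied by a query string (sketch)."""
    params = parse_qs(query, keep_blank_values=True)
    # immediate=true is documented as equivalent to segment=immediate.
    if params.get("immediate", [""])[-1].lower() == "true":
        return "immediate"
    segment = params.get("segment", [""])[-1]
    # Assumption: missing, empty, or "null" all fall back to the default.
    if segment in ("", "null"):
        return "bySentence"
    return segment
```

For example, `effective_segment("immediate=true")` and `effective_segment("segment=immediate")` both resolve to `"immediate"`, while an empty query resolves to `"bySentence"`.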