Mist v2 API reference
Streaming HTTP
Mist v2 streaming HTTP endpoint.
The streaming endpoint returns audio bytes in the format specified by the Accept header.
Audio Formats
Set the Accept header to one of the following values:
| Format | Accept Header | Default Sampling Rate (Hz) | Notes |
|---|---|---|---|
| MP3 | audio/mpeg | 22050 | |
| PCM | audio/L16 | 16000 | Headerless 16-bit little-endian linear PCM. |
| G.711 μ-law | audio/PCMU | 8000 | Headerless stream of 8-bit μ-law audio bytes. |
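The PCM and μ-law streams are headerless, so most audio players cannot open the raw bytes directly. The following is a minimal sketch that uses Python's standard `wave` module to wrap a fully buffered `audio/L16` response in a WAV container. It assumes mono audio, because the channel count is not stated above. It also assumes you already have the response bytes, so the HTTP call is not shown. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit little-endian PCM (audio/L16) in a WAV container.

    Assumption: the stream is mono unless you pass a different `channels`.
    """
    # Each frame is one 16-bit (2-byte) sample per channel.
    if len(pcm) % (2 * channels) != 0:
        raise ValueError("PCM length is not a whole number of 16-bit frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # must match the rate the stream was generated at
        wav.writeframes(pcm)           # WAV PCM is little-endian, like the stream
    return buf.getvalue()
```

For example, `pcm_to_wav(audio_bytes, sample_rate=16000)` yields bytes you can write to a `.wav` file. If you requested a non-default sampling rate, pass that same rate here.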
Deprecated aliases
Still accepted for backward compatibility; new code should use the RFC media types above.

| Deprecated | Use instead |
|---|---|
| audio/mp3 | audio/mpeg |
| audio/pcm | audio/L16 |
| audio/x-mulaw | audio/PCMU |
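If existing client code still sends the old aliases, you can migrate it by normalizing the Accept value before each request. The following is a minimal sketch. The helper and constant names are hypothetical, and the mapping is copied from the table above. Media type names are compared case-insensitively.

```python
# Deprecated alias -> preferred RFC media type (from the table above).
DEPRECATED_ACCEPT = {
    "audio/mp3": "audio/mpeg",
    "audio/pcm": "audio/L16",
    "audio/x-mulaw": "audio/PCMU",
}

SUPPORTED_ACCEPT = ("audio/mpeg", "audio/L16", "audio/PCMU")


def normalize_accept(value: str) -> str:
    """Return the preferred Accept value, rewriting deprecated aliases."""
    key = value.strip().lower()  # MIME type/subtype names are case-insensitive
    if key in DEPRECATED_ACCEPT:
        return DEPRECATED_ACCEPT[key]
    for supported in SUPPORTED_ACCEPT:
        if supported.lower() == key:
            return supported  # return the canonical spelling, e.g. "audio/L16"
    raise ValueError(f"unsupported Accept value: {value!r}")
```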
Variable Parameters
Must be one of the voices listed in our documentation.
The text you’d like spoken. Character limit per request is 500 via the API and 1,000 in the dashboard UI.
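Text longer than the 500-character API limit has to be split on the client and sent as separate requests. The following is a minimal sketch that assumes breaking at whitespace is acceptable for your content. As a side effect, it collapses runs of whitespace, including newlines, to single spaces. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, limit: int = 500) -> list[str]:
    """Split text into pieces of at most `limit` characters, breaking at spaces.

    A single word longer than `limit` is hard-split into `limit`-sized pieces.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split any word that could never fit in one chunk.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow: close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk fits within the API limit.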
Set to mistv2.
If provided, the language must match the language spoken by the provided speaker. You can check each voice's language in our voices documentation.
When set to true, inserts a pause wherever a number enclosed in angle brackets appears in the text. The number specifies the pause duration in milliseconds.
Example: “Hi. <200> I’d love to have a conversation with you.” adds a 200ms pause between the first and second sentences.
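The pause markup is ordinary text, so you can generate or inspect it with string helpers. The following is a minimal sketch. It assumes whole-number millisecond values, as in the example above. The tags only take effect when the bracket-pause option is set to true, and that option's field name is not shown in this reference. Both function names are hypothetical.

```python
import re

# A pause tag such as "<200>": whole milliseconds inside angle brackets.
PAUSE_TAG = re.compile(r"<(\d+)>")


def join_with_pauses(sentences: list[str], pause_ms: int) -> str:
    """Join sentences with the same pause tag between each adjacent pair."""
    if pause_ms < 0:
        raise ValueError("pause duration must be non-negative")
    return f" <{pause_ms}> ".join(s.strip() for s in sentences)


def total_pause_ms(text: str) -> int:
    """Sum the pause durations requested by all tags in `text`."""
    return sum(int(ms) for ms in PAUSE_TAG.findall(text))
```

For example, `join_with_pauses(["Hi.", "I'd love to have a conversation with you."], 200)` reproduces the example text above.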
When set to true, you can specify a word's pronunciation by writing its phonemes inside curly brackets.
Example: “{h’El.o} World” will pronounce “Hello” as expected. Learn more about custom pronunciation.
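If you keep your own pronunciation lexicon, you can apply it to input text before sending. The following is a minimal sketch. `apply_pronunciations` is a hypothetical helper, and the phoneme alphabet itself is documented separately. The helper does not validate phoneme strings: it only replaces whole words (case-sensitive) with their brace-wrapped phonemes. The curly-bracket option must still be set to true in the request.

```python
import re


def apply_pronunciations(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole words found in `lexicon` with "{phonemes}"."""
    if not lexicon:
        return text
    # Try longer entries first so a multi-word entry wins over its prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(lambda m: "{" + lexicon[m.group(1)] + "}", text)
```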
Comma-separated list of speed values, applied in order to the words enclosed in square brackets. Values below 1.0 speed up speech; values above 1.0 slow it down.
Example: for “This is [slow] and [fast]”, pass “3, 0.5” to make “slow” slower and “fast” faster.
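The bracketed text and the speed list must stay aligned in order, so it can help to build both from a single list of segments. The following is a minimal sketch. `build_inline_speed` is a hypothetical helper, and it rejects non-positive values as a client-side sanity check that is not documented here.

```python
from typing import Optional, Sequence, Tuple


def build_inline_speed(
    segments: Sequence[Tuple[str, Optional[float]]],
) -> Tuple[str, str]:
    """Return (text, speeds) for bracketed inline speed control.

    Segments with a speed are wrapped in square brackets, and their speeds
    are collected in the same order into a comma-separated string.
    Mist v2 convention: < 1.0 speeds a word up, > 1.0 slows it down.
    """
    parts: list[str] = []
    speeds: list[str] = []
    for chunk, speed in segments:
        if speed is None:
            parts.append(chunk)  # plain text, spoken at the normal rate
            continue
        if speed <= 0:
            raise ValueError("speed values must be positive")  # assumed check
        parts.append(f"[{chunk}]")
        speeds.append(f"{speed:g}")  # 3 -> "3", 0.5 -> "0.5"
    return "".join(parts), ", ".join(speeds)
```

For example, `build_inline_speed([("This is ", None), ("slow", 3), (" and ", None), ("fast", 0.5)])` returns the text and speed string from the example above.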
Sampling rate of the output audio, in Hz. If provided, the value must be between 4000 and 44100. The default depends on the format (see table above).
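The effective sampling rate is either the per-format default from the table above or the value you pass. The following is a minimal sketch that resolves it on the client. It assumes the 4000 to 44100 bounds are inclusive. `resolve_sampling_rate` is a hypothetical helper.

```python
from typing import Optional

# Per-format defaults, in Hz, from the Audio Formats table.
DEFAULT_SAMPLING_RATE = {
    "audio/mpeg": 22050,
    "audio/L16": 16000,
    "audio/PCMU": 8000,
}


def resolve_sampling_rate(accept: str, requested: Optional[int] = None) -> int:
    """Return the sampling rate the stream should use, validating overrides."""
    if requested is None:
        return DEFAULT_SAMPLING_RATE[accept]  # KeyError for unknown formats
    if not 4000 <= requested <= 44100:  # assumed inclusive bounds
        raise ValueError("sampling rate must be between 4000 and 44100 Hz")
    return requested
```

Knowing the effective rate matters most for the headerless formats, where nothing in the stream records it.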
Adjusts the speed of speech. Values lower than 1.0 are faster and values higher than 1.0 are slower.
Note: this is the legacy Mist v2 convention. Newer models (Coda, Arcana, Mist v3) invert it: for those, higher than 1.0 is faster.
mist/mistv2 only. Skips text normalization of the input before synthesizing audio. This reduces latency at the cost of possible mispronunciation of digits and abbreviations.
mist/mistv2 only. If set to true, Rime saves any out-of-vocabulary (OOV) words encountered in the text so the user or team can review them on the Speech QA dashboard. Note: It may take up to 15 minutes for OOV words to appear on your dashboard.
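The following sketch shows how the pieces above might fit into a single request. It uses only Python's standard library and builds the request without sending it. Several details are **assumptions not stated in this reference**: the endpoint URL, Bearer-token authentication, a JSON body, and the camelCase field names (`speaker`, `text`, `modelId`, `lang`, `samplingRate`, `speedAlpha`, `reduceLatency`, `saveOovs`). Confirm them against the full API documentation before use. The function name and the voice name in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# ASSUMPTION: endpoint URL; not stated in this reference.
API_URL = "https://users.rime.ai/v1/rime-tts"


def build_tts_request(
    api_key: str,
    speaker: str,
    text: str,
    accept: str = "audio/mpeg",
    lang: Optional[str] = None,
    sampling_rate: Optional[int] = None,
    speed_alpha: Optional[float] = None,
    reduce_latency: bool = False,
    save_oovs: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a Mist v2 streaming request."""
    if len(text) > 500:
        raise ValueError("text exceeds the 500-character API limit")
    # ASSUMPTION: JSON field names below are illustrative.
    body = {"speaker": speaker, "text": text, "modelId": "mistv2"}
    optional = {"lang": lang, "samplingRate": sampling_rate, "speedAlpha": speed_alpha}
    body.update({k: v for k, v in optional.items() if v is not None})
    if reduce_latency:
        body["reduceLatency"] = True
    if save_oovs:
        body["saveOovs"] = True
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": accept,  # selects the audio format
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: Bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Usage: for example, `req = build_tts_request(key, "hypothetical_voice", "Hello", accept="audio/L16")`. You can then pass `req` to `urllib.request.urlopen` and call `read(4096)` on the response in a loop to consume the audio as it streams.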
