What is segmentation?
When you stream text to Rime’s WebSocket API token by token, Rime needs to know when to synthesize audio. Should it wait for a complete sentence? Synthesize every token immediately? Wait until you say so explicitly? The `segment` query parameter answers this question. It controls how Rime buffers the text you send and when it fires off synthesis to the model backend.
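The mode is selected per connection via the query string. A minimal sketch of building the connection URL in Python; the host below is a placeholder, so consult Rime's API reference for the real endpoint and authentication details:

```python
# Sketch: selecting the segmentation mode via the `segment` query parameter.
# The host/path are placeholders, not Rime's real endpoint.
from urllib.parse import urlencode

def ws_url(base: str, segment: str = "never") -> str:
    """Build a /ws2-style URL with an explicit segmentation mode."""
    if segment not in {"never", "bySentence", "immediate"}:
        raise ValueError(f"unknown segment mode: {segment}")
    return f"{base}?{urlencode({'segment': segment})}"

url = ws_url("wss://example.invalid/ws2")  # -> ...?segment=never
```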
| Setting | Default? | Best for |
|---|---|---|
| `segment=never` | No (recommended) | Full control — you decide when to synthesize |
| `segment=bySentence` | Yes | Sentence-structured text streamed token by token |
| `segment=immediate` | No | Pre-segmented phrases or real-time pass-through |
All three settings are available on `/ws3` and `/ws2`. The `segment` parameter is not applicable to `/ws` (binary WebSocket).

segment=never — Recommended
How it works
Under `segment=never`, Rime never synthesizes audio automatically. It accumulates every token you send into a buffer and waits. Audio is only produced when your client explicitly sends a flush operation.
What you are responsible for
- Sending well-formed, concatenable tokens. If someone concatenated every token you send, the result should be a properly spaced, punctuated utterance. Avoid sending tokens that, when joined, produce malformed text.
- Sending `flush` when you’re done with an utterance. This is your signal to Rime that the buffer contains a complete, speakable phrase.
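The concatenation rule above can be illustrated directly: join the tokens you plan to send and read the result.

```python
# Under segment=never, whatever you stream, joined as-is, should read as
# one well-formed utterance.
good = ["Hello, ", "how are ", "you today?"]
bad = ["Hello,", "how are", "you today?"]  # missing spaces: words run together

assert "".join(good) == "Hello, how are you today?"
assert "".join(bad) == "Hello,how areyou today?"  # malformed when joined
```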
What Rime is responsible for
- Synthesizing audio whenever a `flush` is received.
- Queuing the buffer for synthesis if the previous utterance is still being produced.
- Never synthesizing mid-stream without your explicit instruction.
Example
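A minimal sketch of one utterance under this mode. The JSON message shapes (`{"text": ...}` and `{"operation": "flush"}`) are assumptions for illustration; check Rime's API reference for the exact schema your endpoint expects.

```python
# Sketch of a segment=never exchange: stream every token, then flush once.
# Message schema is assumed, not taken from the API reference.
import json

def never_mode_messages(tokens):
    """Messages for one utterance: every token, then one explicit flush."""
    msgs = [json.dumps({"text": tok}) for tok in tokens]
    msgs.append(json.dumps({"operation": "flush"}))  # your synthesis signal
    return msgs

msgs = never_mode_messages(["Your total ", "is ", "$12.50."])
# Each string in `msgs` would be sent over the socket in order,
# e.g. `await ws.send(m)` with a websockets-style client.
```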
Handling interruptions
If your user interrupts the assistant while audio is playing, send a `clear` operation to discard the buffer and stop queued synthesis:
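A sketch of the barge-in message; the `{"operation": "clear"}` shape is assumed for illustration, so verify it against the API reference.

```python
# On barge-in, send a clear so buffered text and queued synthesis are
# discarded before you start the next utterance. Schema is assumed.
import json

def clear_message() -> str:
    return json.dumps({"operation": "clear"})
```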
segment=bySentence — Default
This is the default behavior when `segment` is not specified. Rime buffers tokens and synthesizes audio each time it detects a sentence or phrase boundary in the accumulated text.
How it works
Rime watches the incoming token stream for sentence-ending punctuation: `.`, `?`, `!`. When one is encountered and no audio is currently being synthesized, Rime synthesizes everything up to that boundary and sends the audio back.
What you are responsible for
- Separating sentences with spaces. Tokens sent without trailing spaces can cause words to run together after concatenation.
- Not splitting tokens at sentence-ending punctuation. Rime’s heuristics fire on received tokens. If a single token ends with sentence-ending punctuation in the middle of what should be a larger phrase (e.g., `"2."` in `"2.5ml"`), it may trigger an early synthesis.
What Rime is responsible for
- Accumulating tokens until a sentence boundary is detected.
- Synthesizing the buffer at that boundary, only if no audio is currently being produced.
- Using heuristics to determine whether a given punctuation mark constitutes a sentence end.
When to use this
`segment=bySentence` works well when you’re streaming text that is already well-structured — clean sentence boundaries, no numbers or abbreviations that could confuse the period heuristic. It requires less client-side coordination than `segment=never` but is less predictable in edge cases.
Example
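A sketch of a `bySentence` stream: no flush is needed, because Rime synthesizes at each detected boundary. The `{"text": ...}` message shape is an assumption; see the API reference for the exact schema.

```python
# Tokens streamed under segment=bySentence. Rime would synthesize once
# after "9 PM. " and once after "then!", with no explicit flush.
import json

tokens = ["The ", "pharmacy ", "closes ", "at ", "9 PM. ", "See ", "you ", "then!"]
msgs = [json.dumps({"text": tok}) for tok in tokens]
```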
segment=immediate — Synthesize on receipt
Under `segment=immediate`, Rime synthesizes audio as soon as text arrives in the buffer — provided no audio is currently being produced.
How it works
Each time Rime receives a text message and the synthesis pipeline is idle, it synthesizes whatever is in the buffer immediately. If synthesis is already in progress, incoming tokens continue to accumulate. Once the current synthesis finishes, Rime flushes the entire accumulated buffer as a single utterance.

What you are responsible for
- Sending complete, speakable phrases. Because Rime may synthesize on the very first token it receives, each message — or the concatenation of messages received while synthesis is busy — should form something that sounds natural when spoken on its own.
- Ensuring concatenated tokens are properly spaced and formatted. When multiple tokens arrive during an active synthesis, they’ll be joined and synthesized together. The same concatenation rules apply as in `segment=never`.
What Rime is responsible for
- Synthesizing immediately upon receiving text when the pipeline is idle.
- Accumulating tokens while synthesis is active, then synthesizing the full buffer once idle again.
When to use this
`segment=immediate` is useful when your client is sending pre-segmented phrases that are already complete utterances — for example, if you’re controlling segmentation on your side and just want Rime to synthesize each phrase as fast as possible without any punctuation-based logic.
Example
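A sketch of an `immediate` stream: each message is a complete, speakable phrase segmented by your own logic. The `{"text": ...}` shape is assumed; confirm against the API reference.

```python
# Under segment=immediate, each phrase is synthesized as soon as it arrives
# (or batched with phrases that arrived while synthesis was still busy).
import json

phrases = [
    "Welcome back.",
    "Your order shipped this morning.",
    "Anything else I can help with?",
]
msgs = [json.dumps({"text": p}) for p in phrases]
```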
Summary: which setting should I use?
Use `segment=never` if:
- You’re building a voice agent or conversational AI application.
- You’re streaming LLM output token by token and want full control over when synthesis fires.
- You want deterministic, predictable behavior without relying on punctuation heuristics.
Use `segment=bySentence` if:
- Your text is well-formed prose with clean sentence boundaries.
- You want Rime to handle segmentation automatically without sending `flush`.
- You’re prototyping and simplicity matters more than precision.
Use `segment=immediate` if:
- You’re sending pre-segmented, complete phrases from your own logic.
- You want the fastest possible synthesis start for each phrase.
- You’re not streaming token by token — you’re sending complete utterances at once.

