Pre-normalizing text

Most applications shouldn’t pre-normalize. Rime handles common patterns — currency with symbols, dates with years, clock times, phone numbers, standard measurements, and most numeric formats — natively. Pre-normalizing adds latency and engineering complexity for behavior you usually get for free. Before adding a pre-processing layer, review the formats Rime handles natively and verify any specific input with the /textnorm endpoint.

If you’ve reviewed the native handling and still need to pre-normalize specific patterns, this page documents the known gaps and provides a prompt template you can drop into the layer that generates your text.

When pre-normalizing makes sense

There are two cases where pre-normalization in your application is worth the cost:

The input falls into a known gap. A small set of patterns aren’t reliably expanded today. Pre-expanding those specific patterns is the reliable fix.
You need guaranteed consistency across regenerations. For utterances that must read identically every time — regulatory disclosures, legal read-backs, confirmation flows — pre-normalizing removes a source of variability.

For everything else, the right default is: trust Rime’s normalizer, and use /textnorm to verify anything that sounds off. If you find a pattern Rime doesn’t handle, flag it to Rime — fixes ship on Rime’s side instead of in every customer’s pre-processing layer.

Known gaps

These patterns aren’t reliably expanded today. Either avoid them at the source, or pre-expand them in the layer that generates your text.

Dates

Month-and-year only (07/2025). Reads as a literal slash-separated number. Expand to July 2025 in prose form.
MM/DD without a year (04/21). Expand to April 21st or April twenty-first.
Centuries and financial periods (21st century, 1H 2024, Q1 2025, 1Q, 2Q). Not consistently expanded.
Cross-month date ranges (May-June 2024). Not a recognized pattern.
Decade names. 1990s currently reads as “nineteen hundreds,” not “nineteen nineties.” Use the nineteen nineties or 1990's cautiously.

Times

Bare hour with meridiem (3pm or 3 pm). Less reliable than 3:00pm. Prefer the full form with minutes when precision matters.
European 15h30 times. The h separator isn’t recognized. Use 15:30.
Suffixed approximations. 9:00-ish reads the colon literally. Avoid or pre-normalize.

Numbers

Very long comma-separated numbers (10,000,000) sometimes fall back to digit-by-digit. Drop the commas (10000000) or use shorthand (10M).
Superscript exponents (10⁶) aren’t recognized. Use 1e6 or 10^6.
Very large ordinals (1,000,000th) fall back to digit-by-digit. Prefer writing out (one millionth).

Money

Scale-word ordering with non-dollar currencies (€900K can read as “nine hundred euros thousand”). Spell out: 900 thousand euros.
Country-plus-symbol prefixes (US$3B, HK$1,152,415, AUD$900K) aren’t always fully verbalized.
Negative amounts (-$50, −€100) aren’t well-tested. Prefer minus fifty dollars.

Measurements

m ambiguity (meter vs. million). 7m/s may read as “seven million per second.” Prefer 7 m/s with a space, or spell out the unit.
Technical / electrical units (kWh, A, V, J) sometimes pass through unexpanded. Spell out (6 kilowatt-hours) if verbalization matters.
Uncommon units (qt, btu, psi, dyne, ‰) are not reliably expanded.
Parenthesized compound units (5(kg/m²)) aren’t a recognized form.

Ranges

Inconsistent connector. Hyphen ranges are usually read as “to,” but in some contexts the hyphen is read literally. If consistency matters, use the word “to” directly.

Roman numerals

Isolated Roman numerals without a context word pass through unchanged. Use spell() to force letter-by-letter, or write out the number.
Mixed-format numbering (section 2.IV.3) isn’t handled.

Phone numbers

Vanity numbers (1-800-FLOWERS) keep the letters as a literal string. Use spell(FLOWERS) if you need letter-by-letter.
Extensions (555-1234 ext. 567, 555-1234 x 567) aren’t explicitly handled.

Miscellaneous

Context-dependent abbreviations (Dr., Mr., St.) rely on context and may not always resolve as expected. See Abbreviations.
Repeating decimals (0.3̄, 0.(3)) aren’t recognized.
Internationalized domain names (non-ASCII in URLs) aren’t supported.

Pre-normalization prompt template

If your application uses an LLM to generate text before sending it to Rime, you can add the rules below to its system prompt or run them as a post-processing pass. The template targets only the known gaps — Rime’s normalizer handles the rest, so there’s no need to pre-expand everything.

System prompt template

TEXT-TO-SPEECH NORMALIZATION RULES

Before returning your response, rewrite any of the following patterns. The text
will be sent to a text-to-speech engine, and these specific patterns are known
not to verbalize correctly on the engine side. Leave all other text alone.

1. DATES WITHOUT A YEAR. Expand MM/DD dates to month name + ordinal day.
   04/21 -> "April 21st"
   08/30 -> "August 30th"
   Full dates with a year (01/12/2026) do not need rewriting.

2. MONTH-AND-YEAR ALONE. Expand MM/YYYY to month name + year.
   07/2025 -> "July 2025"

3. BARE HOURS WITH MERIDIEM. Insert ":00" for precision.
   3pm  -> "3:00pm"
   3 pm -> "3:00pm"
   Clock times with minutes (7:05 PM) do not need rewriting.

4. DECADE NAMES. Rewrite "1990s" as "the nineteen nineties" (or the
   relevant decade) to avoid it reading as "nineteen hundreds."

5. FINANCIAL PERIODS AND CENTURIES. Expand to prose.
   Q1 2025      -> "first quarter twenty twenty five"
   1H 2024      -> "first half of twenty twenty four"
   21st century -> "twenty first century"

6. NON-DOLLAR CURRENCY SHORTHAND. Spell out the scale word.
   €900K -> "900 thousand euros"
   £2M   -> "2 million pounds"
   Dollar shorthand ($5M, $1.2B) reads correctly and does not need rewriting.

7. ALPHANUMERIC IDs, codes, confirmation numbers, SKUs, tracking numbers,
   vanity phone numbers. Wrap the identifier in spell(). Do not otherwise
   alter it.
   "Your confirmation is ABC123XYZ" -> "Your confirmation is spell(ABC123XYZ)."
   "Call 1-800-FLOWERS" -> "Call 1-800-spell(FLOWERS)."

8. VERY LONG COMMA-SEPARATED NUMBERS. For numbers like 10,000,000, either
   drop the commas (10000000) or use scale shorthand (10M).

9. PROSODY. Use commas for short pauses inside sentences and periods for
   sentence boundaries. Keep sentences under about 25 words when possible.
   Do not use dashes inside numbers, IDs, or phone numbers; they cause
   unwanted pauses.

10. NEVER invent, drop, or reorder information while rewriting. Preserve every
    digit, letter, and symbol from the source; only change the surface form
    for patterns listed above. Currency with symbols ($124.50), standard
    measurements (5kg, 98°F), phone numbers with punctuation ((213) 555-9274),
    and percentages (95%) should pass through unchanged.

Apply these rules silently. Do not mention them in your output.

Start here

Getting started

Voice agents

Voices & models

Customizing speech

Streaming & performance

Integrations

On-prem

Account & updates

Pre-normalizing text

When pre-normalizing makes sense

Known gaps

Dates

Times

Numbers

Money

Measurements

Ranges

Roman numerals

Phone numbers

Miscellaneous

Pre-normalization prompt template

​When pre-normalizing makes sense

​Known gaps

​Dates

​Times

​Numbers

​Money

​Measurements

​Ranges

​Roman numerals

​Phone numbers

​Miscellaneous

​Pre-normalization prompt template

When pre-normalizing makes sense

Known gaps

Dates

Times

Numbers

Money

Measurements

Ranges

Roman numerals

Phone numbers

Miscellaneous

Pre-normalization prompt template