When pre-normalizing makes sense
There are two cases where pre-normalization in your application is worth the cost:
- The input falls into a known gap. A small set of patterns aren’t reliably expanded today. Pre-expanding those specific patterns is the reliable fix.
- You need guaranteed consistency across regenerations. For utterances that must read identically every time — regulatory disclosures, legal read-backs, confirmation flows — pre-normalizing removes a source of variability.
Use `/textnorm` to verify anything that sounds off. If you find a pattern Rime doesn’t handle, flag it to Rime, so that the fix ships on Rime’s side instead of in every customer’s pre-processing layer.
Known gaps
These patterns aren’t reliably expanded today. Either avoid them at the source, or pre-expand them in the layer that generates your text.
Dates
- Month-and-year only (`07/2025`). Reads as a literal slash-separated number. Expand to `July 2025` in prose form.
- MM/DD without a year (`04/21`). Expand to `April 21st` or `April twenty-first`.
- Centuries and financial periods (`21st century`, `1H 2024`, `Q1 2025`, `1Q`, `2Q`). Not consistently expanded.
- Cross-month date ranges (`May-June 2024`). Not a recognized pattern.
- Decade names. `1990s` currently reads as “nineteen hundreds,” not “nineteen nineties.” Use `the nineteen nineties` or `1990's` cautiously.
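A pre-expansion pass for the first two date gaps could look roughly like the sketch below. It is an illustration, not part of Rime: the function name `expand_partial_dates` and both regular expressions are assumptions. The MM/DD rule would also rewrite fractions such as `1/2`, so apply it only to text where a bare slash pair is known to be a date.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# MM/YYYY and MM/DD that are not part of a longer slash-separated date.
MONTH_YEAR = re.compile(r"(?<![\d/])(0?[1-9]|1[0-2])/(\d{4})(?![\d/])")
MONTH_DAY = re.compile(r"(?<![\d/])(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])(?![\d/])")


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 21 -> '21st'."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}{ {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th') }"


def expand_partial_dates(text: str) -> str:
    """Expand MM/YYYY to 'Month YYYY' and MM/DD to 'Month DDth'."""
    text = MONTH_YEAR.sub(
        lambda m: f"{MONTHS[int(m.group(1)) - 1]} {m.group(2)}", text
    )
    text = MONTH_DAY.sub(
        lambda m: f"{MONTHS[int(m.group(1)) - 1]} {ordinal(int(m.group(2)))}", text
    )
    return text


print(expand_partial_dates("Due 04/21, renews 07/2025."))
```

Full dates such as `04/21/2025` are deliberately left untouched, since the lookarounds reject any match that sits inside a longer slash-separated sequence.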
Times
- Bare hour with meridiem (`3pm` or `3 pm`). Less reliable than `3:00pm`. Prefer the full form with minutes when precision matters.
- European `15h30` times. The `h` separator isn’t recognized. Use `15:30`.
- Suffixed approximations. `9:00-ish` reads the colon literally. Avoid or pre-normalize.
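The first two time fixes are mechanical enough to script. The sketch below is one possible approach; `normalize_times` and its patterns are hypothetical names, not a Rime API. It rewrites bare hours to the `3:00pm` form and the `h` separator to a colon, and leaves `-ish` forms alone because they need a wording decision rather than a pattern.

```python
import re

# A bare hour (1-12) followed by am/pm, with at most one space between them.
BARE_HOUR = re.compile(r"\b(1[0-2]|0?[1-9])\s?([ap]m)\b", re.IGNORECASE)
# European style: hour, a literal "h", then two-digit minutes.
H_SEPARATOR = re.compile(r"\b([01]?\d|2[0-3])h([0-5]\d)\b")


def normalize_times(text: str) -> str:
    """Rewrite '3pm' / '3 pm' as '3:00pm' and '15h30' as '15:30'."""
    text = BARE_HOUR.sub(r"\1:00\2", text)
    text = H_SEPARATOR.sub(r"\1:\2", text)
    return text


print(normalize_times("Call at 3pm or 15h30."))
```

Times that already carry minutes, such as `3:30pm`, do not match the bare-hour pattern and pass through unchanged.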
Numbers
- Very long comma-separated numbers (`10,000,000`) sometimes fall back to digit-by-digit. Drop the commas (`10000000`) or use shorthand (`10M`).
- Superscript exponents (`10⁶`) aren’t recognized. Use `1e6` or `10^6`.
- Very large ordinals (`1,000,000th`) fall back to digit-by-digit. Prefer writing out (`one millionth`).
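As a rough illustration of the comma and superscript fixes, the sketch below strips grouping commas from numbers with three or more groups and converts superscript digits to caret notation. The name `normalize_numbers` and the threshold of two comma groups are assumptions chosen for the example; large ordinals are left alone because they are better written out by hand.

```python
import re

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
TO_ASCII_DIGITS = str.maketrans(SUPERSCRIPT_DIGITS, "0123456789")

# Numbers with at least two comma groups, e.g. 10,000,000 (but not 1,000).
GROUPED = re.compile(r"\b\d{1,3}(?:,\d{3}){2,}\b")
# A digit followed directly by one or more superscript digits, e.g. 10⁶.
SUPER_EXP = re.compile(rf"(\d)([{SUPERSCRIPT_DIGITS}]+)")


def normalize_numbers(text: str) -> str:
    """Drop commas from long numbers and rewrite 10⁶ as 10^6."""
    text = GROUPED.sub(lambda m: m.group(0).replace(",", ""), text)
    text = SUPER_EXP.sub(
        lambda m: f"{m.group(1)}^{m.group(2).translate(TO_ASCII_DIGITS)}", text
    )
    return text


print(normalize_numbers("10,000,000 users and 10⁶ events"))
```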
Money
- Scale-word ordering with non-dollar currencies (`€900K` can read as “nine hundred euros thousand”). Spell out: `900 thousand euros`.
- Country-plus-symbol prefixes (`US$3B`, `HK$1,152,415`, `AUD$900K`) aren’t always fully verbalized.
- Negative amounts (`-$50`, `−€100`) aren’t well-tested. Prefer `minus fifty dollars`.
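One way to pre-expand scaled and negative amounts is sketched below. Everything in it is illustrative: `expand_money`, the lookup tables, and the set of supported symbols are assumptions, and amounts stay as digits (`minus 50 dollars`) on the expectation that plain integers are then verbalized normally. Country-prefixed forms such as `US$3B` are skipped, because how to word them is an editorial choice.

```python
import re

# Hypothetical lookup tables; extend them for the currencies and scales you use.
CURRENCY_WORDS = {"$": ("dollar", "dollars"), "€": ("euro", "euros"), "£": ("pound", "pounds")}
SCALE_WORDS = {"K": "thousand", "M": "million", "B": "billion"}

MONEY = re.compile(
    r"(?<![\w$€£])"                 # skip prefixed forms such as US$3B
    r"([-−])?"                      # optional ASCII hyphen or Unicode minus sign
    r"([$€£])"                      # currency symbol
    r"(\d+(?:,\d{3})*(?:\.\d+)?)"   # amount, with optional commas and decimals
    r"([KMB])?\b"                   # optional scale suffix
)


def expand_money(text: str) -> str:
    """Rewrite '€900K' as '900 thousand euros' and '-$50' as 'minus 50 dollars'."""
    def repl(m: re.Match) -> str:
        sign, symbol, amount, scale = m.groups()
        singular, plural = CURRENCY_WORDS[symbol]
        words = ["minus"] if sign else []
        words.append(amount)
        if scale:
            words.append(SCALE_WORDS[scale])
        words.append(singular if amount == "1" and not scale else plural)
        return " ".join(words)

    return MONEY.sub(repl, text)


print(expand_money("Revenue was €900K; the balance is -$50."))
```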
Measurements
- `m` ambiguity (meter vs. million). `7m/s` may read as “seven million per second.” Prefer `7 m/s` with a space, or spell out the unit.
- Technical / electrical units (`kWh`, `A`, `V`, `J`) sometimes pass through unexpanded. Spell out (`6 kilowatt-hours`) if verbalization matters.
- Uncommon units (`qt`, `btu`, `psi`, `dyne`, `‰`) are not reliably expanded.
- Parenthesized compound units (`5(kg/m²)`) aren’t a recognized form.
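Spelling out units can be handled with a small lookup table, as in the sketch below. The table contents and the name `spell_out_units` are assumptions for illustration; only units you explicitly list are expanded, and a bare `m` is intentionally absent because the script cannot tell meters from millions. Plural forms are always used, so `1 kWh` would need separate handling.

```python
import re

# Hypothetical unit table; list only the units your text actually contains.
UNIT_WORDS = {
    "kWh": "kilowatt-hours",
    "psi": "pounds per square inch",
    "m/s": "meters per second",
}

# A digit, an optional space, then a listed unit that is not part of a longer token.
UNIT = re.compile(
    r"(\d)\s?("
    + "|".join(re.escape(u) for u in sorted(UNIT_WORDS, key=len, reverse=True))
    + r")(?![\w/])"
)


def spell_out_units(text: str) -> str:
    """Replace listed unit abbreviations after a number with their spoken form."""
    return UNIT.sub(lambda m: f"{m.group(1)} {UNIT_WORDS[m.group(2)]}", text)


print(spell_out_units("7m/s at 6 kWh"))
```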
Ranges
- Inconsistent connector. Hyphen ranges are usually read as “to,” but in some contexts the hyphen is read literally. If consistency matters, use the word “to” directly.
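If you want the connector to be explicit, a numeric range can be rewritten before the text is sent. The sketch below is an assumption-laden example (`hyphen_range_to_words` is a hypothetical name): it skips longer hyphenated chains such as `1-800-...`, but it cannot distinguish a range from a two-part number like `555-1234`, so run it only on text where digit-hyphen-digit is known to be a range.

```python
import re

# Two numbers joined by a hyphen or en dash, not part of a longer hyphenated chain.
RANGE = re.compile(r"(?<![\d-])(\d+)\s?[-–]\s?(\d+)(?![\d-])")


def hyphen_range_to_words(text: str) -> str:
    """Rewrite '10-15' as '10 to 15'."""
    return RANGE.sub(r"\1 to \2", text)


print(hyphen_range_to_words("See pages 10-15."))
```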
Roman numerals
- Isolated Roman numerals without a context word pass through unchanged. Use `spell()` to force letter-by-letter, or write out the number.
- Mixed-format numbering (`section 2.IV.3`) isn’t handled.
Phone numbers
- Vanity numbers (`1-800-FLOWERS`) keep the letters as a literal string. Use `spell(FLOWERS)` if you need letter-by-letter.
- Extensions (`555-1234 ext. 567`, `555-1234 x 567`) aren’t explicitly handled.
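Both phone-number gaps can be patched with small rewrites, sketched below under stated assumptions: the vanity pattern covers only `1-8xx-` prefixes followed by four or more capital letters, and replacing `ext.` or `x` with the word `extension` is a guess at the wording you want, not documented Rime behavior. The function name `normalize_phone` is hypothetical; the `spell(...)` wrapper is the form mentioned above.

```python
import re

# Toll-free style prefix followed by a run of capital letters.
VANITY = re.compile(r"\b(1-8\d\d-)([A-Z]{4,})\b")
# A 7-digit local number followed by "ext", "ext." or "x" and the extension digits.
EXTENSION = re.compile(r"(\d{3}-\d{4})\s(?:ext\.?|x)\s?(\d+)\b", re.IGNORECASE)


def normalize_phone(text: str) -> str:
    """Wrap vanity letters in spell() and write the extension marker as a word."""
    text = VANITY.sub(r"\1spell(\2)", text)
    text = EXTENSION.sub(r"\1 extension \2", text)
    return text


print(normalize_phone("Call 1-800-FLOWERS or 555-1234 ext. 567."))
```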
Miscellaneous
- Context-dependent abbreviations (`Dr.`, `Mr.`, `St.`) rely on context and may not always resolve as expected. See Abbreviations.
- Repeating decimals (`0.3̄`, `0.(3)`) aren’t recognized.
- Internationalized domain names (non-ASCII in URLs) aren’t supported.
Pre-normalization prompt template
If your application uses an LLM to generate text before sending it to Rime, you can add the rules below to its system prompt or run them as a post-processing pass. The template targets only the known gaps; Rime’s normalizer handles the rest, so there’s no need to pre-expand everything.
System prompt template

