# Pre-normalizing text

> Optional patterns you can pre-expand in your application before sending text to Rime. Use sparingly — Rime's normalizer handles most common formats natively without any preprocessing.

<Warning>
  **Most applications shouldn't pre-normalize.** Rime handles common patterns — currency with symbols, dates with years, clock times, phone numbers, standard measurements, and most numeric formats — natively. Pre-normalizing adds latency and engineering complexity for behavior you usually get for free. Before adding a pre-processing layer, review the [formats Rime handles natively](/docs/text-normalization#handled-at-a-glance) and verify any specific input with the [`/textnorm` endpoint](/docs/text-normalization#debugging-with-the-textnorm-endpoint).
</Warning>

If you've reviewed the native handling and still need to pre-normalize specific patterns, this page documents the known gaps and provides a prompt template you can drop into the layer that generates your text.

***

## When pre-normalizing makes sense

There are two cases where pre-normalization in your application is worth the cost:

1. **The input falls into a [known gap](#known-gaps).** A small set of patterns aren't reliably expanded today. Pre-expanding those specific patterns is the reliable fix.
2. **You need guaranteed consistency across regenerations.** For utterances that must read identically every time — regulatory disclosures, legal read-backs, confirmation flows — pre-normalizing removes a source of variability.

For everything else, the right default is: trust Rime's normalizer, and use [`/textnorm`](/docs/text-normalization#debugging-with-the-textnorm-endpoint) to verify anything that sounds off. If you find a pattern Rime doesn't handle, [flag it to Rime](mailto:support@rime.ai) — fixes ship on Rime's side instead of in every customer's pre-processing layer.

***

## Known gaps

These patterns aren't reliably expanded today. Either avoid them at the source, or pre-expand them in the layer that generates your text.

### Dates

* **Month-and-year only** (`07/2025`). Reads as a literal slash-separated number. Expand to `July 2025` in prose form.
* **MM/DD without a year** (`04/21`). Expand to `April 21st` or `April twenty-first`.
* **Centuries and financial periods** (`21st century`, `1H 2024`, `Q1 2025`, `1Q`, `2Q`). Not consistently expanded.
* **Cross-month date ranges** (`May-June 2024`). Not a recognized pattern.
* **Decade names.** `1990s` currently reads as "nineteen hundreds," not "nineteen nineties." Prefer `the nineteen nineties`; if you use `1990's` instead, verify how it reads first.
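
As a rough illustration of the first two items, the sketch below expands zero-padded `MM/YYYY` and `MM/DD` tokens with regular expressions. The names `expand_partial_dates` and `ordinal` are hypothetical helpers, not part of any Rime SDK. The patterns assume month-first dates, so a token such as `10/12` that is really a fraction or a day-first date would be rewritten incorrectly.

```python
import re

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def ordinal(day: int) -> str:
    """Return the day with its English ordinal suffix, e.g. 21 -> '21st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"

# Zero-padded MM/YYYY that is not part of a longer slash-separated date.
MONTH_YEAR = re.compile(r"(?<![0-9/])(0[1-9]|1[0-2])/([0-9]{4})(?![0-9/])")
# Zero-padded MM/DD that is not followed by a year (or any further digits).
MONTH_DAY = re.compile(
    r"(?<![0-9/])(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])(?![0-9/])"
)

def expand_partial_dates(text: str) -> str:
    """Expand MM/YYYY and MM/DD tokens; full dates with a year are left alone."""
    text = MONTH_YEAR.sub(
        lambda m: f"{MONTHS[int(m.group(1)) - 1]} {m.group(2)}", text
    )
    text = MONTH_DAY.sub(
        lambda m: f"{MONTHS[int(m.group(1)) - 1]} {ordinal(int(m.group(2)))}", text
    )
    return text
```

Full dates such as `01/12/2026` pass through unchanged, since they are not listed as a gap.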

### Times

* **Bare hour with meridiem** (`3pm` or `3 pm`). Less reliable than `3:00pm`. Prefer the full form with minutes when precision matters.
* **European `15h30` times.** The `h` separator isn't recognized. Use `15:30`.
* **Suffixed approximations.** `9:00-ish` reads the colon literally. Avoid or pre-normalize.
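
The first two items lend themselves to a small regex pass. The sketch below is one possible approach, with `normalize_times` as a hypothetical helper name. It assumes the meridiem is written `am` or `pm` without periods and that `15h30`-style tokens are always clock times rather than durations.

```python
import re

# Hour 1-12 followed by am/pm, with at most one whitespace character between.
# The lookbehind skips times that already carry minutes (7:05 PM, 7.05pm).
BARE_HOUR = re.compile(r"(?<![0-9:.])(1[0-2]|[1-9])\s?(am|pm)\b", re.IGNORECASE)
# European 24-hour form such as 15h30 or 9h05.
H_SEPARATED = re.compile(r"\b([01]?[0-9]|2[0-3])h([0-5][0-9])\b")

def normalize_times(text: str) -> str:
    """Rewrite 15h30 as 15:30 and bare hours like 3pm as 3:00pm."""
    text = H_SEPARATED.sub(r"\1:\2", text)
    # Keep the meridiem exactly as written; only insert ":00" and drop the space.
    return BARE_HOUR.sub(lambda m: f"{m.group(1)}:00{m.group(2)}", text)
```

Suffixed approximations such as `9:00-ish` are deliberately left out, because the right spoken form depends on the sentence.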

### Numbers

* **Very long comma-separated numbers** (`10,000,000`) sometimes fall back to digit-by-digit. Drop the commas (`10000000`) or use shorthand (`10M`).
* **Superscript exponents** (`10⁶`) aren't recognized. Use `1e6` or `10^6`.
* **Very large ordinals** (`1,000,000th`) fall back to digit-by-digit. Prefer writing out (`one millionth`).
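
A minimal sketch of the first two workarounds follows. `normalize_numbers` is a hypothetical helper name. The code treats "very long" as two or more comma groups (seven digits and up), which is an assumption rather than a documented threshold, and it skips ordinals like `1,000,000th` because those are better written out.

```python
import re

# Map superscript digits to their ASCII equivalents.
SUPERSCRIPTS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789")

# A number with at least two comma groups (e.g. 10,000,000) that is not
# followed by more digits or by letters such as an ordinal suffix.
LONG_GROUPED = re.compile(
    r"(?<![0-9,])[0-9]{1,3}(?:,[0-9]{3}){2,}(?!,?[0-9])(?![A-Za-z])"
)
# An ASCII base immediately followed by superscript digits, e.g. 10⁶.
SUPERSCRIPT_POWER = re.compile(r"([0-9]+)([⁰¹²³⁴⁵⁶⁷⁸⁹]+)")

def normalize_numbers(text: str) -> str:
    """Drop commas from long grouped numbers and rewrite 10⁶ as 10^6."""
    text = LONG_GROUPED.sub(lambda m: m.group(0).replace(",", ""), text)
    return SUPERSCRIPT_POWER.sub(
        lambda m: f"{m.group(1)}^{m.group(2).translate(SUPERSCRIPTS)}", text
    )
```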

### Money

* **Scale-word ordering with non-dollar currencies** (`€900K` can read as "nine hundred euros thousand"). Spell out: `900 thousand euros`.
* **Country-plus-symbol prefixes** (`US$3B`, `HK$1,152,415`, `AUD$900K`) aren't always fully verbalized.
* **Negative amounts** (`-$50`, `−€100`) aren't well-tested. Prefer `minus fifty dollars`.
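
For the scale-word case, one option is to rewrite the shorthand before it reaches Rime. The sketch below covers only euro and pound symbols with uppercase `K`, `M`, or `B`. The mapping tables and the name `expand_currency_shorthand` are illustrative assumptions, and other currencies would need their own entries.

```python
import re

# Assumed mappings for illustration; extend for other currencies as needed.
CURRENCY_NAMES = {"€": "euros", "£": "pounds"}
SCALE_WORDS = {"K": "thousand", "M": "million", "B": "billion"}

# Symbol, amount (optionally with decimals), then an uppercase scale letter.
SHORTHAND = re.compile(r"([€£])([0-9]+(?:\.[0-9]+)?)([KMB])\b")

def expand_currency_shorthand(text: str) -> str:
    """Rewrite e.g. €900K as '900 thousand euros'; dollar amounts are untouched."""
    return SHORTHAND.sub(
        lambda m: f"{m.group(2)} {SCALE_WORDS[m.group(3)]} {CURRENCY_NAMES[m.group(1)]}",
        text,
    )
```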

### Measurements

* **`m` ambiguity (meter vs. million).** `7m/s` may read as "seven million per second." Prefer `7 m/s` with a space, or spell out the unit.
* **Technical / electrical units** (`kWh`, `A`, `V`, `J`) sometimes pass through unexpanded. Spell out (`6 kilowatt-hours`) if verbalization matters.
* **Uncommon units** (`qt`, `btu`, `psi`, `dyne`, `‰`) are not reliably expanded.
* **Parenthesized compound units** (`5(kg/m²)`) aren't a recognized form.
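
If verbalization of a few specific units matters, a lookup table is usually enough. The sketch below spells out two units and inserts the space in `7m/s`. The `UNIT_WORDS` table and the name `normalize_units` are hypothetical, and the unit list is only an example, not a statement of which units Rime does or does not expand.

```python
import re

# Example table only: unit symbol -> (singular, plural) spoken form.
UNIT_WORDS = {
    "kWh": ("kilowatt-hour", "kilowatt-hours"),
    "psi": ("pound per square inch", "pounds per square inch"),
}

# A number, an optional space, then one of the units in the table.
UNIT = re.compile(
    r"\b([0-9]+(?:\.[0-9]+)?)\s?(" + "|".join(map(re.escape, UNIT_WORDS)) + r")\b"
)
# A number glued directly to m/s, which can be misread as "million".
METERS_PER_SECOND = re.compile(r"\b([0-9]+(?:\.[0-9]+)?)m/s\b")

def normalize_units(text: str) -> str:
    """Spell out the units in UNIT_WORDS and separate the value from m/s."""
    def spell_unit(m: re.Match) -> str:
        value, unit = m.group(1), m.group(2)
        singular, plural = UNIT_WORDS[unit]
        return f"{value} {singular if value == '1' else plural}"

    text = UNIT.sub(spell_unit, text)
    return METERS_PER_SECOND.sub(r"\1 m/s", text)
```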

### Ranges

* **Inconsistent connector.** Hyphen ranges are usually read as "to," but in some contexts the hyphen is read literally. If consistency matters, use the word "to" directly.

### Roman numerals

* **Isolated Roman numerals** without a context word pass through unchanged. Use [`spell()`](/docs/spell) to force letter-by-letter, or write out the number.
* **Mixed-format numbering** (`section 2.IV.3`) isn't handled.

### Phone numbers

* **Vanity numbers** (`1-800-FLOWERS`) keep the letters as a literal string. Use `spell(FLOWERS)` if you need letter-by-letter.
* **Extensions** (`555-1234 ext. 567`, `555-1234 x 567`) aren't explicitly handled.
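
The vanity-number case can be handled by wrapping only the lettered part in `spell()`. The sketch below assumes the `1-NNN-LETTERS` layout with uppercase letters; `wrap_vanity_numbers` is a hypothetical helper name. Extensions are not touched, since the preferred spoken form for them is not specified here.

```python
import re

# 1-NNN- prefix followed by three or more uppercase letters.
VANITY = re.compile(r"\b(1-[0-9]{3}-)([A-Z]{3,})\b")

def wrap_vanity_numbers(text: str) -> str:
    """Rewrite 1-800-FLOWERS as 1-800-spell(FLOWERS); digit-only numbers pass through."""
    return VANITY.sub(lambda m: f"{m.group(1)}spell({m.group(2)})", text)
```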

### Miscellaneous

* **Context-dependent abbreviations** (`Dr.`, `Mr.`, `St.`) rely on context and may not always resolve as expected. See [Abbreviations](/docs/abbreviations).
* **Repeating decimals** (`0.3̄`, `0.(3)`) aren't recognized.
* **Internationalized domain names** (non-ASCII in URLs) aren't supported.

***

## Pre-normalization prompt template

If your application uses an LLM to generate text before sending it to Rime, you can add the rules below to its system prompt or run them as a post-processing pass. The template targets only the [known gaps](#known-gaps) — Rime's normalizer handles the rest, so there's no need to pre-expand everything.

<Accordion title="System prompt template">
  ```text
  TEXT-TO-SPEECH NORMALIZATION RULES

  Before returning your response, rewrite any of the following patterns. The text
  will be sent to a text-to-speech engine, and these specific patterns are known
  not to verbalize correctly on the engine side. Leave all other text alone.

  1. DATES WITHOUT A YEAR. Expand MM/DD dates to month name + ordinal day.
     04/21 -> "April 21st"
     08/30 -> "August 30th"
     Full dates with a year (01/12/2026) do not need rewriting.

  2. MONTH-AND-YEAR ALONE. Expand MM/YYYY to month name + year.
     07/2025 -> "July 2025"

  3. BARE HOURS WITH MERIDIEM. Insert ":00" for precision.
     3pm  -> "3:00pm"
     3 pm -> "3:00pm"
     Clock times with minutes (7:05 PM) do not need rewriting.

  4. DECADE NAMES. Rewrite "1990s" as "the nineteen nineties" (or the
     relevant decade) to avoid it reading as "nineteen hundreds."

  5. FINANCIAL PERIODS AND CENTURIES. Expand to prose.
     Q1 2025      -> "first quarter of twenty twenty five"
     1H 2024      -> "first half of twenty twenty four"
     21st century -> "twenty first century"

  6. NON-DOLLAR CURRENCY SHORTHAND. Spell out the scale word.
     €900K -> "900 thousand euros"
     £2M   -> "2 million pounds"
     Dollar shorthand ($5M, $1.2B) reads correctly and does not need rewriting.

  7. ALPHANUMERIC IDs, codes, confirmation numbers, SKUs, tracking numbers,
     vanity phone numbers. Wrap the identifier in spell(). Do not otherwise
     alter it.
     "Your confirmation is ABC123XYZ" -> "Your confirmation is spell(ABC123XYZ)."
     "Call 1-800-FLOWERS" -> "Call 1-800-spell(FLOWERS)."

  8. VERY LONG COMMA-SEPARATED NUMBERS. For numbers like 10,000,000, either
     drop the commas (10000000) or use scale shorthand (10M).

  9. PROSODY. Use commas for short pauses inside sentences and periods for
     sentence boundaries. Keep sentences under about 25 words when possible.
     Do not use dashes inside numbers, IDs, or phone numbers; they cause
     unwanted pauses.

  10. NEVER invent, drop, or reorder information while rewriting. Preserve every
      digit, letter, and symbol from the source; only change the surface form
      for patterns listed above. Currency with symbols ($124.50), standard
      measurements (5kg, 98°F), phone numbers with punctuation ((213) 555-9274),
      and percentages (95%) should pass through unchanged.

  Apply these rules silently. Do not mention them in your output.
  ```
</Accordion>
