Keeping track of system metrics is crucial for maintaining a stable Rime self-hosted deployment. These insights help guide decisions about scaling and performance. To support this, Rime services expose multiple endpoints that let you monitor system health.

API Container

A /health route is available on port 8000 to give you a quick snapshot of overall status. You can check it with the following command:
curl -X GET "http://localhost:8000/health"
A typical response looks like this:
{
  "apiStatus": "ok",
  "timestamp": "2025-04-03T13:59:58.902Z",
  "licenseStatus": "valid",
  "modelReachable": true
}
This provides a simple health check to verify that both the API and model services are up and responding.
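For example, a minimal shell check (assuming jq is installed) could parse this response and fail unless both services report healthy, which is convenient for scripting or container healthchecks:
# Exit non-zero unless the API is ok, the license is valid, and the model is reachable (requires jq)
curl -fsS "http://localhost:8000/health" \
  | jq -e '.apiStatus == "ok" and .licenseStatus == "valid" and .modelReachable == true' > /dev/null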

Arcana Container

Healthcheck

To check whether the model is running properly, you can perform a readiness probe using the /readyz endpoint:
curl -X GET "http://localhost:8080/readyz"
A typical response looks like this:
ok
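For example, a startup script could poll this endpoint and wait until the model is ready before sending traffic to it (a minimal sketch, assuming the container is reachable on localhost:8080):
# Block until the model reports ready
until curl -fsS "http://localhost:8080/readyz" | grep -q "ok"; do
  echo "waiting for model..."
  sleep 5
done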

Metrics

The model is capable of emitting OpenTelemetry metrics. The internal vLLM process also exposes a Prometheus /metrics endpoint. Both can serve as sources of data for metrics collection, tuning, and debugging.

OpenTelemetry metrics

The Rime engine exposes the following OpenTelemetry metrics:
  • rime.engine.concurrent_pipeline
  • rime.engine.generated_audio_duration
  • rime.engine.gpu_load
  • rime.engine.initial_latency
  • rime.engine.invocation_request
To have the model emit OpenTelemetry metrics, set two environment variables:
  • Set OTEL_COLLECTOR_PROTOCOL to grpc (recommended), http/protobuf, or http/json.
  • Set OTEL_COLLECTOR_ENDPOINT to a valid OpenTelemetry Collector endpoint.
For example:
OTEL_COLLECTOR_PROTOCOL='grpc'
OTEL_COLLECTOR_ENDPOINT="https://${OTELCOL_DOMAIN:?}:4317"
The model does not set any resource attributes for these metrics; OTEL_RESOURCE_ATTRIBUTES can be used to define them. The model also does not implement OpenTelemetry authentication; if authentication is required, run an OpenTelemetry Collector sidecar to forward the metrics.
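For example, resource attributes could be supplied alongside the collector settings using standard OpenTelemetry semantic convention keys (the values below are illustrative):
OTEL_RESOURCE_ATTRIBUTES="service.name=rime-arcana,deployment.environment=production"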

Prometheus metrics

To retrieve vLLM Prometheus metrics, issue an HTTP request to http://localhost:${GENERATOR_VLLM_PORT:?}/metrics. By default, a random port is assigned to the vLLM server. To use a fixed port (convenient if you want to forward these metrics), set the GENERATOR_VLLM_PORT environment variable to an available port number.
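For example, assuming the model was started with GENERATOR_VLLM_PORT=30000 and the port is reachable from where you run the command:
curl -X GET "http://localhost:30000/metrics"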

OpenTelemetry Collector sidecar

An OpenTelemetry Collector may optionally be used to forward both the OpenTelemetry and Prometheus metrics, and to conveniently retrieve the OpenTelemetry metrics locally. If authentication or other OpenTelemetry Collector extensions are needed, use the contrib distribution, which includes them. For example, Docker Compose can be used to run the model and an OpenTelemetry Collector …
services:
  model:
    image: us-docker.pkg.dev/rime-labs/arcana/v2:20251206
    # ...
    environment:
      # ...
      "GENERATOR_VLLM_PORT": 30000
      "OTEL_COLLECTOR_PROTOCOL": "grpc"
      "OTEL_COLLECTOR_ENDPOINT": "http://otelcol:4317"
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.139.0
    ports:
      # ...
      - "9090:9090" # Prometheus exporter, for local retrieval on a /metrics endpoint
    volumes:
      - "./otelcol/config.yaml:/etc/otelcol-contrib/config.yaml:ro"
      # ...
… where otelcol/config.yaml would be the configuration file for the OpenTelemetry Collector.
receivers:
  # Provide endpoints for OpenTelemetry metrics from the model
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape the vLLM Prometheus /metrics endpoint
  prometheus/generator-vllm:
    config:
      global:
        scrape_interval: "1s"
      scrape_configs:
        - job_name: "generator-vllm"
          static_configs:
            # Assuming GENERATOR_VLLM_PORT=30000
            - targets: ["model:30000"]
exporters:
  otlp:
    # Consider using the authenticator extension if authentication is required
    endpoint: "another.opentelemetry.collector.example:4317"
  prometheus:
    endpoint: "0.0.0.0:9090"
service:
  pipelines:
    # Forward the OpenTelemetry and Prometheus metrics to another OpenTelemetry Collector endpoint
    metrics/otlp:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [otlp]
    # Expose the OpenTelemetry and Prometheus metrics locally with another Prometheus /metrics endpoint
    metrics/local:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [prometheus]
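With this configuration, the forwarded metrics can also be spot-checked locally against the collector's Prometheus exporter (assuming port 9090 is published as in the Compose example above):
curl -X GET "http://localhost:9090/metrics"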

Mist Container

Healthcheck

To check if the model is running properly, you can perform a liveness probe using the /ping endpoint:
curl -X GET "http://localhost:8080/ping"
A typical response looks like this:
pong
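The same endpoint works well as a container healthcheck command; for example, a minimal probe that succeeds only when the expected response is returned:
curl -fsS "http://localhost:8080/ping" | grep -q "pong"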

Metrics

For more detailed operational insights, the model service exposes Prometheus-compatible metrics at the /metrics endpoint:
curl -X GET "http://localhost:8080/metrics"
This endpoint provides telemetry data including:
  • HTTP Request Counters: Detailed breakdown of requests by endpoint, status code, and HTTP method
  • Error Tracking: Counts of HTTP errors by type and status code
Example metrics include:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/invocations",http_status="200",method="POST"} 102018.0
http_requests_total{endpoint="/metrics",http_status="200",method="GET"} 69908.0
http_requests_total{endpoint="/invocations",http_status="500",method="POST"} 38.0

# HELP http_errors_total Total number of HTTP errors
# TYPE http_errors_total counter
http_errors_total{endpoint="/invocations",error_message="cannot access local variable '_var_var_6' where it is not associated with a value",http_status="500",method="POST"} 38.0
These metrics can be integrated with Prometheus monitoring systems to create dashboards and alerts for your Rime deployment.
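For example, a quick way to inspect just the error counters before wiring up a full Prometheus scrape job:
# List only the error counters exposed by the model service
curl -sS "http://localhost:8080/metrics" | grep "^http_errors_total"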