API container
A /health route is available on port 8000 to give you a quick snapshot of the overall status. You can check it with the following command:
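```bash
# Quick snapshot of overall status (assumes the container's port 8000 is published on localhost)
curl http://localhost:8000/health
```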
Arcana container
Healthcheck
To check if the model is running properly, you can perform a liveness probe using the /readyz endpoint:
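```bash
# Probe the Arcana container's /readyz endpoint
# ARCANA_HTTP_PORT is a placeholder for whatever host port you published, not a documented variable
curl "http://localhost:${ARCANA_HTTP_PORT:?}/readyz"
```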
Metrics
The model is capable of emitting OpenTelemetry metrics. The internal vLLM process also exposes a Prometheus /metrics endpoint.
Both can serve as sources of data for metrics collection, tuning, and debugging.
OpenTelemetry metrics
The Rime engine exposes the following OpenTelemetry metrics:
- rime.engine.concurrent_pipeline
- rime.engine.generated_audio_duration
- rime.engine.gpu_load
- rime.engine.initial_latency
- rime.engine.invocation_request
- … set OTEL_COLLECTOR_PROTOCOL to grpc (recommended), http/protobuf, or http/json.
- … provide a valid OpenTelemetry Collector endpoint (through OTEL_COLLECTOR_ENDPOINT).
Resource attributes can also be attached to the emitted metrics; the standard OTEL_RESOURCE_ATTRIBUTES environment variable may be used to define them.
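As a rough sketch, these settings might be passed to the container's environment as follows; the endpoint value and resource attributes below are illustrative, not required values:

```bash
# Illustrative OTLP export settings; 4317 is the standard OTLP/gRPC port
export OTEL_COLLECTOR_PROTOCOL=grpc
export OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4317
export OTEL_RESOURCE_ATTRIBUTES="service.name=rime-arcana,deployment.environment=production"
```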
The model does not implement OpenTelemetry authentication. If authentication is desired, running an OpenTelemetry Collector sidecar to forward metrics is recommended.
Prometheus metrics
To retrieve vLLM Prometheus metrics, issue an HTTP request to http://localhost:${GENERATOR_VLLM_PORT:?}/metrics.
By default, a random port is assigned to the vLLM server. To choose a fixed port (which would be convenient for forwarding these metrics, if desired), set the environment variable GENERATOR_VLLM_PORT to an available port number.
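A minimal sketch, assuming GENERATOR_VLLM_PORT is passed to the container and that port is published to the host:

```bash
export GENERATOR_VLLM_PORT=9090   # 9090 is an arbitrary example port
curl "http://localhost:${GENERATOR_VLLM_PORT:?}/metrics"
```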
OpenTelemetry Collector sidecar
If authentication or other OpenTelemetry Collector extensions are desired, be sure to use the contrib distribution, which includes them.
An OpenTelemetry Collector may optionally be used to forward both the OpenTelemetry and Prometheus metrics. It can also be used to conveniently retrieve the OpenTelemetry metrics locally.
For example, Docker Compose can be used to run the model and an OpenTelemetry Collector …
In this example, otelcol/config.yaml is the configuration file for the OpenTelemetry Collector.
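A minimal sketch of that file, assuming the Collector should receive OTLP metrics from the model and simply log them; the pipeline layout and debug exporter are illustrative choices, not the documented configuration:

```bash
# Write a minimal Collector config: receive OTLP over gRPC, batch, and log to stdout
mkdir -p otelcol
cat > otelcol/config.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  debug: {}
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
EOF
```

A real deployment would typically swap the debug exporter for a remote exporter and, if authentication is needed, add the relevant extension from the contrib distribution.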
Histogram tuning
Metrics that emit histogram data use pre-configured bucket boundaries. The Rime engine prescribes its own defaults for some metrics; these can also be tuned through HISTOGRAM_BUCKETS_* environment variables.
To separate differently-tuned sets of data across individual metrics, a suffix may be attached to the original metric name using HISTOGRAM_SUFFIX_* environment variables.
- rime.engine.initial_latency - Measured in milliseconds.
  - Default bounds:
  - Set alternative bounds:
  - Set alternative metric name suffix: HISTOGRAM_SUFFIX_INITIAL_LATENCY='tuned_120ms' produces a metric named rime.engine.initial_latency.tuned_120ms
- rime.engine.generated_audio_duration - Measured in seconds.
  - Default bounds:
  - Set alternative bounds:
  - Set alternative metric name suffix: HISTOGRAM_SUFFIX_GENERATED_AUDIO_DURATION='tuned_3s' produces a metric named rime.engine.generated_audio_duration.tuned_3s
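As a sketch of how these variables combine, assuming a HISTOGRAM_BUCKETS_INITIAL_LATENCY variable following the HISTOGRAM_BUCKETS_* pattern above (the variable name, value format, and bucket values are assumptions, not documented defaults):

```bash
# Hypothetical bucket override (comma-separated milliseconds) plus the suffix example from above
export HISTOGRAM_BUCKETS_INITIAL_LATENCY='30,60,120,240,480,960'
export HISTOGRAM_SUFFIX_INITIAL_LATENCY='tuned_120ms'   # emits rime.engine.initial_latency.tuned_120ms
```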
Mist container
Healthcheck
To check if the model is running properly, you can perform a liveness probe using the /ping endpoint:
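```bash
# Probe the Mist container's /ping endpoint
# MIST_HTTP_PORT is a placeholder for whatever host port you published, not a documented variable
curl "http://localhost:${MIST_HTTP_PORT:?}/ping"
```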
Metrics
For more detailed operational insights, the model service exposes Prometheus-compatible metrics at the /metrics endpoint:
- HTTP request counters: Detailed breakdown of requests by endpoint, status code, and HTTP method
- Error tracking: Counts of HTTP errors by type and status code
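As a rough sketch (the host port placeholder below is an assumption), the endpoint can be scraped directly and filtered for the request counters:

```bash
curl -s "http://localhost:${MIST_HTTP_PORT:?}/metrics" | grep -i http
```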

