API Container
A /health route is available on port 8000 to give you a quick snapshot of overall status. You can check it with the following command:
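A minimal check, assuming the API container is reachable on localhost (adjust the host if deployed elsewhere):

```shell
# Query the /health endpoint on port 8000 (host is an assumption; adjust as needed)
curl -s http://localhost:8000/health
```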
Arcana Container
Healthcheck
To check if the model is running properly, you can perform a readiness probe using the /readyz endpoint:
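For example, assuming the Arcana container is reachable on localhost port 8000 (the host and port are assumptions; adjust for your deployment):

```shell
# Probe the /readyz endpoint; a successful response indicates the model is ready
curl -s http://localhost:8000/readyz
```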
Metrics
The model is capable of emitting OpenTelemetry metrics. The internal vLLM process also exposes a Prometheus /metrics endpoint.
Both can serve as sources of data for metrics collection, tuning, and debugging.
OpenTelemetry metrics
The Rime engine exposes the following OpenTelemetry metrics:

- rime.engine.concurrent_pipeline
- rime.engine.generated_audio_duration
- rime.engine.gpu_load
- rime.engine.initial_latency
- rime.engine.invocation_request
- … set OTEL_COLLECTOR_PROTOCOL to grpc (recommended), http/protobuf, or http/json.
- … provide a valid OpenTelemetry Collector endpoint (through OTEL_COLLECTOR_ENDPOINT).
OTEL_RESOURCE_ATTRIBUTES may be used to define resource attributes (for example, a service name).
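A sketch of the environment configuration, assuming a Collector listening locally on the default OTLP gRPC port; the endpoint and attribute values are placeholders:

```shell
# Placeholder values: point these at your actual Collector and service metadata
export OTEL_COLLECTOR_PROTOCOL=grpc
export OTEL_COLLECTOR_ENDPOINT=http://localhost:4317
export OTEL_RESOURCE_ATTRIBUTES="service.name=rime-engine,deployment.environment=staging"
```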
The model does not implement OpenTelemetry authentication. If authentication is desired, running an OpenTelemetry Collector sidecar to forward metrics is recommended.
Prometheus metrics
To retrieve vLLM Prometheus metrics, issue an HTTP request to http://localhost:${GENERATOR_VLLM_PORT:?}/metrics.
By default, a random port is assigned to the vLLM server. To choose a fixed port (convenient if you want to forward these metrics), set the environment variable GENERATOR_VLLM_PORT to an available port number.
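For example, assuming the container was started with GENERATOR_VLLM_PORT set (8001 below is a placeholder port):

```shell
# GENERATOR_VLLM_PORT must match the value the container was started with
export GENERATOR_VLLM_PORT=8001
curl -s "http://localhost:${GENERATOR_VLLM_PORT:?}/metrics"
```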
OpenTelemetry Collector sidecar
An OpenTelemetry Collector may optionally be used to forward both the OpenTelemetry and Prometheus metrics, or to conveniently retrieve the OpenTelemetry metrics locally.

If authentication or other OpenTelemetry Collector extensions are desired, be sure to use the contrib distribution, which includes them.
For example, Docker Compose can be used to run the model and an OpenTelemetry Collector …
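A sketch of such a Compose file, assuming hypothetical service names and a placeholder model image; only the Collector image and config path follow the contrib distribution's defaults:

```yaml
# Hypothetical sketch: service names, image tags, and ports are placeholders
services:
  model:
    image: rime/arcana:latest        # placeholder image name
    environment:
      OTEL_COLLECTOR_PROTOCOL: grpc
      OTEL_COLLECTOR_ENDPOINT: http://otelcol:4317
  otelcol:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otelcol/config.yaml:/etc/otelcol-contrib/config.yaml
```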
otelcol/config.yaml would be the configuration file for the OpenTelemetry Collector.
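A minimal sketch of such a configuration, assuming the Collector should receive OTLP metrics over gRPC and simply log them for local inspection:

```yaml
# Minimal sketch: receive OTLP over gRPC, batch, and log metrics to the console
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  debug:
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

In a real deployment, the debug exporter would be replaced with an exporter that forwards metrics to your observability backend.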
Mist Container
Healthcheck
To check if the model is running properly, you can perform a liveness probe using the /ping endpoint:
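For example, assuming the Mist container is reachable on localhost port 8000 (the host and port are assumptions; adjust for your deployment):

```shell
# Probe the /ping endpoint; a successful response indicates the service is alive
curl -s http://localhost:8000/ping
```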
Metrics
For more detailed operational insights, the model service exposes Prometheus-compatible metrics at the /metrics endpoint:

- HTTP Request Counters: Detailed breakdown of requests by endpoint, status code, and HTTP method
- Error Tracking: Counts of HTTP errors by type and status code

