Keeping track of system metrics is crucial for maintaining a stable Rime self-hosted deployment. These insights help guide decisions about scaling and performance. To support this, Rime services expose multiple endpoints that let you monitor system health.

API Container

A /health route is available on port 8000 to give you a quick snapshot of overall status. You can check it with the following command:
curl -X GET "http://localhost:8000/health"
A typical response looks like this:
{
  "apiStatus": "ok",
  "timestamp": "2025-04-03T13:59:58.902Z",
  "licenseStatus": "valid",
  "modelReachable": true
}
This provides a simple health check to verify that both the API and model services are up and responding.
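For example, a minimal shell check (assuming jq is installed) could parse this response and fail unless both services report healthy, which is convenient for scripting or container healthchecks:
# Exit non-zero unless the API is ok, the license is valid, and the model is reachable (requires jq)
curl -fsS "http://localhost:8000/health" \
  | jq -e '.apiStatus == "ok" and .licenseStatus == "valid" and .modelReachable == true' > /dev/null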

Arcana Container

Healthcheck

To check whether the model is running properly, you can perform a readiness probe using the /readyz endpoint:
curl -X GET "http://localhost:8080/readyz"
A typical response looks like this:
ok
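For example, a startup script could poll this endpoint and wait until the model is ready before sending traffic to it (a minimal sketch, assuming the container is reachable on localhost:8080):
# Block until the model reports ready
until curl -fsS "http://localhost:8080/readyz" | grep -q "ok"; do
  echo "waiting for model..."
  sleep 5
done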

Metrics

The model is capable of emitting OpenTelemetry metrics. The internal vLLM process also exposes a Prometheus /metrics endpoint. Both can serve as sources of data for metrics collection, tuning, and debugging.

OpenTelemetry metrics

The Rime engine exposes the following OpenTelemetry metrics:
  • rime.engine.concurrent_pipeline
  • rime.engine.generated_audio_duration
  • rime.engine.gpu_load
  • rime.engine.initial_latency
  • rime.engine.invocation_request
To have the model emit OpenTelemetry metrics, set two environment variables:
  • Set OTEL_COLLECTOR_PROTOCOL to grpc (recommended), http/protobuf, or http/json.
  • Set OTEL_COLLECTOR_ENDPOINT to a valid OpenTelemetry Collector endpoint.
For example:
OTEL_COLLECTOR_PROTOCOL='grpc'
OTEL_COLLECTOR_ENDPOINT="https://${OTELCOL_DOMAIN:?}:4317"
The model does not set any resource attributes for these metrics; OTEL_RESOURCE_ATTRIBUTES can be used to define them. The model also does not implement OpenTelemetry authentication; if authentication is required, run an OpenTelemetry Collector sidecar to forward the metrics.
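For example, resource attributes could be supplied alongside the collector settings using standard OpenTelemetry semantic convention keys (the values below are illustrative):
OTEL_RESOURCE_ATTRIBUTES="service.name=rime-arcana,deployment.environment=production"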

Prometheus metrics

To retrieve vLLM Prometheus metrics, issue an HTTP request to http://localhost:${GENERATOR_VLLM_PORT:?}/metrics. By default, a random port is assigned to the vLLM server. To use a fixed port (convenient if you want to forward these metrics), set the GENERATOR_VLLM_PORT environment variable to an available port number.
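For example, assuming the model was started with GENERATOR_VLLM_PORT=30000 and the port is reachable from where you run the command:
curl -X GET "http://localhost:30000/metrics"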

OpenTelemetry Collector sidecar

An OpenTelemetry Collector may optionally be used to forward both the OpenTelemetry and Prometheus metrics, and to conveniently retrieve the OpenTelemetry metrics locally. If authentication or other OpenTelemetry Collector extensions are needed, use the contrib distribution, which includes them. For example, Docker Compose can be used to run the model and an OpenTelemetry Collector …
services:
  model:
    image: us-docker.pkg.dev/rime-labs/arcana/v2:20251206
    # ...
    environment:
      # ...
      "GENERATOR_VLLM_PORT": 30000
      "OTEL_COLLECTOR_PROTOCOL": "grpc"
      "OTEL_COLLECTOR_ENDPOINT": "http://otelcol:4317"
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.139.0
    ports:
      # ...
      - "9090:9090" # Prometheus exporter, for local retrieval on a /metrics endpoint
    volumes:
      - "./otelcol/config.yaml:/etc/otelcol-contrib/config.yaml:ro"
      # ...
… where otelcol/config.yaml would be the configuration file for the OpenTelemetry Collector.
receivers:
  # Provide endpoints for OpenTelemetry metrics from the model
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape the vLLM Prometheus /metrics endpoint
  prometheus/generator-vllm:
    config:
      global:
        scrape_interval: "1s"
      scrape_configs:
        - job_name: "generator-vllm"
          static_configs:
            # Assuming GENERATOR_VLLM_PORT=30000
            - targets: ["model:30000"]
exporters:
  otlp:
    # Consider using the authenticator extension if authentication is required
    endpoint: "another.opentelemetry.collector.example:4317"
  prometheus:
    endpoint: "0.0.0.0:9090"
service:
  pipelines:
    # Forward the OpenTelemetry and Prometheus metrics to another OpenTelemetry Collector endpoint
    metrics/otlp:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [otlp]
    # Expose the OpenTelemetry and Prometheus metrics locally with another Prometheus /metrics endpoint
    metrics/local:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [prometheus]
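With this configuration, the forwarded metrics can also be spot-checked locally against the collector's Prometheus exporter (assuming port 9090 is published as in the Compose example above):
curl -X GET "http://localhost:9090/metrics"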

Mist Container

Healthcheck

To check if the model is running properly, you can perform a liveness probe using the /ping endpoint:
curl -X GET "http://localhost:8080/ping"
A typical response looks like this:
pong
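The same endpoint works well as a container healthcheck command; for example, a minimal probe that succeeds only when the expected response is returned:
curl -fsS "http://localhost:8080/ping" | grep -q "pong"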

Metrics

For more detailed operational insights, the model service exposes Prometheus-compatible metrics at the /metrics endpoint:
curl -X GET "http://localhost:8080/metrics"
This endpoint provides telemetry data including:
  • HTTP Request Counters: Detailed breakdown of requests by endpoint, status code, and HTTP method
  • Error Tracking: Counts of HTTP errors by type and status code
Example metrics include:
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/invocations",http_status="200",method="POST"} 102018.0
http_requests_total{endpoint="/metrics",http_status="200",method="GET"} 69908.0
http_requests_total{endpoint="/invocations",http_status="500",method="POST"} 38.0

# HELP http_errors_total Total number of HTTP errors
# TYPE http_errors_total counter
http_errors_total{endpoint="/invocations",error_message="cannot access local variable '_var_var_6' where it is not associated with a value",http_status="500",method="POST"} 38.0
These metrics can be integrated with Prometheus monitoring systems to create dashboards and alerts for your Rime deployment.
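For example, a quick way to inspect just the error counters before wiring up a full Prometheus scrape job:
# List only the error counters exposed by the model service
curl -sS "http://localhost:8080/metrics" | grep "^http_errors_total"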