# Metrics

> Observability metrics exposed by Rime on-prem containers.

Keeping track of system metrics is crucial for maintaining a stable Rime self-hosted deployment. These insights help guide decisions about scaling and performance. To support this, Rime services expose multiple endpoints that let you monitor system health.

## API container

A `/health` route is available on port 8000 to give you a quick snapshot of the overall status. You can check it with the following command:

```bash theme={null}
curl -X GET "http://localhost:8000/health"
```

A typical response looks like this:

```json theme={null}
{
  "apiStatus": "ok",
  "timestamp": "2025-04-03T13:59:58.902Z",
  "licenseStatus": "valid",
  "modelReachable": true
}
```

The response reports the API status, license validity, and whether the model service is reachable. A single request therefore verifies that both the API and model services are up and responding.
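A monitoring script can fetch `/health` and reduce the response to a single healthy/unhealthy verdict. The sketch below assumes that a healthy deployment reports `apiStatus` of `"ok"`, `licenseStatus` of `"valid"`, and `modelReachable` of `true`, matching the sample response above. Other status values are not documented here, and the function names are hypothetical.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Interpret a /health body; assumes the healthy values match the sample response."""
    status = json.loads(body)
    return (
        status.get("apiStatus") == "ok"
        and status.get("licenseStatus") == "valid"
        and status.get("modelReachable") is True
    )


def check_api_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Fetch /health and return True only if every field reports a healthy state."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors (URLError/HTTPError);
        # ValueError covers a body that is not valid JSON.
        return False
```

Treating any exception as "unhealthy" keeps the result usable as a simple boolean signal for alerting.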

## Arcana container

### Healthcheck

To check if the model is running properly, you can perform a liveness probe using the `/readyz` endpoint:

```bash theme={null}
curl -X GET "http://localhost:8080/readyz"
```

A typical response looks like this:

```text theme={null}
ok
```
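When a deployment script must wait for the model to finish loading before sending traffic, it can poll `/readyz` until the body reads `ok`. The following is a minimal sketch. The function name, deadline, and polling interval are illustrative assumptions, and the `fetch` parameter exists only so the loop can be exercised without a running container. Passing `path="/ping"` and `expected="pong"` adapts it to the Mist container's probe described later on this page.

```python
import time
import urllib.request
from typing import Callable


def _http_get(url: str, timeout: float) -> str:
    """Return the response body as text; raises OSError on connection or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_until_ready(
    base_url: str = "http://localhost:8080",
    path: str = "/readyz",
    expected: str = "ok",
    deadline_s: float = 300.0,
    interval_s: float = 2.0,
    fetch: Callable[[str, float], str] = _http_get,
) -> bool:
    """Poll the probe endpoint until its body matches `expected` or the deadline passes."""
    stop_at = time.monotonic() + deadline_s
    while True:
        try:
            if fetch(f"{base_url}{path}", interval_s).strip() == expected:
                return True
        except OSError:
            pass  # Not listening yet, or a non-2xx status; retry after a pause.
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)
```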

### Metrics

The model can emit OpenTelemetry metrics, and its internal vLLM process also exposes a Prometheus `/metrics` endpoint. Both can serve as data sources for metrics collection, tuning, and debugging.

#### OpenTelemetry metrics

The Rime engine exposes the following OpenTelemetry metrics:

* `rime.engine.concurrent_pipeline`
* `rime.engine.generated_audio_duration`
* `rime.engine.gpu_load`
* `rime.engine.initial_latency`
* `rime.engine.invocation_request`

To have the model emit OpenTelemetry metrics, set two environment variables:

* `OTEL_COLLECTOR_PROTOCOL`: set to `grpc` (recommended), `http/protobuf`, or `http/json`.
* `OTEL_COLLECTOR_ENDPOINT`: set to a valid OpenTelemetry Collector endpoint.

For example:

```bash theme={null}
OTEL_COLLECTOR_PROTOCOL='grpc'
OTEL_COLLECTOR_ENDPOINT="https://${OTELCOL_DOMAIN:?}:4317"
```
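Because the model emits metrics only when both variables are set, a launcher script might validate them before starting the container. The sketch below checks the protocol against the three documented values. The function name is hypothetical, and the requirement that the endpoint be an `http`/`https` URL is an assumption based on the examples on this page, not the model's own validation logic.

```python
from urllib.parse import urlparse

# The three protocol values documented above.
SUPPORTED_PROTOCOLS = {"grpc", "http/protobuf", "http/json"}


def otel_config_errors(env: dict[str, str]) -> list[str]:
    """Return problems found in the OTEL_COLLECTOR_* settings (empty list if none)."""
    errors = []
    protocol = env.get("OTEL_COLLECTOR_PROTOCOL")
    endpoint = env.get("OTEL_COLLECTOR_ENDPOINT")
    if protocol not in SUPPORTED_PROTOCOLS:
        errors.append(
            f"OTEL_COLLECTOR_PROTOCOL must be one of {sorted(SUPPORTED_PROTOCOLS)}, got {protocol!r}"
        )
    if not endpoint:
        errors.append("OTEL_COLLECTOR_ENDPOINT is not set")
    else:
        # Assumption: endpoints are written as full URLs, as in the examples on this page.
        parsed = urlparse(endpoint)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            errors.append(f"OTEL_COLLECTOR_ENDPOINT should be an http(s) URL, got {endpoint!r}")
    return errors
```

To check the current shell environment, call it as `otel_config_errors(dict(os.environ))` after `import os`.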

The model does not set any resource attributes for these metrics. `OTEL_RESOURCE_ATTRIBUTES` may be used to define them.
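`OTEL_RESOURCE_ATTRIBUTES` uses the standard OpenTelemetry format: comma-separated `key=value` pairs, where `,` and `=` inside keys or values must be percent-encoded. The helper below builds such a string. The attribute names and values in the example (`service.name`, `deployment.environment`, `site`) are illustrative choices, not attributes Rime requires.

```python
from urllib.parse import quote


def resource_attributes(attrs: dict[str, str]) -> str:
    """Build an OTEL_RESOURCE_ATTRIBUTES value from a mapping.

    Keys and values are percent-encoded so that ',' and '=' inside them
    cannot be mistaken for pair or key/value separators.
    """
    return ",".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in attrs.items())


# Illustrative attributes; choose whatever identifies your deployment.
print(resource_attributes({
    "service.name": "rime-arcana",
    "deployment.environment": "prod",
    "site": "dc1,rack4",
}))
# service.name=rime-arcana,deployment.environment=prod,site=dc1%2Crack4
```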

The model does not implement OpenTelemetry authentication. If authentication is desired, running an OpenTelemetry Collector sidecar to forward metrics is recommended.

#### Prometheus metrics

To retrieve vLLM Prometheus metrics, issue an HTTP request to `http://localhost:${GENERATOR_VLLM_PORT:?}/metrics`.

By default, the vLLM server is assigned a random port. To use a fixed port (convenient if you want to forward these metrics), set the `GENERATOR_VLLM_PORT` environment variable to an available port number.
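The request above can also be scripted. This sketch mirrors the shell `${GENERATOR_VLLM_PORT:?}` expansion by failing fast when the variable is unset or empty, since the randomly assigned default port cannot be guessed. The function names are hypothetical, and the sketch assumes it runs where the vLLM port is reachable on `localhost` (for example, inside the model container).

```python
import os
import urllib.request


def vllm_metrics_url(env=os.environ, host: str = "localhost") -> str:
    """Build the vLLM /metrics URL; like ${GENERATOR_VLLM_PORT:?}, fail if the port is unset or empty."""
    port = env.get("GENERATOR_VLLM_PORT", "")
    if not port:
        raise RuntimeError("GENERATOR_VLLM_PORT is not set; the vLLM server uses a random port by default")
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"GENERATOR_VLLM_PORT is not a valid port: {port!r}")
    return f"http://{host}:{port}/metrics"


def fetch_vllm_metrics(timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served by the vLLM process."""
    with urllib.request.urlopen(vllm_metrics_url(), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```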

#### OpenTelemetry Collector sidecar

> If you need authentication or other OpenTelemetry Collector extensions, use the `contrib` distribution, which includes them.

An OpenTelemetry Collector can optionally forward both the OpenTelemetry and Prometheus metrics. It also offers a convenient way to retrieve the OpenTelemetry metrics locally.

For example, you can use Docker Compose to run the model alongside an OpenTelemetry Collector:

```yaml theme={null}
services:
  model:
    image: us-docker.pkg.dev/rime-labs/arcana/v2:20251206
    # ...
    environment:
      # ...
      "GENERATOR_VLLM_PORT": 30000
      "OTEL_COLLECTOR_PROTOCOL": "grpc"
      "OTEL_COLLECTOR_ENDPOINT": "http://otelcol:4317"
  otelcol:
    image: otel/opentelemetry-collector-contrib:0.139.0
    ports:
      # ...
      - "9090:9090" # Prometheus exporter, for local retrieval on a /metrics endpoint
    volumes:
      - "./otelcol/config.yaml:/etc/otelcol-contrib/config.yaml:ro"
      # ...
```

Here, `otelcol/config.yaml` is the [configuration file for the OpenTelemetry Collector](https://opentelemetry.io/docs/collector/configuration/#basics), for example:

```yaml theme={null}
receivers:
  # Provide endpoints for OpenTelemetry metrics from the model
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape the vLLM Prometheus /metrics endpoint
  prometheus/generator-vllm:
    config:
      global:
        scrape_interval: "1s"
      scrape_configs:
        - job_name: "generator-vllm"
          static_configs:
            # Assuming GENERATOR_VLLM_PORT=30000
            - targets: ["model:30000"]
exporters:
  otlp:
    # Consider using the authenticator extension if authentication is required
    endpoint: "another.opentelemetry.collector.example:4317"
  prometheus:
    endpoint: "0.0.0.0:9090"
service:
  pipelines:
    # Forward the OpenTelemetry and Prometheus metrics to another OpenTelemetry Collector endpoint
    metrics/otlp:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [otlp]
    # Expose the OpenTelemetry and Prometheus metrics locally with another Prometheus /metrics endpoint
    metrics/local:
      receivers: [otlp, prometheus/generator-vllm]
      exporters: [prometheus]
```

#### Histogram tuning

Histogram metrics use preconfigured bucket boundaries. The Rime engine prescribes its own defaults for some metrics, and these can be tuned with `HISTOGRAM_BUCKETS_*` environment variables.

To keep differently tuned data sets separate, you can append a suffix to a metric's original name with `HISTOGRAM_SUFFIX_*` environment variables.

* `rime.engine.initial_latency`
  * Measured in milliseconds.
  * Default bounds:
    * ```
      0.0, 100.0, 150.0, 200.0, 225.0, 250.0, 275.0, 300.0, 325.0, 350.0, 375.0, 400.0, 450.0, 500.0, 1000.0
      ```
  * Set alternative bounds:
    * ```
      HISTOGRAM_BUCKETS_INITIAL_LATENCY_MS='0.0, 90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0, 125.0, 130.0, 135.0, 140.0, 150.0, 175.0, 200.0'
      ```
  * Set alternative metric name suffix:
    * `HISTOGRAM_SUFFIX_INITIAL_LATENCY='tuned_120ms'` produces a metric named `rime.engine.initial_latency.tuned_120ms`

* `rime.engine.generated_audio_duration`
  * Measured in seconds.
  * Default bounds:
    * ```
      0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.5, 10.0, 20.0, 30.0, 60.0
      ```
  * Set alternative bounds:
    * ```
      HISTOGRAM_BUCKETS_GENERATED_AUDIO_DURATION_SECONDS='0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.5, 10.0, 20.0, 30.0, 60.0'
      ```
  * Set alternative metric name suffix:
    * `HISTOGRAM_SUFFIX_GENERATED_AUDIO_DURATION='tuned_3s'` produces a metric named `rime.engine.generated_audio_duration.tuned_3s`
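The bucket variables take a comma-separated list of boundaries, as in the examples above. The sketch below generates and parses such values and shows which bucket a measurement lands in. The helper names are hypothetical. The strictly-increasing check reflects how explicit bucket boundaries generally work in OpenTelemetry, not a documented Rime validation rule, and the lookup follows OpenTelemetry's convention that each bucket includes its upper boundary.

```python
import bisect


def format_buckets(bounds: list[float]) -> str:
    """Render boundaries in the comma-separated form used by HISTOGRAM_BUCKETS_* variables."""
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("bucket boundaries must be strictly increasing")
    return ", ".join(str(float(b)) for b in bounds)


def parse_buckets(value: str) -> list[float]:
    """Parse a HISTOGRAM_BUCKETS_* value back into a list of floats."""
    return [float(part) for part in value.split(",") if part.strip()]


def bucket_index(bounds: list[float], x: float) -> int:
    """Bucket i covers (bounds[i-1], bounds[i]]; index len(bounds) is the overflow bucket."""
    return bisect.bisect_left(bounds, x)


def suffixed_name(metric: str, suffix: str) -> str:
    """Metric name after applying a HISTOGRAM_SUFFIX_* value, following the examples above."""
    return f"{metric}.{suffix}" if suffix else metric


# Boundaries from the HISTOGRAM_BUCKETS_INITIAL_LATENCY_MS example above.
tuned = [0.0, 90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0, 125.0, 130.0, 135.0, 140.0, 150.0, 175.0, 200.0]
# Prints the HISTOGRAM_BUCKETS_INITIAL_LATENCY_MS example above verbatim.
print(f"HISTOGRAM_BUCKETS_INITIAL_LATENCY_MS='{format_buckets(tuned)}'")
```

Narrow buckets only help where measurements actually fall. With the tuned bounds, a 101 ms initial latency lands in the (100, 105] bucket, while anything above 200 ms falls into the single overflow bucket.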

## Mist container

### Healthcheck

To check if the model is running properly, you can perform a liveness probe using the `/ping` endpoint:

```bash theme={null}
curl -X GET "http://localhost:8080/ping"
```

A typical response looks like this:

```text theme={null}
pong
```

### Metrics

For more detailed operational insights, the model service exposes Prometheus-compatible metrics at the `/metrics` endpoint:

```bash theme={null}
curl -X GET "http://localhost:8080/metrics"
```

This endpoint provides telemetry data, including:

* **HTTP request counters:** Detailed breakdown of requests by endpoint, status code, and HTTP method
* **Error tracking:** Counts of HTTP errors by type and status code

Example metrics include:

```text theme={null}
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/invocations",http_status="200",method="POST"} 102018.0
http_requests_total{endpoint="/metrics",http_status="200",method="GET"} 69908.0
http_requests_total{endpoint="/invocations",http_status="500",method="POST"} 38.0

# HELP http_errors_total Total number of HTTP errors
# TYPE http_errors_total counter
http_errors_total{endpoint="/invocations",error_message="cannot access local variable '_var_var_6' where it is not associated with a value",http_status="500",method="POST"} 38.0
```

These metrics can be integrated with Prometheus monitoring systems to create dashboards and alerts for your Rime deployment.
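As an example of turning these counters into an alert signal, the sketch below parses Prometheus text output like the sample above and computes the fraction of `/invocations` requests that returned a 5xx status. It handles only simple `name{labels} value` lines like those shown (no escaped quotes in label values, no timestamps). For production use, a full parser such as `text_string_to_metric_families` in the `prometheus_client` Python package is a better fit. The function names are hypothetical.

```python
import re

# Matches simple sample lines such as: metric_name{k="v",...} 123.0
SAMPLE_RE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition lines into (name, labels, value), skipping # HELP / # TYPE comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def invocation_error_ratio(text: str) -> float:
    """Fraction of /invocations requests with a 5xx status, cumulative since the process started."""
    total = errors = 0.0
    for name, labels, value in parse_samples(text):
        if name == "http_requests_total" and labels.get("endpoint") == "/invocations":
            total += value
            if labels.get("http_status", "").startswith("5"):
                errors += value
    return errors / total if total else 0.0
```

Against the sample above, this yields 38 / 102056, or roughly 0.04%. Because these are cumulative counters, the ratio covers the whole process lifetime. In Prometheus itself, you would typically compute it over a time window with `rate()` instead.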
