Quality of Service (QoS) Integration Guide

SDK source (GitHub): https://github.com/tangle-network/blueprint/tree/main/crates/qos

This guide explains how to integrate the Blueprint SDK Quality of Service (QoS) system for observability, monitoring, and dashboards. QoS combines heartbeats, metrics, logs, and Grafana dashboards into a single service that you can run alongside any Blueprint.

QoS Summary

The Blueprint QoS system provides a complete observability stack:

  • Heartbeat Service: submits periodic liveness signals to the status registry
  • Metrics Collection: exports system and job metrics via a Prometheus-compatible HTTP endpoint
  • Logging: streams logs to Loki (optional)
  • Dashboards: builds Grafana dashboards (optional)
  • Server Management: can run Grafana and Loki (and optionally Prometheus) for you

What QoS Exposes

QoS exposes a Prometheus-compatible metrics endpoint whenever metrics are enabled. Grafana and Loki are optional: QoS can manage them for you, or you can connect to instances you already run.

| Component | Default Endpoint | Notes |
| --- | --- | --- |
| Metrics endpoint | http://<host>:9090/metrics | Prometheus scrape format. Also exposes basic health and query routes. |
| Grafana UI | http://<host>:3000 | Only when configured or managed by QoS. |
| Loki push API | http://<host>:3100/loki/api/v1/push | Only when configured or managed by QoS. |

Notes:

  • The metrics endpoint is started when QoSConfig.metrics is enabled. If you want QoS without a metrics listener, set metrics: None.
  • The query routes are provided for convenience and debugging. They are not a replacement for a full Prometheus deployment.
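
To make "Prometheus scrape format" concrete, here is a minimal sketch of how a single gauge is rendered in the Prometheus text exposition format. It is not the SDK's exporter: the `Gauge` struct, the `render_gauge` function, and the metric name used below are hypothetical and exist only for illustration.

```rust
use std::fmt::Write;

/// Hypothetical in-memory gauge, not a blueprint_qos type.
pub struct Gauge<'a> {
    pub name: &'a str,
    pub help: &'a str,
    pub labels: &'a [(&'a str, &'a str)],
    pub value: f64,
}

/// Escape a label value as the text format requires:
/// backslash, double quote, and line feed.
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Render one gauge as HELP, TYPE, and sample lines.
pub fn render_gauge(gauge: &Gauge<'_>) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP {} {}", gauge.name, gauge.help);
    let _ = writeln!(out, "# TYPE {} gauge", gauge.name);
    out.push_str(gauge.name);
    if !gauge.labels.is_empty() {
        let pairs: Vec<String> = gauge
            .labels
            .iter()
            .map(|(key, value)| format!("{}=\"{}\"", key, escape_label(value)))
            .collect();
        let _ = write!(out, "{{{}}}", pairs.join(","));
    }
    let _ = writeln!(out, " {}", gauge.value);
    out
}
```

A scraper that requests the metrics endpoint receives plain text in this shape, which is why any Prometheus-compatible collector can read it without an SDK-specific client.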

Integrating QoS with BlueprintRunner

If you use BlueprintRunner, it wires the HTTP RPC endpoint, keystore URI, and status registry address into QoS for you:

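use std::sync::Arc;

// Start from the defaults and attach a heartbeat consumer (defined in the next section).
let qos_config = blueprint_qos::default_qos_config();
let heartbeat_consumer = Arc::new(MyHeartbeatConsumer);

// `router` and `env` come from your existing Blueprint setup.
BlueprintRunner::builder(TangleEvmConfig::default(), env)
    .router(router)
    .qos_service(qos_config, Some(heartbeat_consumer))
    .run()
    .await?;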

Important behavior:

  • BlueprintRunner::qos_service currently forces manage_servers(true). To avoid managed containers, pass a config with grafana_server: None and loki_server: None. If you also do not want QoS to push logs, set loki: None.

HeartbeatConsumer and Keystore Requirements

Heartbeats require a keystore with an ECDSA key. Use BLUEPRINT_KEYSTORE_URI or --keystore-path so QoS can sign heartbeats.

cargo tangle key --algo ecdsa --keystore ./keystore --name operator
export BLUEPRINT_KEYSTORE_URI="$(pwd)/keystore"

Implement the heartbeat consumer using the current trait signature:

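use blueprint_qos::heartbeat::{HeartbeatConsumer, HeartbeatStatus};
use blueprint_qos::error::Result as QoSResult;
use std::future::Future;
use std::pin::Pin;

#[derive(Clone)]
struct MyHeartbeatConsumer;

impl HeartbeatConsumer for MyHeartbeatConsumer {
    // The returned future is boxed and must be Send. It cannot borrow from
    // `self` or `_status`, so copy or clone anything it needs before the async block.
    fn send_heartbeat(
        &self,
        _status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>> {
        // No-op consumer: accept every heartbeat and report success.
        Box::pin(async move { Ok(()) })
    }
}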
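
Because the boxed future in that signature carries no borrowed lifetime, a consumer that keeps state has to move owned handles into the async block. The sketch below illustrates the pattern with local stand-ins so that it compiles without the SDK: the `HeartbeatStatus` struct and its `service_id` field, the `HeartbeatConsumer` trait, the `String` error type, and the `drive` helper are all assumptions made for this example, not the blueprint_qos definitions.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for the SDK status type; the `service_id` field is hypothetical.
pub struct HeartbeatStatus {
    pub service_id: u64,
}

/// Same shape as the return type in the guide, with `String` standing in
/// for the SDK error type.
pub type HeartbeatFuture = Pin<Box<dyn Future<Output = Result<(), String>> + Send>>;

/// Local stand-in for the SDK trait.
pub trait HeartbeatConsumer {
    fn send_heartbeat(&self, status: &HeartbeatStatus) -> HeartbeatFuture;
}

/// Consumer that counts heartbeats and remembers the last service id seen.
#[derive(Clone, Default)]
pub struct CountingConsumer {
    sent: Arc<AtomicU64>,
    last_service_id: Arc<AtomicU64>,
}

impl CountingConsumer {
    pub fn sent(&self) -> u64 {
        self.sent.load(Ordering::Relaxed)
    }

    pub fn last_service_id(&self) -> u64 {
        self.last_service_id.load(Ordering::Relaxed)
    }
}

impl HeartbeatConsumer for CountingConsumer {
    fn send_heartbeat(&self, status: &HeartbeatStatus) -> HeartbeatFuture {
        // Clone the shared counters and copy the id up front: the future
        // must own everything it touches.
        let sent = Arc::clone(&self.sent);
        let last = Arc::clone(&self.last_service_id);
        let service_id = status.service_id;
        Box::pin(async move {
            sent.fetch_add(1, Ordering::Relaxed);
            last.store(service_id, Ordering::Relaxed);
            Ok(())
        })
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Poll a heartbeat future to completion without an async runtime.
/// Only suitable for futures that complete without waiting on I/O.
pub fn drive(mut fut: HeartbeatFuture) -> Result<(), String> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

In a real Blueprint the async runtime polls the future, so `drive` is only needed for exercising the consumer in isolation. A counter like this can be handy in tests that check whether heartbeats are actually being emitted.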

Configuration Options

Default Configuration

let qos_config = blueprint_qos::default_qos_config();

This enables heartbeat, metrics, Loki logging, and Grafana integration. By default it also includes server configs for Grafana and Loki. If you run it via BlueprintRunner::qos_service, those servers will be started as managed containers unless you disable them in the config.

Bring Your Own Observability Stack

If you use QoSServiceBuilder directly, you can keep manage_servers(false) and point QoS to your existing stack.

If you use BlueprintRunner::qos_service, manage_servers(true) is forced. In that case, the way to avoid managed containers is to set the server configs to None while still pointing GrafanaConfig and LokiConfig at your external services.

let qos_config = QoSConfig {
    metrics: Some(MetricsConfig {
        prometheus_server: Some(PrometheusServerConfig {
            host: "127.0.0.1".into(),
            port: 9090,
            use_docker: false,
            ..Default::default()
        }),
        ..Default::default()
    }),
    grafana: Some(GrafanaConfig {
        url: "http://grafana.internal:3000".into(),
        api_key: Some(std::env::var("GRAFANA_API_KEY")?),
        prometheus_datasource_url: Some("http://prometheus.internal:9090".into()),
        ..Default::default()
    }),
    loki: Some(LokiConfig {
        url: "http://loki.internal:3100/loki/api/v1/push".into(),
        ..Default::default()
    }),
    grafana_server: None,
    loki_server: None,
    ..blueprint_qos::default_qos_config()
};

Managed Observability Stack

QoS can spin up Grafana, Loki, and Prometheus containers for you. Make sure Docker is available.

let qos_config = QoSConfig {
    manage_servers: true,
    grafana_server: Some(GrafanaServerConfig {
        admin_user: "admin".into(),
        admin_password: "change-me".into(),
        allow_anonymous: false,
        data_dir: "/var/lib/grafana".into(),
        ..Default::default()
    }),
    loki_server: Some(LokiServerConfig {
        data_dir: "/var/lib/loki".into(),
        config_path: Some("./loki-config.yaml".into()),
        ..Default::default()
    }),
    prometheus_server: Some(PrometheusServerConfig {
        host: "0.0.0.0".into(),
        port: 9090,
        use_docker: true,
        config_path: Some("./prometheus.yml".into()),
        data_path: Some("./prometheus-data".into()),
        ..Default::default()
    }),
    docker_network: Some("blueprint-observability".into()),
    docker_bind_ip: Some("0.0.0.0".into()),
    ..blueprint_qos::default_qos_config()
};

Builder Pattern

Use the builder when you want explicit wiring for heartbeats or custom datasources:

let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id,
        blueprint_id,
        interval_secs: 60,
        jitter_percent: 10,
        max_missed_heartbeats: 3,
        status_registry_address,
    })
    .with_heartbeat_consumer(Arc::new(consumer))
    .with_http_rpc_endpoint(env.http_rpc_endpoint.to_string())
    .with_keystore_uri(env.keystore_uri.clone())
    .with_status_registry_address(status_registry_address)
    .with_metrics_config(MetricsConfig::default())
    .with_grafana_config(GrafanaConfig::default())
    .with_loki_config(LokiConfig::default())
    .with_prometheus_server_config(PrometheusServerConfig::default())
    .manage_servers(true)
    .build()
    .await?;
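
To build intuition for the heartbeat values above (interval_secs: 60, jitter_percent: 10, max_missed_heartbeats: 3), the sketch below works out the timing they would imply under two assumptions that the guide does not confirm: jitter is applied symmetrically around the interval, and a service counts as unhealthy once `interval * max_missed_heartbeats` has elapsed without a heartbeat. The function names and integer types are hypothetical; check the SDK source for the actual behavior.

```rust
use std::time::Duration;

/// Earliest and latest send delay, ASSUMING jitter is applied symmetrically
/// as plus or minus `jitter_percent` of the interval.
pub fn jitter_bounds(interval_secs: u64, jitter_percent: u8) -> (Duration, Duration) {
    // Clamp to 100% so the lower bound can never underflow.
    let pct = u64::from(jitter_percent.min(100));
    let base_ms = interval_secs.saturating_mul(1000);
    let delta_ms = base_ms.saturating_mul(pct) / 100;
    (
        Duration::from_millis(base_ms - delta_ms),
        Duration::from_millis(base_ms.saturating_add(delta_ms)),
    )
}

/// Time without a heartbeat after which a service would be treated as
/// unhealthy, ASSUMING the threshold is interval times max missed heartbeats.
pub fn missed_deadline(interval_secs: u64, max_missed_heartbeats: u32) -> Duration {
    Duration::from_secs(interval_secs.saturating_mul(u64::from(max_missed_heartbeats)))
}
```

With the values from the builder example, these assumptions give a send window of 54 to 66 seconds and a threshold of 180 seconds. Jitter of this kind is commonly used so that many operators do not submit heartbeats at the same instant.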

Recording Metrics and Events

Track job execution and errors in your handlers:

// After a job finishes, record how long it took.
if let Some(qos) = &ctx.qos_service {
    qos.record_job_execution(
        JOB_ID,
        start_time.elapsed().as_secs_f64(),
        ctx.service_id,
        ctx.blueprint_id,
    );
}

// On a failure path, record the error under a short, stable label.
if let Some(qos) = &ctx.qos_service {
    qos.record_job_error(JOB_ID, "complex_operation_failure");
}
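
If several handlers repeat this pattern, one option is to wrap the timing and error recording in a helper. The sketch below shows the idea against a local stand-in trait so that it compiles on its own: `JobRecorder`, `MemoryRecorder`, and `run_recorded` are hypothetical names, and the simplified method signatures do not match the SDK's (which also take the service and blueprint IDs).

```rust
use std::time::Instant;

/// Stand-in for the two recording calls shown above (simplified signatures).
pub trait JobRecorder {
    fn record_job_execution(&mut self, job_id: u64, seconds: f64);
    fn record_job_error(&mut self, job_id: u64, error_type: &str);
}

/// In-memory recorder, useful for tests.
#[derive(Default)]
pub struct MemoryRecorder {
    pub executions: Vec<(u64, f64)>,
    pub errors: Vec<(u64, String)>,
}

impl JobRecorder for MemoryRecorder {
    fn record_job_execution(&mut self, job_id: u64, seconds: f64) {
        self.executions.push((job_id, seconds));
    }

    fn record_job_error(&mut self, job_id: u64, error_type: &str) {
        self.errors.push((job_id, error_type.to_string()));
    }
}

/// Run `job`, always record its duration, and record an error label if it fails.
pub fn run_recorded<T, R: JobRecorder>(
    recorder: &mut R,
    job_id: u64,
    job: impl FnOnce() -> Result<T, &'static str>,
) -> Result<T, &'static str> {
    let start = Instant::now();
    let result = job();
    recorder.record_job_execution(job_id, start.elapsed().as_secs_f64());
    if let Err(label) = &result {
        recorder.record_job_error(job_id, label);
    }
    result
}
```

Recording the duration for failed jobs as well is a design choice in this sketch; it keeps latency data complete, but you may prefer to time only successful runs. Using a fixed `&'static str` label keeps error categories low in cardinality, which generally suits metric labels better than free-form messages.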

Creating Grafana Dashboards

let mut qos_service = qos_service;
qos_service.create_dashboard("My Blueprint").await?;

The default dashboard template lives at crates/qos/config/grafana_dashboard.json in the SDK.

Accessing Metrics in Code

You can query the metrics provider directly (for custom metrics or status checks):

use blueprint_qos::metrics::types::MetricsProvider;
 
if let Some(qos) = &ctx.qos_service {
    if let Some(provider) = qos.provider() {
        let system_metrics = provider.get_system_metrics().await;
        let _cpu = system_metrics.cpu_usage;
        provider
            .add_custom_metric("custom.label".into(), "value".into())
            .await;
    }
}

Best Practices

✅ DO:

  • Initialize QoS early in your Blueprint startup sequence.
  • Use BlueprintRunner::qos_service(...) to auto-wire RPC + keystore + status registry.
  • Bind the metrics endpoint to 127.0.0.1 unless you explicitly want it scraped remotely.
  • Replace default Grafana credentials when using managed servers.

❌ DON’T:

  • Don’t enable heartbeats without setting BLUEPRINT_KEYSTORE_URI.
  • Don’t expose managed Grafana publicly without auth.
  • Don’t ignore QoS startup errors; they usually indicate misconfigured ports or credentials.
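
The advice about binding to 127.0.0.1 can be checked before startup. The sketch below classifies a configured host string using only the standard library; `BindExposure` and `classify_bind_host` are hypothetical helpers, not part of the SDK, and the check assumes the host is an IP literal rather than a DNS name.

```rust
use std::net::{AddrParseError, IpAddr};

/// How widely a listener bound to a given host would be reachable.
#[derive(Debug, PartialEq, Eq)]
pub enum BindExposure {
    /// 127.0.0.0/8 or ::1: reachable only from the same machine.
    LoopbackOnly,
    /// 0.0.0.0 or ::: listens on every interface.
    AllInterfaces,
    /// Any other address: reachable via that one interface.
    SpecificInterface,
}

/// Classify an IP literal such as "127.0.0.1" or "0.0.0.0".
/// Hostnames like "localhost" are rejected with a parse error.
pub fn classify_bind_host(host: &str) -> Result<BindExposure, AddrParseError> {
    let ip: IpAddr = host.parse()?;
    Ok(if ip.is_loopback() {
        BindExposure::LoopbackOnly
    } else if ip.is_unspecified() {
        BindExposure::AllInterfaces
    } else {
        BindExposure::SpecificInterface
    })
}
```

A startup check could log a warning whenever the metrics host classifies as `AllInterfaces`, since that is the case where the endpoint becomes reachable from outside the machine.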

QoS Components Reference

| Component | Primary Struct | Config | Purpose |
| --- | --- | --- | --- |
| Unified Service | QoSService | QoSConfig | Main entry point for QoS integration |
| Heartbeat | HeartbeatService | HeartbeatConfig | Liveness signals to the status registry |
| Metrics | MetricsService | MetricsConfig | System + job metrics and Prometheus export |
| Logging | N/A | LokiConfig | Log aggregation via Loki |
| Dashboards | GrafanaClient | GrafanaConfig | Dashboards and datasources |
| Server Management | ServerManager | Server configs | Manages Docker containers for the stack |