Reduce metrics costs via Adaptive Metrics
Adaptive Metrics is a cardinality optimization feature that allows you to identify and eliminate unused time series metrics data by means of aggregation. Recommended rules identify what metrics to aggregate based on usage within your cloud environment.
Adaptive Metrics consists of the following services:
- The recommendations service generates recommended rules for aggregation.
- The aggregations service implements those rules.
The recommended rules of Adaptive Metrics are updated daily, and are available for you to review via the Adaptive Metrics plugin or the Adaptive Metrics HTTP API.
After reviewing the recommended rules, you can decide which rules to apply, using either the plugin or the API. You can also create your own rules using the API.
Supported metrics formats
Grafana Cloud accepts metrics data in a variety of formats, and Adaptive Metrics is compatible with the following subset of formats:
Metrics format | Supported? | Notes |
---|---|---|
Prometheus | Yes | Fully supported. |
OpenTelemetry | Yes | Fully supported. |
Influx Line protocol | Yes | Recommendations are limited because metadata is not sent. |
Datadog | No | |
Graphite | No | |
Check if you are sending metadata for your metrics
To check whether you are sending metrics metadata, send a request to the HTTP API metadata endpoint:
curl -u "$METRICS_INSTANCE_ID:$API_KEY" "https://<cluster>.grafana.net/prometheus/api/v1/metadata"
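The response can be checked programmatically for metrics that lack metadata. The following is a minimal Python sketch, assuming the endpoint returns the standard Prometheus metadata response shape (a `data` object that maps each metric name to a list of entries with `type`, `help`, and `unit`). The function name `metrics_missing_metadata`, the sample metric names, and the choice to treat an `unknown` type as missing are illustrative assumptions, not part of the Adaptive Metrics API.

```python
import json


def metrics_missing_metadata(response_body: str, expected_metrics: list[str]) -> list[str]:
    """Return the expected metric names that have no usable metadata entry."""
    payload = json.loads(response_body)
    metadata = payload.get("data", {})
    missing = []
    for name in expected_metrics:
        entries = metadata.get(name, [])
        # Assumption: an entry is only useful if it carries a known metric type.
        has_type = any(e.get("type") not in (None, "", "unknown") for e in entries)
        if not has_type:
            missing.append(name)
    return missing


# Hypothetical response body, shaped like the Prometheus metadata API output.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "http_requests_total": [
            {"type": "counter", "help": "Total HTTP requests.", "unit": ""}
        ],
        "queue_depth": [{"type": "unknown", "help": "", "unit": ""}],
    },
})

print(metrics_missing_metadata(
    sample_body, ["http_requests_total", "queue_depth", "cpu_seconds_total"]
))
# prints ['queue_depth', 'cpu_seconds_total']
```

In practice, `response_body` would be the body returned by the curl request above, and `expected_metrics` would be the metrics you care about.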
Note
Adaptive Metrics uses Prometheus metrics metadata stored in your Grafana Hosted Metrics instance to make sure that recommendations are safe to apply mathematically.
For example, for a counter-type metric, recommendations by Adaptive Metrics make sure that counter resets are handled correctly during aggregation.
If metrics metadata is not available for a metric, and Adaptive Metrics cannot infer the metric’s type from its name or usage patterns, it produces a default recommendation for that metric that supports the most common aggregation functions (sum(…), count(…), avg(…), and sum(rate(…))). If you use a metrics format other than Prometheus or OpenTelemetry, metrics metadata is not preserved. As a result, recommendations for those metrics may store more data than strictly necessary and produce lower cost savings.
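To illustrate why the metric type matters, consider two series of the same counter, one of which restarts. The following Python sketch is not the Adaptive Metrics implementation; the helper name `increase_with_resets` and the sample values are hypothetical. It shows that summing raw counter values first makes a reset look like a drop, while computing each series' increase with reset handling and then summing (the order that sum(rate(…)) uses) gives the true total.

```python
def increase_with_resets(samples: list[float]) -> float:
    """Total increase of one counter series, treating any drop as a reset to zero."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # After a reset the counter restarts from 0, so the whole value is new.
            total += curr
    return total


series_a = [100.0, 110.0, 120.0]  # steady counter
series_b = [50.0, 60.0, 5.0]      # restarts between the 2nd and 3rd sample

# Naive approach: sum the raw values per timestamp, then take last minus first.
summed = [a + b for a, b in zip(series_a, series_b)]
naive_increase = summed[-1] - summed[0]

# Reset-aware approach: handle resets per series, then sum the increases.
safe_increase = increase_with_resets(series_a) + increase_with_resets(series_b)

print(naive_increase)  # prints -25.0
print(safe_increase)   # prints 35.0
```

Knowing that a metric is a counter is what lets an aggregation keep the reset-aware ordering rather than the naive one.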
Aggregation service: requirements on sample age
We can only aggregate raw samples that are relatively recent. For metrics that are being aggregated, Grafana Cloud rejects any sample that arrives more than 90 seconds late, that is, when the difference between the wall clock time at which the sample arrives at Grafana Cloud and the timestamp on the sample (which indicates when it was collected) is greater than 90 seconds.
If Grafana Cloud rejects samples for this reason, you will see an increase in sample-too-old-for-aggregation or aggregator-sample-too-old errors on the Discarded Metrics Samples panel of your billing dashboard.
This sample age requirement only applies to samples that belong to metrics that are being aggregated.
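The age rule can be expressed as a small check. The following is a minimal Python sketch, assuming only the 90-second limit described above; the function name `is_too_old_for_aggregation` and the example timestamps are hypothetical, and the actual check is performed by Grafana Cloud on arrival, not by your client.

```python
from datetime import datetime, timedelta, timezone

MAX_SAMPLE_AGE = timedelta(seconds=90)  # limit stated in this section


def is_too_old_for_aggregation(sample_ts: datetime, arrival_ts: datetime) -> bool:
    """True if the gap between collection and arrival exceeds the limit."""
    return (arrival_ts - sample_ts) > MAX_SAMPLE_AGE


# Hypothetical arrival time, used only for illustration.
arrival = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

print(is_too_old_for_aggregation(arrival - timedelta(seconds=30), arrival))   # prints False
print(is_too_old_for_aggregation(arrival - timedelta(seconds=120), arrival))  # prints True
```

A check like this can help estimate how many buffered samples would be discarded, for example after a network outage delays delivery from a collector.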
Why this happens
To compute an aggregation, we must wait for all raw samples associated with that metric to arrive. We don’t know how many samples will arrive, nor can we wait indefinitely for them, because the longer we wait, the longer the delay before the data is queryable or visible in dashboards.
If a sample arrives after our configured waiting time, it is not taken into account when the aggregated value is computed. Because our metrics database is immutable once the aggregation has been computed, we cannot update the aggregated value to reflect this late-arriving data point.
Troubleshooting
If you encounter issues querying a metric that has been aggregated, see Troubleshoot your aggregated metrics query. For any other questions or feedback, contact your Customer Success Manager or file a support request.