This page describes iter8’s out-of-the-box metrics and how to extend them.
Iter8 leverages the metrics collected by Istio telemetry and stored in Prometheus. Users relying on iter8’s out-of-the-box metrics can simply reference them in the criteria section of an Experiment
specification, as illustrated in this tutorial and documented in the Experiment
CRD documentation. Iter8’s out-of-the-box metrics are as follows.
| Metric name | Description |
|---|---|
| iter8_request_count | Total number of HTTP requests to a service version |
| iter8_latency | Average time in milliseconds taken by a service version to respond to HTTP requests |
| iter8_error_count | Number of HTTP requests that resulted in errors (5xx HTTP status codes) |
| iter8_error_rate | Fraction of HTTP requests that resulted in errors, i.e., iter8_error_count / iter8_request_count |
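These metrics are referenced by name within experiment criteria. As a rough sketch of what such a reference might look like (the field names below are illustrative only; consult the Experiment CRD documentation for the exact schema):

```yaml
# Illustrative sketch only -- not the authoritative Experiment CRD schema
criteria:
  - metric: iter8_latency      # out-of-the-box metric, referenced by name
    threshold:
      type: absolute
      value: 200               # hypothetical limit, in milliseconds
  - metric: iter8_error_rate
    threshold:
      type: absolute
      value: 0.01              # hypothetical limit on the fraction of errored requests
```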
When iter8 is installed, a Kubernetes ConfigMap named iter8config-metrics is populated with a definition for each of the above out-of-the-box metrics. You can see the metric definitions in this file. You can extend iter8’s metrics by extending this configmap. Below, we describe the two types of metrics supported by iter8, namely counter metrics and ratio metrics, and how to extend the configmap in order to add new ones.
A counter metric is a metric whose initial value is zero and which can only increase over time. An example of a counter metric that is available out-of-the-box in iter8 is iter8_request_count, which is the total number of HTTP requests that were received by a service version. Iter8 counter metrics have the following fields.
| Field | Type | Description | Required |
|---|---|---|---|
| name | string | Name of the metric | yes |
| query_template | string | Prometheus query template used to fetch this metric (see below) | yes |
| preferred_direction | higher or lower | Indicates whether higher or lower values of the metric are preferred. It is an enum with two possible values, higher and lower. For example, the iter8_error_count metric has a preferred direction of lower. This field must be specified if you intend to use the metric as a reward metric or as a metric with thresholds within experiment criteria (see the Experiment CRD documentation). | no |
| units | string | Unit of measurement for this metric. For example, iter8_latency is a metric available out-of-the-box in iter8 and is measured in milliseconds. This field is used by iter8’s Kui and Kiali integrations to format display. | no |
| description | string | A description of this metric. This field is used by iter8’s Kui and Kiali integrations to format display. | no |
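Putting these fields together, here is a sketch of a counter metric definition for an application-level metric (this particular metric reappears in the extended configmap example later in this document); the preferred_direction, units, and description values below are illustrative additions:

```yaml
# Sketch of an entry in the counter_metrics.yaml section of the configmap
- name: books_purchased_total
  query_template: sum(increase(number_of_books_purchased_total{}[$interval])) by ($version_labels)
  preferred_direction: higher   # illustrative; higher sales are preferred
  units: books                  # illustrative; this optional field is free-form text
  description: Total number of books purchased   # illustrative
```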
The Prometheus query template for the counter metric iter8_error_count is shown below.
```
sum(increase(istio_requests_total{response_code=~'5..',reporter='source',job='envoy-stats'}[$interval])) by ($version_labels)
```
The query template has two placeholders (i.e., terms beginning with $). These placeholders are substituted with actual values by iter8 in order to construct a Prometheus query.
The query template has a group by clause (specified using the by keyword) with the placeholder $version_labels as the group key, which ensures that each item in the Prometheus response vector corresponds to a distinct version. Iter8 internally maps service versions to Prometheus entities using this placeholder. The length of the recent time window over which this metric is computed is captured by the placeholder $interval.
Both these placeholders are required in the query template. When a template is instantiated (i.e., placeholders are substituted with values), it results in a Prometheus query expression. An example of a query instantiated from the above template is shown below. In this example, since distinct versions correspond to distinct deployments of a service, iter8 has substituted $interval with 300s and $version_labels with the Prometheus labels destination_workload and destination_workload_namespace. Each combination of these Prometheus labels corresponds to a distinct version.
```
sum(increase(istio_requests_total{response_code=~'5..',reporter='source',job='envoy-stats'}[300s])) by (destination_workload, destination_workload_namespace)
```
A ratio metric is a ratio of two counter metrics. An example of a ratio metric that is available out-of-the-box in iter8 is iter8_latency, which is the average time taken by a service version to respond to HTTP requests. Iter8 ratio metrics have the following fields.
| Field | Type | Description | Required |
|---|---|---|---|
| name | string | Name of the metric | yes |
| numerator | string | The counter metric in the numerator of the ratio | yes |
| denominator | string | The counter metric in the denominator of the ratio | yes |
| preferred_direction | higher or lower | Indicates whether higher or lower values of the metric are preferred. It is an enum with two possible values, higher and lower. For example, the iter8_latency metric has a preferred direction of lower. This field must be specified if you intend to use the metric as a reward metric or as a metric with thresholds within experiment criteria (see the Experiment CRD documentation). | no |
| zero_to_one | boolean | Indicates whether the ratio metric always takes values in the range [0, 1]. For example, the iter8_error_rate metric has zero_to_one set to true. This field is optional and false by default. However, setting this field to true for metrics which possess this property helps iter8 provide better assessments. | no |
| units | string | Unit of measurement for this metric. For example, iter8_latency has milliseconds as its units. This field is used by iter8’s Kui and Kiali integrations to format display. | no |
| description | string | A description of this metric. This field is used by iter8’s Kui and Kiali integrations to format display. | no |
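Putting these fields together, here is a sketch of a complete ratio metric definition, using the out-of-the-box iter8_error_rate metric as the example; the description value below is an illustrative addition, not part of the shipped definition:

```yaml
# Sketch of an entry in the ratio_metrics.yaml section of the configmap
- name: iter8_error_rate
  numerator: iter8_error_count
  denominator: iter8_request_count
  preferred_direction: lower
  zero_to_one: true             # the error rate always lies in [0, 1]
  description: Fraction of HTTP requests that resulted in errors   # illustrative
```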
You can add new counter metrics in iter8 by extending the counter_metrics.yaml section of the configmap, and new ratio metrics by extending the ratio_metrics.yaml section. For example, in the A/B/n rollout tutorial, during the step where you defined new metrics, you added three new counter metrics and two new ratio metrics by extending the configmap as shown below.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: iter8config-metrics
  namespace: iter8
data:
  # by convention, metrics with names beginning with iter8_ are defined by iter8
  # a counter metric is monotonically increasing or decreasing
  counter_metrics.yaml: |-
    - name: iter8_request_count
      query_template: sum(increase(istio_requests_total{reporter='source',job='istio-mesh'}[$interval])) by ($version_labels)
    - name: iter8_total_latency
      query_template: (sum(increase(istio_request_duration_seconds_sum{reporter='source',job='istio-mesh'}[$interval])) by ($version_labels))*1000
      units: msec # optional
    - name: iter8_error_count
      query_template: sum(increase(istio_requests_total{response_code=~'5..',reporter='source',job='istio-mesh'}[$interval])) by ($version_labels)
      preferred_direction: lower
    - name: books_purchased_total
      query_template: sum(increase(number_of_books_purchased_total{}[$interval])) by ($version_labels)
    - name: le_500_ms_latency_request_count
      query_template: (sum(increase(istio_request_duration_seconds_bucket{le='0.5',reporter='source',job='istio-mesh'}[$interval])) by ($version_labels))
    - name: le_inf_latency_request_count
      query_template: (sum(increase(istio_request_duration_seconds_bucket{le='+Inf',reporter='source',job='istio-mesh'}[$interval])) by ($version_labels))
  # the value of a ratio metric equals value of numerator divided by denominator
  ratio_metrics.yaml: |-
    - name: iter8_mean_latency
      numerator: iter8_total_latency
      denominator: iter8_request_count
      preferred_direction: lower
    - name: iter8_error_rate
      numerator: iter8_error_count
      denominator: iter8_request_count
      preferred_direction: lower
      zero_to_one: true
    - name: mean_books_purchased
      numerator: books_purchased_total
      denominator: iter8_request_count
      preferred_direction: higher
    - name: le_500_ms_latency_percentile
      numerator: le_500_ms_latency_request_count
      denominator: le_inf_latency_request_count
      preferred_direction: higher
      zero_to_one: true
```
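The new query templates are instantiated in the same way as the out-of-the-box ones. For instance, assuming the same substitution values used in the earlier example (an interval of 300s and the version labels destination_workload, destination_workload_namespace), the template for le_500_ms_latency_request_count would be instantiated as the following Prometheus query:

```
(sum(increase(istio_request_duration_seconds_bucket{le='0.5',reporter='source',job='istio-mesh'}[300s])) by (destination_workload, destination_workload_namespace))
```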
Note: Iter8 metrics are built on top of Prometheus metrics and rely on them being populated correctly in the Prometheus instance used in the iter8 experiment. All of iter8’s out-of-the-box metrics rely on Prometheus metrics created by Istio’s telemetry. In the extended configmap example, the newly defined counter metrics le_500_ms_latency_request_count and le_inf_latency_request_count, and the ratio metric le_500_ms_latency_percentile, also rely on Prometheus metrics created by Istio’s telemetry. The counter metric books_purchased_total and the ratio metric mean_books_purchased rely on the Prometheus metric number_of_books_purchased_total, which is created by directly instrumenting the underlying application.