Prometheus: Limitations, Trade-offs, and Solutions
High Cardinality
First of all, it's important to understand what cardinality means. The following questions and answers clarify the concept.
Question: How many time series do we have in the following example?
http_requests_total{method="GET", status="200"} 14
http_requests_total{method="GET", status="200"} 10
http_requests_total{method="GET", status="200"} 15
http_requests_total{method="GET", status="200"} 18
http_requests_total{method="GET", status="200"} 13
Answer: If you said 5, you are confusing time series with data points, also known as samples. The number of time series is 1. All five samples share the same metric name and the same label values (method="GET" and status="200"), so they belong to the same time series. The number of samples is, indeed, 5.
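To make the distinction between series and samples concrete, here is a minimal Python sketch that parses the five lines above and counts both. It assumes the simplified line form used in these examples (name{label="value", ...} value) rather than the full Prometheus exposition format. The helper names series_key, LINE_RE, and LABEL_RE are hypothetical.

```python
import re

# Simplified pattern for lines like: metric_name{label="value", ...} sample_value
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
# Pattern for a single label pair: label="value"
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def series_key(line):
    """Return a hashable identity for the series a sample line belongs to:
    the metric name plus the set of label pairs (label order is ignored)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    name, labels, _value = match.groups()
    return (name, frozenset(LABEL_RE.findall(labels)))


samples = [
    'http_requests_total{method="GET", status="200"} 14',
    'http_requests_total{method="GET", status="200"} 10',
    'http_requests_total{method="GET", status="200"} 15',
    'http_requests_total{method="GET", status="200"} 18',
    'http_requests_total{method="GET", status="200"} 13',
]

num_samples = len(samples)                          # every line is one sample
num_series = len({series_key(s) for s in samples})  # distinct identities only
print(f"samples={num_samples} series={num_series}")  # samples=5 series=1
```

Because the key uses a frozenset of label pairs, the order in which labels are written does not change which series a sample belongs to.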
Question: How many time series do we have in the following example?
http_requests_total{method="GET", status="200"} 140
http_requests_total{method="GET", status="200"} 10
http_requests_total{method="GET", status="203"} 75
http_requests_total{method="POST", status="404"} 50
http_requests_total{method="POST", status="500"} 25
http_requests_total{method="PUT", status="203"} 10
http_requests_total{method="DELETE", status="404"} 5
http_requests_total{method="DELETE", status="404"} 50
Answer: The example above contains 8 samples that belong to 6 different time series:
- First time series: http_requests_total{method="GET", status="200"} contains 2 samples.
- Second time series: http_requests_total{method="GET", status="203"} contains 1 sample.
- Third time series: http_requests_total{method="POST", status="404"} contains 1 sample.
- Fourth time series: http_requests_total{method="POST", status="500"} contains 1 sample.
- Fifth time series: http_requests_total{method="PUT", status="203"} contains 1 sample.
- Sixth time series: http_requests_total{method="DELETE", status="404"} contains 2 samples.
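The grouping in the list above can be sketched in Python with collections.Counter, keyed by the metric name plus the sorted label pairs. The line format is the simplified one used in these examples, and the helper names (series_key, LINE_RE, LABEL_RE, per_series) are illustrative only, not part of any Prometheus library.

```python
import re
from collections import Counter

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def series_key(line):
    """Metric name plus label pairs sorted by label name, so the key is
    canonical (label order is ignored) and prints in a stable order."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    name, labels, _value = match.groups()
    return (name, tuple(sorted(LABEL_RE.findall(labels))))


samples = [
    'http_requests_total{method="GET", status="200"} 140',
    'http_requests_total{method="GET", status="200"} 10',
    'http_requests_total{method="GET", status="203"} 75',
    'http_requests_total{method="POST", status="404"} 50',
    'http_requests_total{method="POST", status="500"} 25',
    'http_requests_total{method="PUT", status="203"} 10',
    'http_requests_total{method="DELETE", status="404"} 5',
    'http_requests_total{method="DELETE", status="404"} 50',
]

# Count how many samples fall into each distinct series.
per_series = Counter(series_key(s) for s in samples)

for (name, labels), count in per_series.items():
    label_text = ", ".join(f'{k}="{v}"' for k, v in labels)
    print(f"{name}{{{label_text}}} contains {count} sample(s)")

print(f"series={len(per_series)} samples={sum(per_series.values())}")  # series=6 samples=8
```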
The goal of these examples is to illustrate how labels increase the number of unique time series for the same metric. In real-world scenarios, there are many more metrics, with more labels and more label values.
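A rough way to see how labels multiply series is to compute an upper bound: for a single metric, the number of possible series is the product of the number of distinct values of each label. The Python sketch below uses the method and status values from the second example (4 * 4 = 16 possible combinations, of which only 6 actually appear there). The extra instance label and its ten values are hypothetical.

```python
from math import prod

# Distinct values per label, taken from the second http_requests_total example.
label_values = {
    "method": {"GET", "POST", "PUT", "DELETE"},
    "status": {"200", "203", "404", "500"},
}


def max_series(label_values):
    """Upper bound on distinct series for one metric: one series per
    combination of label values (a metric with no labels has exactly one)."""
    return prod(len(values) for values in label_values.values())


print(max_series(label_values))  # 16

# Hypothetical extra label with ten values: the bound grows tenfold.
label_values["instance"] = {f"host-{i}" for i in range(10)}
print(max_series(label_values))  # 160
```

Only combinations that actually receive samples become series, so this is a ceiling rather than a prediction. Still, it shows why a single label with many distinct values can make the series count grow quickly.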
Prometheus works by collecting and storing time series data: sequences of data points (samples) indexed by time. Each time series is uniquely identified by its metric name and a set of key-value pairs called labels. Labels are important because they let you categorize and differentiate metrics, and query and analyze specific subsets of data effectively.
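The identity rule can be sketched in Python as an unordered set of label pairs that also includes the metric name. Prometheus stores the metric name internally as the special __name__ label. The series_id function below is a simplified illustration of that idea, not Prometheus's actual hashing. The path label and the http_errors_total metric are hypothetical.

```python
def series_id(name, labels):
    """Illustrative series identity: the metric name (as a __name__ pair)
    plus every label pair, collected into an unordered frozenset."""
    return frozenset({("__name__", name), *labels.items()})


a = series_id("http_requests_total", {"method": "GET", "status": "200"})
b = series_id("http_requests_total", {"status": "200", "method": "GET"})
# Hypothetical extra label: same name, different label set -> new series.
c = series_id("http_requests_total", {"method": "GET", "status": "200", "path": "/api"})
# Hypothetical metric name: same labels, different name -> new series.
d = series_id("http_errors_total", {"method": "GET", "status": "200"})

print(a == b)  # True: label order does not matter
print(a == c)  # False
print(a == d)  # False
```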
For example, a metric called query_duration_seconds might have a label like method to differentiate types of queries (e.g., GET, POST). The following examples illustrate how labels can affect cardinality:
We can start with a simple metric that has only one label, method, with two possible values (GET and POST), and therefore low cardinality:
###################
# Low cardinality #
###################
query_duration_seconds{method="GET"} 0.5
query_duration_seconds{method="POST"} 0.8
query_duration_seconds{method="POST"} 0.2
query_duration_seconds{method="POST"} 0.1
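The low-cardinality block above contains four samples but only two series. No number of additional samples can raise that count, because the method label has only two values. The following Python sketch represents each sample as a hypothetical (labels, value) pair and omits the metric name, since all samples share it.

```python
# Samples from the low-cardinality block as (labels, value) pairs; the metric
# name is omitted because every sample belongs to query_duration_seconds.
samples = [
    ({"method": "GET"}, 0.5),
    ({"method": "POST"}, 0.8),
    ({"method": "POST"}, 0.2),
    ({"method": "POST"}, 0.1),
]


def count_series(samples):
    """Number of distinct label sets among the samples."""
    return len({frozenset(labels.items()) for labels, _value in samples})


print(len(samples), count_series(samples))  # 4 2

# 2,000 more hypothetical samples using the same two label values.
more = samples + [({"method": m}, 0.3) for m in ("GET", "POST") * 1000]
print(len(more), count_series(more))  # 2004 2
```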
We can add more labels to the same metric to provide more context and granularity to the data.
######################
# Higher cardinality #
######################
query_duration_seconds{method="GET", status_code="200"} 0.5
query_duration_seconds{method="GET", status_code="404"} 0.8
query_duration_seconds{method="POST", status_code
