Observability with Prometheus and Grafana

A Complete Hands-On Guide to Operational Clarity in Cloud-Native Systems

Instrumentation with Prometheus in Practice

Histograms: Tracking the Distribution of Values Over Time

Histograms track the size and number of events by counting observations into a series of configurable buckets.

ℹ️ We call each range of values that we want to measure a "bucket".

For example, if we want to measure the response time of a request, our buckets will represent different ranges of response times. We can define buckets in milliseconds, such as 0-5 ms, 5-10 ms, 10-25 ms, and so on.

Histograms are most commonly used to measure how values are distributed over time, such as the response time of a web server or the size of files downloaded from a website, but they can capture the distribution of any metric that can be measured in ranges. They are also well suited to values that change dynamically, such as the number of requests per second or the size of a file.
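
The Python client's Histogram accepts a buckets argument for defining your own bucket boundaries; when it is omitted, a set of default buckets (in seconds) is used, which is what you will see in the /metrics output later in this section. Below is a minimal sketch with illustrative boundaries; the metric name and values are hypothetical and not part of the example we build next.

from prometheus_client import Histogram

# Hypothetical histogram with custom bucket boundaries (in seconds).
# The client automatically appends a +Inf bucket at the end.
api_latency = Histogram(
    'external_api_latency_seconds',
    'Latency of calls to an external API',
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]
)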

In order to test this feature, we will create a route that sends a request to the GitHub API to fetch the details of a repository. We will measure the time it takes to fetch the details, and store the response time in a histogram.

To instantiate a histogram in a Python application, we will use the following code:

from prometheus_client import Histogram

# Define a histogram to measure the time taken to process requests
histogram = Histogram(
    'request_duration_seconds',
    'Time taken to process a request',
    ['method', 'endpoint']
)
  • request_duration_seconds: the name of the histogram.
  • Time taken to process a request: the HELP description of the histogram.
  • ['method', 'endpoint']: the labels of the histogram. We will use these labels to differentiate between different types of requests.
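
To make the role of the labels concrete, here is a short sketch of how observations are recorded against specific label values; the 0.42 value and the POST /upload label pair are illustrative only. Each distinct (method, endpoint) combination gets its own set of bucket, count, and sum series.

import time

# Record a single observation of 0.42 seconds for GET /repos (value is illustrative)
histogram.labels('GET', '/repos').observe(0.42)

# Or time a block of code; the elapsed wall-clock time is observed on exit
with histogram.labels('POST', '/upload').time():
    time.sleep(0.1)  # stand-in for real work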

Then, we will create a route to fetch the details of a repository from GitHub and measure the time it takes to process the request:

# Define the route to get information about a repository
@app.route('/repos/<username>/<repository>')
def get_repos(username, repository):
    """Get information about a repository on GitHub."""
    repos = username + '/' + repository
    url = 'https://api.github.com/repos' + '/' + repos
    # Measure the time taken to process the request
    with histogram.labels('GET', '/repos').time():
        # Send a GET request to the GitHub API
        response = requests.get(url)
        # Return the response
        return response.json()

This is the code to create the Flask app:

cat <<EOF > /prometheus-python-example/using_histograms.py
from flask import Flask, Response
import requests
from prometheus_client import (
    Histogram, # Import Histogram class
    generate_latest, # Function to generate the latest metrics
    CONTENT_TYPE_LATEST # MIME type for Prometheus metrics
)

# Create a Flask application
app = Flask(__name__)

# Define a histogram to measure the time taken to process requests
histogram = Histogram(
    'request_duration_seconds',
    'Time taken to process a request',
    ['method', 'endpoint']
)

# Define the route to get information about a repository
@app.route('/repos/<username>/<repository>')
def get_repos(username, repository):
    """Get information about a repository on GitHub."""
    repos = username + '/' + repository
    url = 'https://api.github.com/repos' + '/' + repos
    # Measure the time taken to process the request
    with histogram.labels('GET', '/repos').time():
        # Send a GET request to the GitHub API
        response = requests.get(url)
        # Return the response
        return response.json()

# Define a route to expose metrics
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

# Run the application
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
EOF

Our route function accepts two path parameters, username and repository. It builds the URL for that repository on the GitHub API and sends a GET request to it. The time taken to process the request is recorded in the request_duration_seconds histogram, which we label with the HTTP method and the endpoint. These labels were defined when we created the histogram:

histogram = Histogram(
    'request_duration_seconds',
    'Time taken to process a request',
    ['method', 'endpoint'] # <--- Labels
)

Run the Flask application:

python /prometheus-python-example/using_histograms.py

Test the /repos endpoint a few times using the following commands:

# First test
curl http://$server1:5000/repos/prometheus/prometheus && echo
# Second test
curl http://$server1:5000/repos/apache/kafka && echo
# Third test
curl http://$server1:5000/repos/moby/moby && echo

Check the histogram metrics using the following command:

curl -s http://$server1:5000/metrics | grep request_duration_seconds

The output should list the histogram series, labeled with method and endpoint:

# HELP request_duration_seconds Time taken to process a request
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{endpoint="/repos",le="0.005",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.01",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.025",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.05",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.075",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.1",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.25",method="GET"} 0.0
request_duration_seconds_bucket{endpoint="/repos",le="0.5",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="0.75",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="1.0",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="2.5",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="5.0",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="7.5",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="10.0",method="GET"} 3.0
request_duration_seconds_bucket{endpoint="/repos",le="+Inf",method="GET"} 3.0
request_duration_seconds_count{endpoint="/repos",method="GET"} 3.0

Each bucket is cumulative: the le label is the bucket's upper bound, so the le="0.5" bucket counts every request that completed in 0.5 seconds or less. All three test requests finished in under half a second, which is why every bucket from 0.5 upwards already reports 3.0 and why request_duration_seconds_count matches the +Inf bucket. The output also includes request_duration_seconds_sum, the running total of all observed durations.
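
If you want to sanity-check these numbers outside of Prometheus, the client library ships a parser for the text exposition format. The sketch below assumes the app above is running locally on port 5000; in a real setup you would query these series from Prometheus itself rather than parse /metrics by hand.

import requests
from prometheus_client.parser import text_string_to_metric_families

# Fetch the raw text exposition from the running app (assumed on localhost:5000)
metrics_text = requests.get('http://localhost:5000/metrics').text

for family in text_string_to_metric_families(metrics_text):
    if family.name == 'request_duration_seconds':
        # Collect the _count and _sum samples for the /repos endpoint
        samples = {s.name: s.value for s in family.samples
                   if s.labels.get('endpoint') == '/repos'}
        count = samples.get('request_duration_seconds_count', 0)
        total = samples.get('request_duration_seconds_sum', 0.0)
        if count:
            print(f'average /repos latency: {total / count:.3f}s over {int(count)} requests')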
