Observability with Prometheus and Grafana

A Complete Hands-On Guide to Operational Clarity in Cloud-Native Systems

Monitoring Docker Swarm with Prometheus

Configuring Prometheus

dockerswarm_sd_configs is a Prometheus service discovery configuration that automatically discovers scrape targets from a Docker Swarm cluster. It lets Prometheus dynamically find containers, services and other resources managed by Swarm without manual static target lists.

So, instead of defining static targets like this:

scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['<server1-ip>:9323', '<server2-ip>:9323']

We define a dynamic configuration that uses the Docker Swarm API to discover the nodes and services in the cluster:

scrape_configs:
  - job_name: 'docker_swarm'
    dockerswarm_sd_configs:
      - host: http://<manager-ip>:2376
        role: nodes

Where host is the address of the Docker daemon on the manager node, and role specifies what kind of resources to discover (nodes, tasks, services, etc.).
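
If you want to preview what Prometheus will discover, you can query the same Docker API yourself. A minimal sketch, assuming the daemon API is reachable on the manager node (<manager-ip> is a placeholder for its address):

# List the Swarm nodes exactly as the 'nodes' role sees them
curl -s http://<manager-ip>:2376/nodes | jq '.[].Description.Hostname'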

Let's implement this.

On monitoring, export the IP address of server1:

export server1=

Update the configuration file to scrape the metrics from the daemon on the manager node and discover the nodes in the cluster:

cat << EOF > /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker_swarm'
    dockerswarm_sd_configs:
      - host: http://$server1:2376
        role: nodes

    relabel_configs:

      # First action: Use the node address as the target address
      # and append the port 9323 where the metrics are exposed
      # => This will tell Prometheus to scrape the metrics from each node
      # on the port 9323
      - action: replace
        source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: \${1}:9323

      # Second action: Use the node hostname as the instance label
      - action: replace
        source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
EOF
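
Before reloading, it is worth validating the file. A quick sketch, assuming promtool (shipped with Prometheus) is available on the monitoring host:

# Validate the Prometheus configuration syntax
promtool check config /etc/prometheus/prometheus.yml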

The dockerswarm_sd_configs section holds the Docker Swarm service discovery settings; it is what enables Prometheus to automatically discover and scrape metrics from the nodes in the cluster.
We set host to http://$server1:2376, the address of the Docker daemon on the manager node, and role to nodes to discover all the nodes in the cluster.
The tcp://$server1:2376 endpoint is where the Docker daemon listens for remote connections; we already configured this in a previous step on server1:

"hosts": ["unix:///var/run/docker.sock", "tcp://$server1:2376"]

The relabel_configs section applies transformations to the discovered targets before scraping.

The first one sets __address__ (the address of the target to scrape) to the address of the discovered node, with the port 9323 appended:

- action: replace
  source_labels: [__meta_dockerswarm_node_address]
  target_label: __address__
  replacement: ${1}:9323
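
To make the substitution concrete: the default regex is (.*), so ${1} captures the entire source value. For a node discovered at, say, 10.0.0.5 (an illustrative address, not one from our setup):

# Before relabeling (discovered metadata):
#   __meta_dockerswarm_node_address = "10.0.0.5"
# After the replace action:
#   __address__ = "10.0.0.5:9323"   <- the target Prometheus will scrape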

The second one sets the instance label to the hostname of the discovered node:

- action: replace
  source_labels: [__meta_dockerswarm_node_hostname]
  target_label: instance

We can now move to the second node (server2) and enable the metrics there as well.

On server2, export the IP address of server2 (private IP in our case):

export server2=

Set the DOCKER_METRICS_ADDR variable used to enable the metrics on server2:

export DOCKER_METRICS_ADDR=$server2:9323

Update the daemon configuration file to tell Docker to expose the metrics on the specified address:

cat << EOF > /etc/docker/daemon.json
{
  "metrics-addr" : "${DOCKER_METRICS_ADDR}",
  "experimental" : true
}
EOF
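
Since a malformed daemon.json prevents Docker from starting, it is worth validating the JSON before restarting. A quick check, assuming jq is installed (we use it later in this guide anyway):

# Parse the file; jq exits non-zero if the JSON is invalid
jq . /etc/docker/daemon.json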

Restart the Docker daemon:

systemctl restart docker
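
Before moving on, you can confirm on server2 that the daemon came back up and is listening on the metrics port. A sketch, assuming ss (from iproute2) is available:

# Verify the daemon is running
systemctl is-active docker

# Verify something is listening on the metrics port
ss -tln | grep 9323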

On monitoring, check if everything is working:

# Reload Prometheus configuration
kill -HUP $(pgrep prometheus)

# Service discovery info
export server1=
curl -s http://$server1:2376/nodes | jq

# Metrics from server1
curl http://${server1}:9323/metrics

# Metrics from server2
export server2=
curl http://${server2}:9323/metrics
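
If the full output is too noisy, you can grep for a single metric. For example, the engine exposes container state counts (a quick sketch using the variables exported above):

# Count of containers per state on server1
curl -s http://${server1}:9323/metrics | grep engine_daemon_container_states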

If you go to the list of targets on the Prometheus web interface, you should see both cluster nodes listed as targets.

The endpoint column shows the address of each node with the port 9323 appended to it, which is where the metrics are exposed.

Prometheus Targets
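
You can also confirm this from the command line through the Prometheus HTTP API. A sketch, assuming Prometheus is listening on localhost:9090 on the monitoring host:

# The up metric is 1 for every target Prometheus can scrape
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{job="docker_swarm"}' | jq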

At this point, we used role: nodes in the dockerswarm_sd_configs configuration, which means that Prometheus is querying the /nodes endpoint of the Docker API (http://$server1:2376/nodes). This only discovers the nodes in the cluster.

We can also use the tasks role (backed by the /tasks endpoint) to discover the tasks (containers) and, with the help of cAdvisor, get more detailed metrics about them. To do this, we need to create a service that runs cAdvisor on each node.

The following command runs cAdvisor as a global service in the cluster. Run it on the manager node (server1):

docker service create \
  --name cadvisor \
  -l prometheus-job=cadvisor \
  --mode=global \
  --publish published=8080,target=8080,mode=host \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro \
  --mount type=bind,src=/,dst=/rootfs,ro \
  --mount type=bind,src=/var/run,dst=/var/run \
  --mount type=bind,src=/sys,dst=/sys,ro \
  --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro \
  gcr.io/cadvisor/cadvisor:v0.52.0

ℹ️ The --mode=global flag is used to ensure that the service runs exactly one task on each node in the cluster.

ℹ️ The -docker_only flag can additionally be passed to cAdvisor to disable the collection of machine-level metrics and focus only on container-level metrics.
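
You can verify that the service has one running task per node:

docker service ps cadvisor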

After creating the service, you can check the metrics endpoint:

curl http://$server1:8080/metrics

You should see a list of metrics related to the containers running on the node.
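
For example, to spot-check the CPU accounting series cAdvisor exposes for each container (a quick sketch):

curl -s http://$server1:8080/metrics | grep '^container_cpu_usage_seconds_total' | head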

We can now update the Prometheus configuration to discover the tasks (containers) in the cluster using the tasks role. This is how we can do it:

- job_name: 'docker_tasks'
  dockerswarm_sd_configs:
    - host: http://$server1:2376
      role: tasks

  relabel_configs:

    # Only keep containers that should be running.
    - action: keep
      source_labels: [__meta_dockerswarm_task_desired_state]
      regex: running

    # Only keep containers that have a prometheus-job label.
    - action: keep
      source_labels: [__meta_dockerswarm_service_label_prometheus_job]
      regex: .+
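
Once this job is appended to /etc/prometheus/prometheus.yml, reload Prometheus and confirm the new targets, as before. A sketch, assuming Prometheus runs on the monitoring host:

# Reload the configuration
kill -HUP $(pgrep prometheus)

# List active targets and their job labels
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].labels.job'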
