Microservices Observability in a Kubernetes World: Distributed Tracing
Using Jaeger and OpenTelemetry for Distributed Tracing
The third pillar we’ll explore is distributed tracing. Metrics tell you how healthy your system is. Logs help you dig into issues. But sometimes, that’s not enough.
Picture an Uber-like platform built on microservices—one for user authentication, another for the ride-matching logic, another for payments. Customers start complaining that checkout sometimes takes five seconds and cancel rides as a result, yet the slowdown only seems to happen on certain days.
Your metrics show nothing strange. CPU, memory, and request counts look normal.
Your logs are clean—no errors, no timeouts.
Then you turn on distributed tracing and follow a single order from start to finish. The trace looks like a subway map—each stop is a microservice. You notice something odd: the payment service calls the fraud-check API, and that call alone takes 3.8 seconds. Drilling deeper, you discover it’s waiting for a third-party verification system that occasionally throttles requests.
Without tracing, you’d still be staring at perfect dashboards and spotless logs. Tracing shows the invisible slow path—the hidden bottleneck buried inside a chain of “healthy” services. It’s the moment when all your data suddenly connects, and the system finally tells its story. It’s not more important than metrics or logs, but it fills some gaps they’re simply blind to.
APM (application performance monitoring) tools, if you are familiar with them, are usually built on top of distributed tracing. They provide additional features like error tracking, performance monitoring, and user experience analytics. But the core of APM is distributed tracing, and today, more and more APM products are built on top of OpenTelemetry, an open standard for collecting telemetry data, including distributed traces.
In this section, we will set up a basic distributed tracing system based on OpenTelemetry. This system will collect traces from our applications, but it needs a backend to store and visualize them. One of the most popular choices in the open-source community is Jaeger.
Jaeger and OpenTelemetry, as you may have guessed, are complementary tools, much like Promtail and Loki. They work together to give developers and operators a complete tracing experience.
Let's summarize their roles:
OpenTelemetry is an observability framework for cloud-native software. It is a merger of OpenCensus and OpenTracing, providing APIs, libraries, agents, and collector services to capture distributed traces, logs, and metrics from your apps. OpenTelemetry offers a vendor-neutral way to instrument applications, collect telemetry data, and send it to different monitoring systems. It supports capturing metrics, logs, and distributed traces, but we are focusing on distributed tracing in this section.
Jaeger is a distributed tracing system inspired by Dapper and OpenZipkin. It is used for monitoring and troubleshooting microservices-based distributed systems and is a CNCF project. It is one of the popular backends that can receive and store distributed traces generated by applications instrumented with OpenTelemetry.
In our setup, OpenTelemetry will be responsible for instrumenting applications and collecting traces, while Jaeger will handle storage. Jaeger also ships with a UI for visualization, but you can instead plug other specialized tools, such as Grafana, into Jaeger as a data source.
That's enough theory; let's get our hands dirty and start with the installation.
According to the official documentation, starting from version 1.31, the Jaeger Operator uses webhooks to validate Jaeger custom resources (CRs). This requires cert-manager to be installed in the cluster:
kubectl apply -f \
https://github.com/cert-manager/cert-manager/releases/download/v1.16.1/cert-manager.yaml
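Before installing anything that depends on cert-manager, you can wait for its Deployments to become available instead of guessing how long to pause; this assumes the default installation in the cert-manager namespace:
# Wait until all cert-manager Deployments report the Available condition
kubectl -n cert-manager wait deployment --all \
  --for=condition=Available --timeout=120s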
Create a namespace called "observability" to host Jaeger and OpenTelemetry components:
kubectl create namespace observability
Install the Jaeger Operator (wait for a few moments until cert-manager is fully operational):
kubectl -n observability apply -f \
https://github.com/jaegertracing/jaeger-operator/releases/download/v1.65.0/jaeger-operator.yaml
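To confirm the operator is ready before creating Jaeger resources, you can wait for its Deployment; the manifest above creates a Deployment named jaeger-operator in the observability namespace (adjust the name if your version differs):
# Wait for the Jaeger Operator Deployment to become available
kubectl -n observability wait deployment/jaeger-operator \
  --for=condition=Available --timeout=120s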
Prepare the simplest Jaeger instance—this is a basic Jaeger deployment suitable for testing and development purposes (wait a few moments until the Jaeger Operator is fully operational):
kubectl apply -f - <<EOF
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
  namespace: observability
EOF
This is the simplest Jaeger instance that you can deploy. It creates a Jaeger instance with the default configuration. More information about the configuration options is available in the official documentation.
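Before customizing anything, it is worth checking that the Operator picked up the resource and created the corresponding workloads; object names and labels may vary slightly between Operator versions, so a simple grep is enough here:
# List the Jaeger custom resources in the observability namespace
kubectl -n observability get jaegers
# Show the pods and services created for the "simplest" instance
kubectl -n observability get pods,svc | grep simplest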
There are many configuration options available to customize your Jaeger instance. For example, you can enable authentication, configure the storage backend (like Elasticsearch or Cassandra), and more. If you want to deploy a more advanced Jaeger instance with Elasticsearch as the storage backend, you can use Helm:
# Examples of using Helm to install Jaeger with different storage backends
helm repo add jaegertracing \
https://jaegertracing.github.io/helm-charts
helm repo update
# To use Elasticsearch as the storage backend
helm install jaeger jaegertracing/jaeger \
--set provisionDataStore.cassandra=false \
--set provisionDataStore.elasticsearch=true \
--set storage.type=elasticsearch
# To use Cassandra as the storage backend
helm install jaeger jaegertracing/jaeger \
--set provisionDataStore.elasticsearch=false \
--set provisionDataStore.cassandra=true \
--set storage.type=cassandra
Cassandra and Elasticsearch are deployed as part of the Helm chart installation. Alternatively, you can use an existing Cassandra or Elasticsearch instance by providing connection details in the configuration. For more information on datastore configuration options, please refer to the official documentation.
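As an illustration, pointing the chart at an existing Elasticsearch cluster looks roughly like the following; the host is a placeholder and the value names may differ between chart versions, so check the chart's values.yaml before relying on them:
# Example only: reuse an existing Elasticsearch cluster instead of provisioning one
helm install jaeger jaegertracing/jaeger \
  --set provisionDataStore.cassandra=false \
  --set provisionDataStore.elasticsearch=false \
  --set storage.type=elasticsearch \
  --set storage.elasticsearch.host=elasticsearch.example.svc.cluster.local \
  --set storage.elasticsearch.port=9200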
If you choose to use the simplest Jaeger instance, you can forward the Jaeger UI port to your local machine:
kubectl -n observability \
  port-forward svc/simplest-query 16686:16686 > /dev/null 2>&1 &
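The Jaeger UI should now be reachable at http://localhost:16686. If you want to confirm the port-forward is working without opening a browser, a quick curl against the UI is enough:
# Expect an HTTP 200 response code from the Jaeger UI
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:16686/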
Let's move on to OpenTelemetry. Start by installing the OpenTelemetry Operator:
kubectl apply -f \
https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.137.0/opentelemetry-operator.yaml
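As with the other operators, you can wait for it to become available; the manifest typically installs the operator into the opentelemetry-operator-system namespace under the Deployment name shown below, so adjust if your version differs:
# Wait for the OpenTelemetry Operator Deployment to become available
kubectl -n opentelemetry-operator-system wait \
  deployment/opentelemetry-operator-controller-manager \
  --for=condition=Available --timeout=120s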
Create the OpenTelemetry Collector instance (wait a few moments until the OpenTelemetry Operator is fully operational):
kubectl apply -f - <<'EOF'
---
# The OpenTelemetryCollector CRD tells the Operator to deploy a collector instance.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
# Metadata section — defines the name and namespace of this collector instance.
metadata:
  name: otel-collector
  namespace: observability
# The spec field contains the configuration for the collector.
spec:
  config:
    receivers:
      otlp:
        protocols:
          # gRPC protocol endpoint for binary OTLP data (used by most SDKs and agents)
          grpc:
            endpoint: 0.0.0.0:4317
          # HTTP protocol endpoint for OTLP data over HTTP
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        # How often to check memory usage.
        check_interval: 1s
        # Max memory threshold (75% of container memory).
        limit_percentage: 75
        # Additional buffer for sudden spikes.
        spike_limit_percentage: 15
      batch:
        # Maximum number of spans per batch.
        send_batch_size: 10000
        # Maximum time to wait before sending a batch.
        timeout: 10s
    exporters:
      debug: {}
      otlp:
        # Jaeger collector OTLP gRPC endpoint.
        endpoint: "simplest-collector.observability.svc.cluster.local:4317"
        tls:
          # Disable TLS (useful for local setups).
          insecure: true
    service:
      pipelines:
        traces:
          # Data enters through the OTLP receiver.
          receivers: [otlp]
          # Data passes through these processors.
          processors: [memory_limiter, batch]
          # Finally, data is exported to Jaeger via OTLP.
          exporters: [otlp]
EOF
There are a number of sections in the configuration above:
apiVersion and kind define the type of Kubernetes resource we are creating. In this case, we are creating an OpenTelemetry Collector instance.
metadata contains metadata about the resource, such as its name and namespace.
spec.config contains the configuration for the OpenTelemetry Collector.
receivers, processors, exporters, and service are the main components of the OpenTelemetry Collector configuration.
receivers: Where the telemetry data is received. Receivers are responsible for receiving telemetry data from various sources. Here is what we defined in our example:
otlp: This receiver can receive telemetry data via the gRPC and HTTP protocols. It listens on ports 4317 (gRPC) and 4318 (HTTP).
processors: How the collected telemetry data is processed before being exported. In our example, we have two processors:
memory_limiter: This processor prevents the collector from running out of memory by periodically checking memory usage and dropping data if usage exceeds safe limits.
batch: This processor groups data into batches before sending it to the exporters. This improves efficiency and reduces the number of network calls.
exporters: Where the processed telemetry data is sent. In our example, we have two exporters:
debug: This exporter simply prints telemetry data to the collector’s logs. It is useful for debugging purposes. We can remove it in production.
otlp: This exporter sends trace data to a backend that supports the OTLP protocol. In our example, it sends data to the Jaeger collector in the observability namespace via OTLP.
service: We group the receivers, processors, and exporters into pipelines:
pipelines: This section defines the processing pipelines for the collected telemetry data. In our example, we have a single pipeline named traces that receives data from the otlp receiver, applies the memory_limiter and batch processors, and exports the data to Jaeger via the otlp exporter.
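Before relying on this pipeline, it is worth checking that the collector actually came up; the Operator derives the Deployment and Service names from the OpenTelemetryCollector resource (otel-collector in our case), so a grep on the namespace is the simplest sanity check:
# List the OpenTelemetryCollector resources in the observability namespace
kubectl -n observability get opentelemetrycollectors
# Show the pods and services generated for the otel-collector instance
kubectl -n observability get pods,svc | grep otel-collector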
At this stage, we have a fully functional OpenTelemetry Collector that can receive trace data via OTLP, process it, and export it to Jaeger. The next step is to instrument our applications with the OpenTelemetry SDKs so they send trace data to the collector. However, manual instrumentation can be overkill for simple applications. Fortunately, OpenTelemetry provides a way to auto-instrument applications without any code changes.
Auto-Instrumenting Applications with OpenTelemetry
One of the most interesting features of OpenTelemetry is its ability to auto-instrument your application. In this section, we will demonstrate this feature by deploying a sample Spring application that is auto-instrumented using OpenTelemetry. The traces will be automatically generated and sent to the OpenTelemetry Collector we deployed earlier.
To do this, we need to create an Instrumentation resource that defines how the auto-instrumentation should be done. This is the example we will use:
kubectl apply -f - <<EOF
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: petclinic
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
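  # The fields below are a representative sketch of how such an Instrumentation
  # resource typically continues; the resource name and the exporter endpoint
  # are assumptions, so adjust them to match your environment.
  name: petclinic-instrumentation
  namespace: petclinic
spec:
  exporter:
    # Assumed OTLP gRPC endpoint of the Service the OpenTelemetry Operator
    # creates for our collector (usually named <collector-name>-collector).
    endpoint: http://otel-collector-collector.observability.svc.cluster.local:4317
  # Trace context propagation formats injected into instrumented applications.
  propagators:
    - tracecontext
    - baggage
  # Sample every trace; fine for a demo, usually too much for production.
  sampler:
    type: parentbased_traceidratio
    argument: "1"
EOF
Once the Instrumentation resource exists, the Operator injects the Java auto-instrumentation agent into pods that opt in with the instrumentation.opentelemetry.io/inject-java: "true" annotation, so a Spring application such as PetClinic deployed in the petclinic namespace starts emitting traces to the collector without any code changes.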