From Borgmon to Cloud Native Monitoring: Prometheus's Journey

Prometheus is an open-source monitoring and alerting system that records real-time metrics in a time series database. It was originally developed at SoundCloud in 2012 to address the limitations of existing monitoring tools such as StatsD and Graphite, which proved inadequate for SoundCloud's scaling requirements. Prometheus provides a multi-dimensional data model, operational simplicity, scalable data collection, and a powerful query language, all within a single tool. The project is written in Go, licensed under the Apache License 2.0, and hosted on GitHub.

Today, Prometheus is one of the most popular monitoring tools in the Cloud Native ecosystem, if not the most popular. It is widely used to monitor Kubernetes, microservices, containerized applications, and other Cloud Native workloads, and it can also integrate with traditional systems. It collects metrics over HTTP using a pull model and offers flexible queries and real-time alerting.

ℹ️ In the pull model, the Prometheus server fetches (scrapes) metrics from the target systems at regular intervals. In the push model, by contrast, the target systems send their metrics to the monitoring server.
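
To make the pull model concrete, here is a minimal sketch in Python, using only the standard library. It is not how Prometheus itself is implemented: the metric name `demo_requests_total`, the counter values, and the helper names (`render_metrics`, `scrape`, `demo`) are all hypothetical and chosen for illustration. The target exposes its current values on a `/metrics` HTTP endpoint in a Prometheus-style text format, and the monitoring side fetches that page whenever it wants a fresh reading.

```python
import http.server
import threading
import urllib.request

# Hypothetical application counters; a real target would update these at runtime.
REQUESTS_TOTAL = {"GET": 3, "POST": 1}


def render_metrics():
    """Render the counters as text, one labeled sample per line."""
    lines = ["# TYPE demo_requests_total counter"]
    for method, value in sorted(REQUESTS_TOTAL.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    """Target side: it only answers requests and never contacts the monitor."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def scrape(url):
    """Monitor side: pull the metrics page and parse it into {series: value}."""
    # An empty ProxyHandler ignores any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            series, value = line.rsplit(" ", 1)
            samples[series] = float(value)
    return samples


def demo():
    """Start a target on a free local port, scrape it once, then shut it down."""
    server = http.server.HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return scrape(f"http://127.0.0.1:{port}/metrics")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the scraped samples as a dictionary keyed by series name. The point of the sketch is the direction of the connection: the monitor decides when to collect, and the target only has to serve its current state. In a push design the roles would be reversed, with the target opening the connection and sending samples to the monitor.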

Prometheus is a graduated project of the Cloud Native Computing Foundation (CNCF), alongside Kubernetes, Envoy, Istio, Jaeger, etcd, CoreDNS, and other popular tools. It is used by companies such as Google, DigitalOcean, Red Hat, Slack, Uber, Medium, and Trivago.

ℹ️ The CNCF defines itself as a vendor-neutral foundation that hosts open-source projects in the cloud-native ecosystem. It was founded in 2015 by the Linux Foundation and has since become the home of many popular projects, including Kubernetes, Prometheus, and Envoy.

The tool was inspired by Borgmon, a monitoring system used at Google. By 2013, Prometheus was introduced for production monitoring at SoundCloud, and the official public announcement was made in January 2015.

"Prometheus is Borgmon, which is a bit funny as it's used by k8s now. It's also funny as the author left Google, wrote it in 2 years for SoundCloud, and then went back to Google." ~ kyrra, a user on Hacker News

For the curious, Borgmon is the system Google developed to monitor Borg, its internal cluster management system and the predecessor of Kubernetes. The SRE Book ("Site Reliability Engineering"), written by Google engineers, explains in detail how Borgmon works and the role it plays in monitoring Google's infrastructure. Borgmon suffered from a number of issues, most of them inherent in its design and architecture. These limitations led to the development of Monarch, Google's planet-scale in-memory time series database, which replaced Borgmon through a multi-decade migration effort (according to a xoogler). Many Google services rely on Monarch, including Stackdriver and Google's Managed Service for Prometheus.

In reality, Prometheus differs from Borgmon in many ways. It is more flexible and easier to operate, and it keeps the good parts of Borgmon while leaving the bad parts behind.

"Prometheus is an escaped implementation of Google’s Borgmon, which is seen inside Google as a kind of horror show, and alternatives have been developed." ~ the_evacuator, on Hacker News
