How a Production Outage Was Caused Using Kubernetes Pod Priorities

On Friday, July 19, Grafana Cloud experienced a ~30 minute outage in our Hosted Prometheus service. To our customers who were affected by the incident, I apologize. It’s our job to provide you with the monitoring tools you need, and when they are not available, we make your life harder. We take this outage very seriously. This blog post explains what happened, how we responded to it, and what we’re doing to ensure it doesn’t happen again.

