@faun shared a link, 1 year ago

@adammetis shared a post, 2 months, 2 weeks ago
What Is Database Monitoring

We can’t let our databases fail. We need measures in place to guarantee that crucial business data is not impacted. One way of doing that is real-time database monitoring, which involves continuously observing, analyzing, and managing the performance and health of a database system. Let's see what that is and how to use it.

@vmihailenco shared a post, 10 months, 2 weeks ago

Monitoring CPU/RAM/disk metrics with OpenTelemetry and Uptrace

OpenTelemetry Collector is an open source data collection pipeline that allows you to monitor CPU, RAM, disk, and network metrics, among many others.

The Collector itself does not include built-in storage or analysis capabilities, but you can export the data to Uptrace and ClickHouse, using them as a replacement for Grafana and Prometheus.

Compared to Prometheus, ClickHouse can offer a smaller on-disk data size and better query performance when analyzing millions of time series.

@mohammad_zaigam shared a post, 1 year ago
The unprecedented growth of data in recent years has led to a demand for evolution in traditional monitoring practices.

The current observability maturity model is a good solution but needs further augmentation.

The widely accepted model includes the following stages:

1) Monitoring (Is everything in working order?)

2) Observability (Why is it not working?)

3) Full-Stack Observability (What is the origin of the problem, and what are its consequences?)

4) Intelligent Observability (How to predict anomalies and automate response?)

LOGIQ is supporting the next stage in the model, i.e., Federated Observability. In other words, data availability for consumers with on-demand convenience.

@yair_stark shared a post, 2 years, 2 months ago

Error Budget Is All You Need - Part 2

In part 1, I proposed a simple modification to Google’s Multi-Window Multi-Burn Rate alerting setup and showed how this modification addresses the cases of varying-traffic services and typical latency SLOs.
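
For readers new to the term, here is a minimal sketch of the baseline multi-window, multi-burn-rate check as it is commonly described for Google-style SLO alerting. It is not the modification proposed in the post, which is not reproduced here. The function names are hypothetical, and the 14.4 threshold assumes a 30-day SLO period with a page when 2% of the error budget is spent within one hour.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A value of 1.0 means the budget would last exactly the SLO period;
    higher values mean it runs out proportionally sooner.
    """
    budget = 1.0 - slo_target  # allowed fraction of failed requests
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_alert(long_window_error_ratio: float,
                 short_window_error_ratio: float,
                 slo_target: float,
                 threshold: float) -> bool:
    """Fire only if BOTH windows exceed the burn-rate threshold.

    The long window (e.g. 1 hour) shows the burn is significant; the
    short window (e.g. 5 minutes) shows it is still happening, so the
    alert resets quickly once the problem is fixed.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


# Assumed example values: 99.9% SLO, 2% errors over the last hour and
# over the last 5 minutes, page-level threshold of 14.4.
if __name__ == "__main__":
    print(should_alert(0.02, 0.02, slo_target=0.999, threshold=14.4))
```

A full setup would typically repeat this check for several window pairs with different thresholds, so that both fast and slow burns are caught.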

@squadcast shared a post, 2 years, 4 months ago

What can SREs do to make holiday season’s peak traffic less chaotic?

The holiday season's peak traffic is the most challenging period for SREs and on-call engineers. In this blog, we highlight what SREs can do to make the holiday season less chaotic.