Updates and recent posts about Pelagia
@faun shared a link, 1 week, 6 days ago

Ansible Service Module: Start, Stop, & Manage Services

The Ansible service module handles Linux and Windows without choking on init system quirks. One playbook can start, stop, enable, or restart anything - no matter the OS. Idempotent, so you don’t have to babysit state. Clean and repeatable. Bonus: it’s great for wrangling fleets. Think: coordinating servi..

@faun shared a link, 1 week, 6 days ago

How AWS S3 serves 1 petabyte per second on top of slow HDDs

AWS S3 doesn’t need fancy hardware. It wrings performance out of cheap HDDs, log-structured merge trees, and erasure coding. The trick? Shard everything. Hit it in parallel. Randomized placement dodges hotspots. Hedged requests race the slowest links. And when things get lopsided, S3 rebalances - constant..

@faun shared a link, 1 week, 6 days ago

Seven Years of Firecracker

AWS is putting Firecracker microVMs to work in two fresh stacks: AgentCore, the new base layer for AI agents, and Aurora DSQL, a serverless, PostgreSQL-compatible database it just rolled out. AgentCore gives each agent session its own microVM. More isolation, less cross-talk - solid for multistep LLM wo..

@faun shared a link, 1 week, 6 days ago

Automated GitHub Self-Hosted Runner Cleanup: Lambda Functions and Auto Scaling Lifecycle Hooks

When an EC2 instance in an Auto Scaling Group shuts down, event-driven plumbing kicks in. A lifecycle hook catches the scale-in, fires off an SNS notification, and triggers a Lambda. That Lambda calls the GitHub API to yank the self-hosted runner before the instance dies. No dangling runners. No manual..

@faun shared a link, 1 week, 6 days ago

How LogSeam Searches 500 Million Logs per second

LogSeam rips through 500M log searches/sec and pushes 1.5+ TB/s throughput using Tigris’ geo-distributed object storage. It slashes log volume by 100× with Parquet + Zstandard compression. Then it spins up compute on the fly, right where the data lives - no long-running infrastructure, no laggy reads...

@faun shared a link, 1 week, 6 days ago

Internal HTTPS Routing in Istio.

Istio finally brings internal HTTPS routing with SNI-based traffic rules. Services in the mesh can now talk over port 443 - TLS fully intact. Just like in prod. TLS terminates at the ingress gateway. Routing pivots on SNI, not headers. Which makes this much closer to real-world mTLS flows. What’s the pla..

@faun shared a link, 1 week, 6 days ago

How I Built My Kubernetes Command Toolkit: A Journey from kubectl Chaos to Command Mastery

A dev-built Kubernetes CLI framework reshapes kubectl for how teams actually work. Commands get grouped by role - dev, SRE, sec, admin - instead of by resource. It bakes in defaults for Kyverno policies, encourages muscle-memory workflows, and wires up real-time troubleshooting to shrink downtime in pro..

@faun shared a link, 1 week, 6 days ago

Introducing Headlamp Plugin for Karpenter

The new Headlamp Karpenter Plugin wires real-time autoscaling insight straight into the Headlamp UI. It shows Karpenter resources, live metrics, scaling moves - no kubectl spelunking required. NodePools and NodeClaims get mapped to core Kubernetes objects. You can tweak configs in the UI, get validation on t..

@faun shared a link, 1 week, 6 days ago

Most Cloud-Native Roles are Software Engineers

Software Engineers still own the cloud-native job boards in 2025 - nearly 47% of all Kubernetes-tagged listings. DevOps holds onto second. But Platform Engineers just leapfrogged SREs, which have slid 30% since 2023...

@faun shared a link, 1 week, 6 days ago

The Myths (and Costs) of Running Node.js on Kubernetes

Kubernetes struggles to scale Node.js efficiently due to a mismatch in resource usage patterns. Autoscaling can be sluggish with bursty traffic, leading to revenue risks and performance issues. Teams must rethink resource allocation and scaling strategies to optimize Node.js efficiency in Kubernetes..

Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

Pelagia is designed to simplify the management of Rook-installed Ceph clusters running in Kubernetes.

Being solid Rook users, we had dozens of Rook CRs to manage. Thus, one day we decided to create a single resource that would aggregate all Rook CRs and deliver a smoother LCM experience. This is how Pelagia was born.

Pelagia supports almost all of the Rook CR APIs, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.
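To make the aggregation idea concrete, here is a minimal Python sketch of how one combined specification could be expanded into separate Rook manifests. It is an illustration only, not Pelagia's implementation: the section names `cluster`, `blockPools`, `filesystems` and `objectStores`, and the `expand_aggregate` helper, are hypothetical and do not reflect the real CephDeployment schema. Only the Rook kinds and the `ceph.rook.io/v1` API version are taken from Rook itself.

```python
from typing import Any

ROOK_API_VERSION = "ceph.rook.io/v1"  # Rook's CRD group/version

# Hypothetical mapping from aggregated-spec sections to Rook kinds.
LIST_SECTIONS = (
    ("blockPools", "CephBlockPool"),
    ("filesystems", "CephFilesystem"),
    ("objectStores", "CephObjectStore"),
)


def expand_aggregate(
    name: str, namespace: str, spec: dict[str, Any]
) -> list[dict[str, Any]]:
    """Expand one aggregated spec into per-kind Rook manifests.

    The section names used here are assumptions made for this sketch,
    not Pelagia's actual CephDeployment fields.
    """

    def manifest(kind: str, obj_name: str, obj_spec: dict[str, Any]) -> dict[str, Any]:
        return {
            "apiVersion": ROOK_API_VERSION,
            "kind": kind,
            "metadata": {"name": obj_name, "namespace": namespace},
            "spec": obj_spec,
        }

    manifests = []
    # The cluster section maps to exactly one CephCluster object.
    if "cluster" in spec:
        manifests.append(manifest("CephCluster", name, spec["cluster"]))
    # Each list section maps to one object per entry.
    for section, kind in LIST_SECTIONS:
        for item in spec.get(section, []):
            manifests.append(manifest(kind, item["name"], item.get("spec", {})))
    return manifests


# Usage: one aggregated document in, several Rook manifests out.
aggregate = {
    "cluster": {"mon": {"count": 3}},
    "blockPools": [{"name": "replicapool", "spec": {"replicated": {"size": 3}}}],
    "filesystems": [{"name": "myfs", "spec": {}}],
}
manifests = expand_aggregate("rook-ceph", "rook-ceph", aggregate)
for m in manifests:
    print(m["kind"], m["metadata"]["name"])
```

The point of the design is that operators edit a single document, while a controller owns the fan-out to the individual Rook resources.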

Pelagia collects the Ceph cluster state and the statuses of all Rook CRs into a single CephDeploymentHealth CR. This resource highlights any issues with the Ceph cluster or the Rook APIs.
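The health roll-up described here can be pictured with a short Python sketch. It is a sketch under stated assumptions, not the actual CephDeploymentHealth schema: the `HealthSummary` class and `summarize` function are hypothetical names, and treating `HEALTH_OK` and the `Ready` phase as the only healthy values is a simplification chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HealthSummary:
    """Hypothetical single-object view of cluster and CR health."""

    ceph_health: str
    issues: list[str] = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        return self.ceph_health == "HEALTH_OK" and not self.issues


def summarize(ceph_health: str, cr_phases: dict[str, str]) -> HealthSummary:
    """Fold the Ceph health status and per-CR phases into one summary.

    Assumption for this sketch: any Ceph status other than HEALTH_OK and
    any CR phase other than "Ready" is reported as an issue.
    """
    summary = HealthSummary(ceph_health=ceph_health)
    if ceph_health != "HEALTH_OK":
        summary.issues.append(f"ceph cluster reports {ceph_health}")
    # Sort so the issue list is stable regardless of input order.
    for resource, phase in sorted(cr_phases.items()):
        if phase != "Ready":
            summary.issues.append(f"{resource} is in phase {phase}")
    return summary


# Usage: one degraded cluster with one failing custom resource.
summary = summarize(
    "HEALTH_WARN",
    {"CephBlockPool/replicapool": "Ready", "CephFilesystem/myfs": "Failure"},
)
print(summary.healthy)
for issue in summary.issues:
    print(issue)
```

Collecting everything into one object means an operator checks a single resource instead of inspecting each Rook CR in turn.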

Another important feature of Pelagia is automated lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. This feature is delivered by the CephOsdRemoveTask resource, which automates the removal of OSD disks and nodes from the cluster. We use it routinely in our own day-2 operations.