Updates and recent posts about Pelagia
@kaptain shared a link, 4 months ago

Streamline your containerized CI/CD with GitLab Runners and Amazon EKS Auto Mode

GitLab Runners now work with Amazon EKS Auto Mode. That means hands-off infra, smarter scaling, and built-in AWS security. Runners spin up on EC2 Spot Instances, so teams can cut CI/CD compute costs by as much as 90% - without hacking together flaky pipelines...

@kaptain shared a link, 4 months ago

Kubernetes GPU Management Just Got a Major Upgrade

Kubernetes 1.34 dropped Dynamic Resource Allocation (DRA) - think persistent volumes, but for GPUs and custom hardware. Vendors can now plug in drivers and schedulers for their devices, and workloads can pick exactly what they need. Coming in 1.35: a new workload abstraction that speaks the language of ..

@kaptain shared a link, 4 months ago

From Deterministic to Agentic: Creating Durable AI Workflows with Dapr

Dapr dropped Durable Agents - a mashup of classic workflows and LLM-driven agents that can actually get things done and survive rough edges. They track reasoning steps, tool calls, and chat states like a champ. If things crash, no problem: Dapr Workflows and Diagrid Catalyst bring it all back...

@kaptain shared a link, 4 months ago

Implementing assurance pipeline for Amazon EKS Platform

AWS released a full-stack CI/CD validation pipeline for Amazon EKS. It pulls in six layers of testing, Terraform, Helm, Locust load testing, and even AWS Fault Injection for pushing resilience to the edge. The goal: bake policy checks, functional tests, and brutal load tests right into pre-deployment. Fewe..

@kaptain shared a link, 4 months ago

v1.35: New level of efficiency with in-place Pod restart

Kubernetes 1.35, as you may know, introduced in-place Pod restarts (alpha). It's a real reset: all containers, init and sidecars included - without killing the Pod or kicking off a reschedule. Think restart without the cloud drama. Big win for workloads with heavy inter-container dependencies or massi..

@kaptain shared a link, 4 months ago

v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager

Kubernetes v1.35 sneaks in an alpha feature gate that flips the CCM route controller from "check every X minutes" to "watch and react." It now uses informers to trigger syncs when nodes change - plus a light periodic check every 12–24 hours...

@kaptain shared a link, 4 months ago

1.35: Enhanced Debugging with Versioned z-pages APIs

Kubernetes 1.35 makes a quiet-but-crucial upgrade: z-pages debugging endpoints now return structured, machine-readable JSON. That means tools - not just tired humans - can parse control plane state directly. The responses are versioned, backward-compatible, and tucked behind feature flags for now...

@kala shared a link, 4 months ago

The 2026 Data Engineering Roadmap: Building Data Systems for the Agentic AI Era

Data engineering’s getting flipped. AI agents and LLMs aren’t just tagging along anymore - they’re the main users now. That means engineers need to build context-aware, machine-readable data systems that don’t just store info but actually make sense of it. Think: vector databases, knowledge graphs, semantic ..

@kala shared a link, 4 months ago

2025: The year in LLMs

2025 was the year LLMs stopped just answering questions and started building things. Reasoning models like OpenAI’s o-series and Claude Code took over tool-driven workflows. Asynchronous coding agents broke out. These models didn’t just write code - they ran it, debugged it, then did it again. That loo..

@kala shared a link, 4 months ago

Streamlining Security Investigations with Agents

Slack broke down how it's threading AI into its product without torching user trust. Slack AI leans hard on tenant-specific data isolation and zero data retention - no leftover crumbs from LLM interactions. Instead of piping user data through someone else’s APIs, Slack runs LLMs on its own infra where it ca..

Pelagia is a Kubernetes controller that provides all-in-one management for Ceph clusters installed by Rook. It delivers two main features:

Aggregates all Rook Custom Resources (CRs) into a single CephDeployment resource, simplifying the management of Ceph clusters.
Provides automated lifecycle management (LCM) of Rook Ceph OSD nodes for bare-metal clusters. Automated LCM is managed by the special CephOsdRemoveTask resource.

It is designed to simplify the management of Rook-installed Ceph clusters in Kubernetes.

As heavy Rook users, we had dozens of Rook CRs to manage. So one day we decided to create a single resource that would aggregate all Rook CRs and deliver a smoother LCM experience. This is how Pelagia was born.

Pelagia supports almost the entire Rook CR API, including CephCluster, CephBlockPool, CephFilesystem, CephObjectStore, and others, aggregating them into a single specification. We continuously work on improving Pelagia's API, adding new features, and enhancing existing ones.
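To make the aggregation idea concrete, here is a minimal Python sketch of how one aggregated specification could fan out into separate Rook manifests. It is a conceptual illustration only: the input keys (`cluster`, `blockPools`, `filesystems`, `objectStores`), the `expand` function, and the `rook-ceph` default namespace are assumptions made for this example, not Pelagia's actual CephDeployment schema or implementation. Only the Rook kinds and the `ceph.rook.io/v1` API version come from Rook itself.

```python
from __future__ import annotations

from typing import Any

ROOK_API_VERSION = "ceph.rook.io/v1"  # API group/version used by Rook CRs

# Hypothetical mapping: key in the aggregated spec -> Rook kind it expands to.
SECTION_TO_KIND = {
    "blockPools": "CephBlockPool",
    "filesystems": "CephFilesystem",
    "objectStores": "CephObjectStore",
}


def manifest(kind: str, name: str, namespace: str, spec: dict[str, Any]) -> dict[str, Any]:
    """Build a single Kubernetes-style manifest for one Rook CR."""
    return {
        "apiVersion": ROOK_API_VERSION,
        "kind": kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def expand(aggregated: dict[str, Any], namespace: str = "rook-ceph") -> list[dict[str, Any]]:
    """Expand one aggregated spec (hypothetical layout) into Rook CR manifests."""
    manifests = [
        manifest("CephCluster", aggregated["name"], namespace, aggregated.get("cluster", {}))
    ]
    for section, kind in SECTION_TO_KIND.items():
        for item in aggregated.get(section, []):
            manifests.append(manifest(kind, item["name"], namespace, item.get("spec", {})))
    return manifests


if __name__ == "__main__":
    example = {
        "name": "ceph",
        "cluster": {"mon": {"count": 3}},
        "blockPools": [{"name": "replicapool", "spec": {"replicated": {"size": 3}}}],
        "objectStores": [{"name": "store"}],
    }
    for m in expand(example):
        print(m["kind"], m["metadata"]["name"])
```

The point of the sketch is the shape of the problem: one document in, many Rook CRs out, so an operator edits a single resource instead of tracking each CR separately.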

Pelagia collects the Ceph cluster state and the statuses of all Rook CRs into a single CephDeploymentHealth CR. This resource highlights issues with the Ceph cluster and the Rook APIs, if any.
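The health roll-up can be pictured with a small Python sketch that folds several CR statuses into one summary. This is not how Pelagia is implemented, and the output layout (`state`, `issues`) is hypothetical rather than the CephDeploymentHealth schema. The sketch assumes each CR reports `status.phase`, that `Ready` means healthy, and that a CephCluster may also carry `status.ceph.health` with values such as `HEALTH_OK` or `HEALTH_WARN`.

```python
from __future__ import annotations

from typing import Any


def summarize(resources: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold the statuses of several CRs into one summary (hypothetical layout)."""
    issues: list[str] = []
    for res in resources:
        ident = f'{res["kind"]}/{res["metadata"]["name"]}'
        status = res.get("status", {})

        # Assumption: a CR is healthy only when its phase is "Ready".
        phase = status.get("phase")
        if phase != "Ready":
            issues.append(f"{ident}: phase is {phase or 'unknown'}")

        # Assumption: Ceph health, when present, lives under status.ceph.health.
        ceph_health = status.get("ceph", {}).get("health")
        if ceph_health not in (None, "HEALTH_OK"):
            issues.append(f"{ident}: Ceph reports {ceph_health}")

    return {"state": "Ok" if not issues else "Failed", "issues": issues}


if __name__ == "__main__":
    crs = [
        {"kind": "CephCluster", "metadata": {"name": "ceph"},
         "status": {"phase": "Ready", "ceph": {"health": "HEALTH_WARN"}}},
        {"kind": "CephBlockPool", "metadata": {"name": "replicapool"},
         "status": {"phase": "Ready"}},
    ]
    print(summarize(crs))
```

A single summary like this lets an operator check one object for problems instead of inspecting every Rook CR in turn.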

Another important feature of Pelagia is automated lifecycle management of Rook Ceph OSD nodes for bare-metal clusters. It is delivered by the CephOsdRemoveTask resource, which automates the removal of OSD disks and nodes from the cluster. We use this feature in our daily day-2 operations.