Updates and recent posts about Slurm.
Link
@faun shared a link, 6 months, 1 week ago
FAUN.dev()

Amazon EKS and Amazon EKS Distro now supports Kubernetes version 1.33

Kubernetes 1.33 struts onto the scene with stable sidecar containers, topology-aware routing, and pod topology spread constraints. No beta testing anymore; it's fully unleashed on AWS EKS...

@faun shared a link, 6 months, 1 week ago

Mastering Kubernetes Migrations From Planning to Execution

Managed K8s like Amazon EKS or GKE? A ticket to smoother ops, but at the expense of control. Enter autoscaling, service meshes, and GitOps: they shift the deployment game dramatically. But don’t fall into the trap of thinking every app belongs on K8s. High-latency, tightly bound apps flounder there. Tos..

@faun shared a link, 6 months, 1 week ago

What’s New in Networking for Kubernetes in the Isovalent Platform 1.17

The Isovalent Platform 1.17 release brings major upgrades to Kubernetes networking, including a new standalone Egress Gateway, dynamic BGP features, enhanced multi-tenant security policies, and smoother Calico-to-Cilium migrations. This version also introduces easier observability with integrated Ti..

@faun shared a link, 6 months, 1 week ago

AI Runs Best On Cloud Native—Who's Managing the Kubernetes Platform?

AI workloads thrive on cloud-native platforms like Kubernetes because they offer the scalability, portability, and speed needed for modern machine learning, but building and running this infrastructure is highly complex and distracts from core AI work. The post argues that unless your business is inf..

@faun shared a link, 6 months, 1 week ago

Announcing new Model Context Protocol (MCP) Servers for AWS Serverless and Containers - AWS

AWS's Model Context Protocol (MCP) servers arm AI code assistants to deftly wrangle AWS Lambda, ECS, and EKS. They launch apps at warp speed. MCP servers cram in AWS best practices and operational secrets, freeing you from infrastructure drudgery. You get to dive straight into crafting the heart of you..

@faun shared a link, 6 months, 1 week ago

The Risk of Default Configuration: How Out-of-the-Box Helm Charts Can Breach Your Cluster

Apache Pinot's Helm setup is a welcome mat for troublemakers. It throws the doors open to critical services without bothering to ask, "Who goes there?" It's the kind of oversight attackers savor. Meshery and Selenium join the party too. Their default settings flirt with disaster, leaving the gates ajar..

@faun shared a link, 6 months, 1 week ago

ClusterAPI Provider for AWS and Cilium

Cluster API is the aspirin for Kubernetes cluster migraines, especially when tangoing with AWS. With neat tricks like EKS upgrades and self-managed nodes, it’s a godsend. KinD steps up as the management cluster sidekick in this AWS adventure, while CAPA rolls up its sleeves, threading infrastructure provi..

@faun shared a link, 6 months, 1 week ago

Gateway API v1.3.0: Advancements in Request Mirroring, CORS, Gateway Merging, and Retry Budgets

Gateway API v1.3.0 lands with a killer feature: percentage-based request mirroring that makes traffic handling a whole lot savvier. Fancy a peek at the cutting-edge? Dive into the CORS filters and retry budgets, all shiny and experimental. Just a heads-up: these feature names sport an "X" at the front, mea..

@faun shared a link, 6 months, 1 week ago

Building Kubernetes Controllers in Node.js

Kubenode is the secret weapon for Node.js developers diving into Kubernetes. Forget about wrestling with Go: this tool empowers you to wield custom resources and automate like a boss...

@faun shared a link, 6 months, 1 week ago

Start Sidecar First: How To Avoid Snags

Kubernetes v1.29.0 steps up its game with sidecars now always booting before the main apps. Fancy that. But don’t get too comfy. To make sure everything’s truly ready, lean on readiness probes or whip up a shell script with a lifecycle hook to get that perfect launch choreography...

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
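As a rough illustration of how the user-facing commands fit together, here is a minimal Python sketch that builds argument lists for srun, squeue, scancel, and sinfo. The helper names (`squeue_cmd`, `run_if_available`, and so on) are hypothetical and not part of Slurm; only the command names and the `--user`, `--partition`, and `--nodes` options are Slurm's own. The commands are only executed if the binary is found on PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def squeue_cmd(user: Optional[str] = None) -> List[str]:
    """Argv for listing jobs in the queue, optionally for a single user."""
    cmd = ["squeue"]
    if user:
        cmd.append(f"--user={user}")
    return cmd


def sinfo_cmd(partition: Optional[str] = None) -> List[str]:
    """Argv for showing node and partition state."""
    cmd = ["sinfo"]
    if partition:
        cmd.append(f"--partition={partition}")
    return cmd


def scancel_cmd(job_id: int) -> List[str]:
    """Argv for cancelling one job by its numeric ID."""
    return ["scancel", str(job_id)]


def srun_cmd(nodes: int, program: List[str]) -> List[str]:
    """Argv for launching a program across the requested number of nodes."""
    return ["srun", f"--nodes={nodes}", *program]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run cmd and return its stdout, or None when the binary is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with Slurm installed, something like `run_if_available(sinfo_cmd())` would return whatever `sinfo` prints; elsewhere it simply returns `None`, which keeps the sketch safe to try outside a cluster.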

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
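To make the partition and GRES concepts a little more concrete, the following Python sketch assembles a batch script using `#SBATCH` directives. The function `build_batch_script` and the example partition name `gpu` are assumptions made for illustration; the directives used (`--partition`, `--nodes`, `--gres`, `--time`, `--exclusive`) are standard sbatch options, but which partitions and GRES types exist depends entirely on the site's configuration.

```python
from typing import List, Optional


def build_batch_script(
    command: str,
    partition: str,
    nodes: int = 1,
    gpus_per_node: int = 0,
    time_limit: Optional[str] = None,
    exclusive: bool = False,
) -> str:
    """Return the text of a batch script that requests resources via #SBATCH lines."""
    lines: List[str] = ["#!/bin/bash"]
    lines.append(f"#SBATCH --partition={partition}")  # target partition (site-specific name)
    lines.append(f"#SBATCH --nodes={nodes}")
    if gpus_per_node > 0:
        # GPUs are requested as a Generic Resource (GRES), counted per node.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    if time_limit:
        lines.append(f"#SBATCH --time={time_limit}")
    if exclusive:
        # Ask for nodes that are not shared with other jobs.
        lines.append("#SBATCH --exclusive")
    lines.append("")
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 nodes, 4 GPUs each, in a partition assumed to be called "gpu".
    print(build_batch_script("python train.py", partition="gpu", nodes=2,
                             gpus_per_node=4, time_limit="01:00:00"))
```

The generated text could be written to a file and handed to `sbatch`; whether the request is accepted depends on the limits and policies the administrators have attached to the chosen partition.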

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.