
Updates and recent posts about Slurm.
Link
@faun shared a link, 7 months ago
FAUN.dev()

v1.33: In-Place Pod Resize Graduated to Beta

Kubernetes v1.33 hits the scene with in-place Pod resize. Now, tweak CPU and memory settings without hitting restart. Perfect for keeping stateful apps sturdy. Expect faster scaling and smarter resource juggling. Plus, fancy new subresources and conditions polish up management and error reporting. In .. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

v1.33: Updates to Container Lifecycle

Kubernetes v1.33 just got a little smarter. Now you can use a zero-duration Sleep action in container lifecycle hooks. That means no more juggling extra binaries: nice and tidy. With alpha support, you get to tweak stop signals within containers. Forget those pesky image-level defaults. The catch? Your c.. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

Announcing Native Azure Functions Support in Azure Container Apps

Unleash Azure Functions on Azure Container Apps with the fresh deployment model. Tap into the complete ACA toolkit: auto-scaling magic, no more juggling infrastructure. Transition turbocharges performance, smooths out deployment snags via CLI or Portal. Just set up with "kind=functionapp" and watch simplic.. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

AI at Scale: Serverless or Kubernetes?

At Kingfisher, GCP Vertex AI Pipelines and Kubernetes dance together, tackling AI scaling issues with grace. Serverless sounds dreamy until your budget cries uncle under traffic spikes. Kubernetes, though, delivers predictability, a perfect match for Kingfisher's consistent AI tasks... read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

Why Even Stateless AKS Clusters Might Need Backup

Backing up those “stateless” AKS clusters isn’t just nerdy paranoia. Config drift, compliance headaches, and meddling hands make it a real necessity. In the DevOps trenches, clusters often wander off script from Git. Here, automated AKS backups ride in like heroes, capturing real-time snapshots, stream.. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

1.33: Job's SuccessPolicy Goes GA

Kubernetes v1.33 just unleashed Job success policy GA. Now you can set your own victory conditions for Jobs, which will make life a whole lot easier for AI/ML and HPC workloads... read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

v1.33: Image Pull Policy the way you always thought it worked!

Kubernetes v1.33 finally crushes Issue 18787. Now, every pod must authenticate before playing with already pulled private images. Security toughens without missing a beat. A fresh credential verification system zaps a decade-old loophole, slamming the door on unauthorized access... read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

kuberc is Here! Customizing kubectl with Kubernetes 1.33

Kuberc, introduced in Kubernetes 1.33 as an alpha feature, allows users to personalize their kubectl command-line experience with aliases and default flags. This configuration file separates personal preferences from the kubeconfig file, simplifying complex commands and reducing errors. Teams can pote.. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

How Kubernetes is Built

Kubernetes sprang from Google's Borg like a tech prodigy. It's a lesson in open-source wizardry, orchestrated by 150-200 zealous maintainers who roll out fresh updates every 14-16 weeks like clockwork. But here’s the magic trick: the "lead" and "shadow" setup. It’s a clever mentorship dance that lets r.. read more  

Link
@faun shared a link, 7 months ago
FAUN.dev()

OrbStack: A Deep Dive for Container and Kubernetes Development

OrbStack rockets ahead with 2-5× faster I/O and harnesses Rosetta for blinding x86 speeds on Apple Silicon. For Mac users, it's a zippy Docker alternative. Unified Kubernetes, Linux machines, and effortless file sharing turbocharge development workflows. Meanwhile, Docker Desktop sulks in the corner, w.. read more  

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
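
As a rough illustration of how the user commands mentioned above can be scripted, the sketch below builds an `squeue` invocation and parses its output into Python dictionaries. It assumes the standard `squeue` options `-h` (no header), `-o` (output format), and `-u` (user filter); the pipe-delimited layout, the function names, and the sample output are choices made for this example, not anything prescribed by Slurm.

```python
import subprocess

# Field order matches the squeue format string below:
# %i = job ID, %P = partition, %j = job name, %u = user, %T = job state.
SQUEUE_FIELDS = ("job_id", "partition", "name", "user", "state")
SQUEUE_FORMAT = "%i|%P|%j|%u|%T"

# Hypothetical sample output, used here so the parser can be tried
# on a machine without Slurm installed.
SAMPLE_OUTPUT = """\
1001|debug|train_model|alice|RUNNING
1002|batch|preprocess|bob|PENDING
"""


def squeue_command(user=None):
    """Build the argv list for a header-free, pipe-delimited squeue call."""
    argv = ["squeue", "-h", "-o", SQUEUE_FORMAT]
    if user is not None:
        argv += ["-u", user]
    return argv


def parse_squeue(text):
    """Convert pipe-delimited squeue output into one dict per job."""
    jobs = []
    for line in text.splitlines():
        values = line.strip().split("|")
        # Skip blank or malformed lines instead of guessing at their meaning.
        if len(values) != len(SQUEUE_FIELDS):
            continue
        jobs.append(dict(zip(SQUEUE_FIELDS, values)))
    return jobs


def list_jobs(user=None):
    """Run squeue on a host where Slurm is installed and parse the result."""
    result = subprocess.run(
        squeue_command(user), capture_output=True, text=True, check=True
    )
    return parse_squeue(result.stdout)
```

On a cluster login node, `list_jobs("alice")` would return the queued and running jobs for that user; elsewhere, `parse_squeue(SAMPLE_OUTPUT)` exercises the parser alone.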

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
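
To make partitions and GRES a little more concrete, here is a minimal sketch that renders a batch script requesting GPUs in a given partition. The `#SBATCH` directives used (`--job-name`, `--partition`, `--time`, `--gres`) are standard `sbatch` options, but the helper function, the partition name, and the assumption that the site exposes its GPUs under the GRES name `gpu` are illustrative only; real names depend on each cluster's configuration.

```python
def render_batch_script(job_name, partition, gpus, time_limit, command):
    """Render a Slurm batch script as text. Nothing is submitted."""
    if gpus < 0:
        raise ValueError("gpus must be zero or positive")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Assumes the GRES type is named "gpu" on this cluster.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append("")
    # srun launches the command inside the allocation sbatch obtains.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Hypothetical values: a partition called "gpu", two GPUs, a 30-minute limit.
EXAMPLE_SCRIPT = render_batch_script(
    job_name="train_model",
    partition="gpu",
    gpus=2,
    time_limit="00:30:00",
    command="python train.py",
)
```

Writing `EXAMPLE_SCRIPT` to a file and passing that file to `sbatch` would queue the job, subject to whatever limits the chosen partition enforces.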

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.