Updates and recent posts about Slurm.
Link
@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Anubis and caddy-docker-proxy

CKAN faced a barrage: 60 requests per second, courtesy of some mischief-maker in Brazil. Enter Anubis. With its SHA-256 proof-of-work challenge, it cut through the chaos like a hot knife through warm Brazilian pão de queijo. Plugging Anubis into caddy-docker-proxy practically did itself. The proxy auto-configures…
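A SHA-256 challenge of this kind is a proof-of-work: the browser burns CPU searching for a value whose hash meets a difficulty target, while the server checks the answer with a single hash. The Python sketch below shows only the general technique. The challenge string, the "leading hex zeros" difficulty rule, and the `solve`/`verify` names are illustrative assumptions, not Anubis's actual protocol.

```python
import hashlib
from itertools import count

# Generic proof-of-work sketch; the challenge format and difficulty rule are
# assumptions for illustration, not Anubis's real wire protocol.


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side (expensive): find the smallest nonce whose
    SHA-256(challenge + nonce) hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    raise AssertionError("unreachable: count() never terminates")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (cheap): one hash confirms the client did the work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = solve("example-challenge", difficulty=4)
    print(nonce, digest, verify("example-challenge", nonce, 4))
```

Each extra hex zero multiplies the client's expected work by 16 while the server's cost stays at one hash, which is what makes the scheme expensive for a flood of bot requests.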

The state of Kubernetes jobs in 2025 Q1

North American Kubernetes salaries took a 6% nosedive, settling at an average of $165,288. Meanwhile, Europe enjoyed a tidy 4% uptick. Remote work? Holding steady at 68%. No surprise: Python remained the darling of coding languages, getting a nod in 62% of job posts, while Docker wasn't far behind, gracing 57% of…

From Edge to Enterprise: The StarlingX Advantage

StarlingX tackles low-latency workloads like a boss, making it a fit for edge and enterprise clouds. It weaves together real-time Linux and OVS-DPDK, all while juggling up to 5,000 nodes. It scales effortlessly, sprinting from humble single-node setups to sprawling tens-of-thousands in multi-region clouds. Timing precision…

v1.33: Fine-grained SupplementalGroups Control Graduates to Beta

Kubernetes v1.33 rolls in a snazzy beta feature: control over supplemental group merging in containers. It sharpens security by exposing those sneaky implicit GIDs. But don't get too cozy: this power comes with strings. You’ll need CRI runtimes that play nice, or your pods will get the boot on unsuppor…

Cutting Kubernetes Costs with kube-downscaler

kube-downscaler is your go-to for time-based scaling in Kubernetes. Unlike HPA, which reacts to live metrics, it scales on a fixed schedule, which suits workloads whose busy and idle hours are known in advance. Imagine cron jobs but for replicas. Straightforward, effective, and perfect for trimming costs on snoozing dev environments…
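The "cron jobs but for replicas" idea boils down to comparing the current time with an uptime window and picking a replica count. The sketch below assumes a schedule string in the style of `Mon-Fri 07:30-20:30`; the parser, the function names, and the zero-replica default are hypothetical simplifications, not kube-downscaler's own code, and time zones are ignored.

```python
from datetime import datetime

# Hypothetical, simplified schedule logic; not kube-downscaler's implementation.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # index == datetime.weekday()


def in_uptime(spec: str, now: datetime) -> bool:
    """True if `now` falls inside a window like 'Mon-Fri 07:30-20:30'.
    Handles one day range and one time range; any trailing time zone is ignored."""
    day_part, time_part = spec.split()[:2]
    first_day, last_day = (DAYS.index(d) for d in day_part.split("-"))
    start, end = time_part.split("-")
    weekday = now.weekday()
    if first_day <= last_day:
        day_ok = first_day <= weekday <= last_day
    else:  # wrap-around range such as 'Sat-Mon'
        day_ok = weekday >= first_day or weekday <= last_day
    hhmm = now.strftime("%H:%M")  # zero-padded, so string comparison orders correctly
    return day_ok and start <= hhmm < end


def desired_replicas(spec: str, now: datetime, original: int, downtime: int = 0) -> int:
    """Keep the original replica count during uptime; scale down otherwise."""
    return original if in_uptime(spec, now) else downtime


if __name__ == "__main__":
    print(desired_replicas("Mon-Fri 07:30-20:30", datetime.now(), original=3))
```

A real deployment also has to respect time zones and the point where scaling happens (the controller loop), both of which this sketch leaves out.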

v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA

Kubernetes v1.33 finally pulls its socks up with storage cleanup. It now honors reclaim policies by wielding finalizers on PersistentVolumes. No more leaked storage behind PersistentVolumes, even if you delete a PV before its claim like a mad hatter…

Uber’s Journey to Ray on Kubernetes

Uber tossed manual ML resource wrangling for a slick Kubernetes-Ray duo, amping up scalability and slashing inefficiencies. With dynamic resource pools, elastic sharing, and smart scheduling, they rev up utilization and demolish GPU waste, no micromanaging required…

How to build small and secure Docker images for Rust (FROM scratch)

The article walks through a Dockerfile for building minimal, secure Docker images for Rust projects. It uses a multi-stage build so that compilers and other build-time dependencies stay out of the final image, which keeps that image small…

Podfox: World's First Container-Aware Browser

Podfox swoops in to transform your browser into a Podman rootless container with a SOCKS proxy, no port-forwarding monkey business required. It's like magic for your dev groove. Meanwhile, Homebrew gives container development a twist: it mounts user environments in read-only mode. This way, your favorit…

v1.33: Streaming List responses

Kubernetes v1.33 unleashed a game-changer: streaming encoding for List responses. What used to hog 70-80 GB now zips by on a sleek 3 GB, roughly a 20x reduction in memory use. Say goodbye to those aggravating Out-of-Memory errors. This upgrade tackles mammoth datasets while babysitting your cluster's sta…

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components extend this core: slurmdbd stores accounting data in a database, and slurmrestd exposes a REST API. A rich set of commands, such as srun, squeue, scancel, and sinfo, gives users and administrators full visibility and control.
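As an illustration of scripting against these commands, the Python sketch below runs `squeue` with a custom output format and parses the result. The `--noheader`, `--format`, and `--user` options and the `%i` (job ID), `%T` (job state), and `%j` (job name) format codes are standard `squeue` features; the `Job` dataclass and the helper names are hypothetical, and only the parser can be exercised without a Slurm cluster.

```python
import subprocess
from dataclasses import dataclass

# squeue format codes: %i = job ID, %T = job state (long form), %j = job name
SQUEUE_FORMAT = "%i|%T|%j"


@dataclass
class Job:
    job_id: str  # kept as a string, since array tasks look like "1234_5"
    state: str
    name: str


def parse_squeue(output: str) -> list[Job]:
    """Parse squeue output produced with --noheader --format=SQUEUE_FORMAT."""
    jobs = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps any '|' inside the job name in the last field
        job_id, state, name = line.split("|", 2)
        jobs.append(Job(job_id.strip(), state.strip(), name.strip()))
    return jobs


def list_jobs(user: str) -> list[Job]:
    """Ask slurmctld for a user's jobs via squeue; needs the Slurm CLI on PATH."""
    result = subprocess.run(
        ["squeue", "--noheader", f"--format={SQUEUE_FORMAT}", f"--user={user}"],
        capture_output=True, text=True, check=True,
    )
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    sample = "101|RUNNING|train\n102_3|PENDING|sweep\n"
    for job in parse_squeue(sample):
        print(job)
```

Using an explicit `--format` with a delimiter avoids depending on squeue's default column layout, which sites can change through environment variables.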

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
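Partitions and GRES are requested per job through `#SBATCH` directives. The sketch below builds a batch script that targets a partition, asks for GPUs with `--gres=gpu:N` (a per-node count), and can optionally request exclusive nodes, then submits it with `sbatch --parsable`, which prints just the job ID (plus `;cluster` on multi-cluster setups). The partition name, the GRES type `gpu` (which must be defined in the site's `gres.conf`), and the helper names are assumptions for illustration.

```python
import subprocess


def make_batch_script(
    job_name: str,
    partition: str,
    command: str,
    nodes: int = 1,
    gpus_per_node: int = 0,
    time_limit: str = "01:00:00",
    exclusive: bool = False,
) -> str:
    """Build an sbatch script; partition and GRES names are site-specific assumptions."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",  # which partition to queue in
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",  # wall-clock limit, HH:MM:SS
    ]
    if gpus_per_node:
        # GRES counts are per node; the "gpu" type must exist in gres.conf
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    if exclusive:
        lines.append("#SBATCH --exclusive")  # don't share nodes with other jobs
    lines.append(f"srun {command}")  # launch tasks across the allocation
    return "\n".join(lines) + "\n"


def submit(script: str) -> str:
    """Submit via sbatch on stdin; needs the Slurm CLI on PATH."""
    result = subprocess.run(
        ["sbatch", "--parsable"],
        input=script, capture_output=True, text=True, check=True,
    )
    # --parsable prints "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


if __name__ == "__main__":
    print(make_batch_script("train", "gpu", "python train.py", nodes=2, gpus_per_node=4))
```

Whether a request like this is accepted, and how it is prioritized, still depends on the partition's limits, QOS, and fair-share settings configured by the administrators.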

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.