Join us

FAUN.dev() is where engineers from GitHub, Netflix, and Shopify go to stay ahead — fast.

An effortless, straightforward way to keep up with technologies...so you can keep your tabs closed and your mind open!

70,000+ developers already joined our ecosystem ⭐⭐⭐⭐⭐
Trusted by engineers at:

Google • Microsoft • AWS • Netflix

Slurm

Slurm is an open-source workload manager and job scheduler for Linux clusters, providing resource allocation, job execution, and queue management for large-scale high-performance computing environmen…

Featured Course(s)

Observability with Prometheus and Grafana

A Complete Hands-On Guide to Operational Clarity in Cloud-Native Systems

> Get Your Copy

Content

Updates and recent posts about Slurm..

Posts
Description

Story

@laura_garcia shared a post, 8 months, 1 week ago

Software Developer, RELIANOID

RELIANOID Load Balancer Community Edition v7 on AWS using Terraform

🚀 New Guide Available! Learn how to quickly deploy RELIANOID Load Balancer Community Edition v7 on AWS using Terraform. Our step-by-step article shows you how to provision everything automatically — from VPCs and subnets to EC2 and key pairs — in just minutes. 👉 https://www.relianoid.com/resources/k..

Knowledge base Deploy RELIANOID Load Balancer Community Edition v7 with Terraform on AWS

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Deploy a containerized application with Kamal and Terraform

A Docker-first workflow combinesTerraformandKamalinto a lean, Elastic Beanstalk-ish alternative—without the bloat. Terraform spins up a three-tier VPC and wires it toECR. Kamal takes it from there, booting containers on a raw EC2 box: app, proxy, monitor. One script. Done... read more

Deploy a containerized application with Kamal and Terraform

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

AWS, Microsoft and Google unite behind Linux Foundation DocumentDB database to cut enterprise costs and limit vendor lock-in

Document databases are crucial for AI apps in the gen AI era. Microsoft's open-source DocumentDB project, based on PostgreSQL, is moving to the Linux Foundation, offering a vendor-neutral, open-source alternative to MongoDB. DocumentDB's compatibility with MongoDB drivers and open source governance .. read more

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Sandboxed to Compromised: New Research Exposes Credential Exfiltration Paths in AWS Code Interpreters

Researchers poked holes insandboxed Bedrock AgentCore code interpreters—and found a way to leak execution role credentials through theMicroVM Metadata Service (MMDS). No outside network? Doesn’t matter. The exploit dodges basic string filters in requests and lets non-agentic code swipe AWS creds to .. read more

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Measuring Developer Productivity with Amazon Q Developer and Jellyfish

Amazon Q Developer now plugs into Jellyfish. Teams get a clearer view of how AI fits into the real flow of work—prompt usage, code adoption, PR throughput. Not just surface stats. The setup pipes data from AWS S3 straight into Jellyfish’s analytics engine. It tags AI users, tracks velocity gains, an.. read more

Measuring Developer Productivity with Amazon Q Developer and Jellyfish

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Being on the Same Page During an Incident: Not Actually Telepathy

Collaboration in incident response is crucial for effective resolution, starting with establishing a basic compact among responders. Grounding is a process that ensures alignment and common ground is maintained throughout an incident, encompassing initial common ground, public events so far, and the.. read more

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Which LLM writes the best analytical SQL?

Tinybird threw 19 top LLMs at a 200M-row GitHub dataset, testing how well they could turn plain English into solid SQL. Most models kept their syntax clean—but when it came to writing SQL that actually ran well and returned the right results, they lagged behind human pros. Messy schemas or tricky pr.. read more

Which LLM writes the best analytical SQL?

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Building a Scalable, Flexible, Cloud-Native GenAI Platform with Open Source Solutions

A fresh reference architecture built withEnvoy AI GatewayandKServebrings order to the GenAI chaos. One clean interface to route requests across internal and external LLMs—locked down with policies. It’s called aTwo-Tier Gateway Architecture. Think of it like a split-brain: external API traffic goes.. read more

Building a Scalable, Flexible, Cloud-Native GenAI Platform with Open Source Solutions

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

Container Logs in Kubernetes: How to View and Collect Them

This guide shows how to wrangle container logs in Kubernetes—usingkubectl, shell tools, structured logging, and the Kubernetes Dashboard. It covers the basics and dives into how to scale up log collection and make observability less painful across clusters... read more

Container Logs in Kubernetes: How to View and Collect Them

Link

@faun shared a link, 8 months, 1 week ago

FAUN.dev()

v1.34: DRA has graduated to GA

Kubernetes 1.34 turnsDynamic Resource Allocation (DRA)loose into General Availability—enabled by default. That cements native support for high-maintenance gear like GPUs, FPGAs, and any other quirky hardware your workloads need. The release also packs a fresh mix of alpha/beta features: tighter admi.. read more

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.