Updates and recent posts about Slurm..

Posts
Description

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Authentication Explained: When to Use Basic, Bearer, OAuth2, JWT & SSO

Modern apps don’t just check passwords—they rely on **API tokens**, **OAuth**, and **Single Sign-On (SSO)** to know who’s knocking before they open the door... read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Writing Load Balancer From Scratch In 250 Line of Code

A developer rolled out a fully working **Go load balancer** with a clean **Round Robin** setup—and hooks for dropping in smarter strategies like **Least Connection** or **IP Hash**. Backend servers live in a custom server pool. Swapping balancing logic? Just plug into the interface... read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Building a Resilient Data Platform with Write-Ahead Log at Netflix

Netflix faced challenges like data loss, system entropy, updates across partitions, and reliable retries. To address these, they built a generic Write-Ahead Log (WAL) system serving a variety of use cases like delayed queues, generic cross-region replication, and multi-partition mutations. WAL abstr.. read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Organize your Slack channels by “How Often”, not “What” - Aggressively Paraphrasing Me

One dev rewired their Slack setup by **engagement frequency**—not subject. Channels got sorted into tiers like “Read Now” and “Read Hourly,” cutting through noise and saving brainpower. It riffs off the **Eisenhower Matrix**, letting priorities shift with projects, not burn people out... read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Privacy for subdomains: the solution

A two-container setup using **acme.sh** gets Let's Encrypt certs running on a Synology NAS—thanks, Docker. No built-in Certbot support? No problem. Cloudflare DNS API token handles auth. Scheduled tasks handle renewal... read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Users Only Care About 20% of Your Application

Modern apps burst with features most people never touch. Users stick to their favorite 20%. The rest? Frustration, bloat, ignored edge cases. Tools like **VS Code**, **Slack**, and **Notion** nail it by staying lean at the core and letting users stack what they need. Extensions, plug-ins, integrati.. read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Uncommon Uses of Common Python Standard Library Functions

A fresh guide gives old Python friends a second look—turns out, tools like **itertools.groupby**, **zip**, **bisect**, and **heapq** aren’t just standard; they’re slick solutions to real problems. Think run-length encoding, matrix transposes, or fast, sorted inserts without bringing in another depen.. read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

The productivity paradox of AI coding assistants

A July 2025 METR trial dropped a twist: seasoned devs using Cursor with Claude 3.5/3.7 moved **19% slower** - while thinking they were **20% faster**. Chalk it up to AI-induced confidence inflation. Faros AI tracked over **10,000 developers**. More AI didn’t mean more done. It meant more juggling, .. read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

Implementing Vector Search from Scratch: A Step-by-Step Tutorial

Search is a fundamental problem in computing, and vector search aims to match meanings rather than exact words. By converting queries and documents into numerical vectors and calculating similarity, vector search retrieves contextually relevant results. In this tutorial, a vector search system is bu.. read more

Link

@faun shared a link, 7 months, 1 week ago

FAUN.dev()

5 Free AI Courses from Hugging Face

Hugging Face just rolled out a sharp set of free AI courses. Real topics, real tools—think **AI agents, LLMs, diffusion models, deep RL**, and more. It’s hands-on from the jump, packed with frameworks like LangGraph, Diffusers, and Stable Baselines3. You don’t just read about models—you build ‘em i.. read more

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.