
Updates and recent posts about Slurm.
@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Demystifying Log Retention in Azure

Azure logs come in three flavors: **Activity Logs**, **Diagnostic Logs**, and **Log Analytics**, each with its own rules for retention and billing. The catch? Those differences aren't quirks; they're baked in…

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

What are Error Budgets? A Guide to Managing Reliability

OneUptime shows how to put **error budgets** to work, keeping feature velocity in check without tanking reliability. The goal: ship fast and stay within SLOs. They do it by tracking **burn rates**, syncing across teams, and tuning SLOs to match how users actually use the product. Less guesswork, more s…
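Burn rate is easiest to grasp as arithmetic. The sketch below assumes the common SRE definitions (error budget = 1 − SLO; burn rate = observed error rate ÷ budgeted error rate); the linked article may define things differently, and the function names are hypothetical.

```python
def error_budget(slo: float) -> float:
    """Fraction of requests allowed to fail, e.g. a 0.999 SLO leaves 0.001."""
    if not 0.0 < slo < 1.0:
        raise ValueError("SLO must be strictly between 0 and 1")
    return 1.0 - slo


def burn_rate(failed: int, total: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget is spent exactly on schedule; above 1.0 it runs out
    before the SLO window ends.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (failed / total) / error_budget(slo)


def budget_remaining(failed: int, total: int, slo: float) -> float:
    """Fraction of the window's budget still unspent (negative once overspent).

    Assumes `failed` and `total` are counted over the same SLO window.
    """
    allowed_failures = error_budget(slo) * total
    return 1.0 - failed / allowed_failures
```

With a 99.9% SLO, 20 failures out of 10,000 requests gives a burn rate of about 2, so the budget would be exhausted in roughly half the window; 5 failures would leave about half the budget.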

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

KubeCon + CloudNativeCon North America 2025 Co-Located Event Deep Dive: Kubernetes on Edge Day

The inaugural Edge Day launched as a co-located event at KubeCon + CloudNativeCon EU in 2022, focusing on edge computing and the evolution from centralized data centers to the network edge. The event brings together academic research, enterprise use cases, and insights from the Kubernetes community…

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Fluentd to Fluent Bit: A Migration Guide

Fluent Bit just edged out Fluentd as the CNCF's go-to log processor. Why? It's up to 40× faster. Built in C. Embedded plugins. Native OpenTelemetry. Full observability baked in. It handles routing, schema changes, and telemetry across containers and edge systems without flinching. No Ruby here…

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Intelligent Kubernetes Load Balancing at Databricks

Databricks replaced default Kubernetes load balancing with a **proxyless, client-side gRPC setup**, wired up through a custom control plane. No more **CoreDNS**. No more **kube-proxy**. Clients now get live endpoint discovery through **xDS**, plus smarter routing tricks like **Power of Two Choices**…
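Power of Two Choices is simple enough to sketch in a few lines: sample two endpoints at random and send the request to the less loaded one. This is a minimal illustration, not Databricks' implementation; it assumes the client tracks an in-flight request count per endpoint as its load signal, and the `Endpoint` type and addresses are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    address: str
    in_flight: int = 0  # requests currently outstanding to this endpoint


def pick_p2c(endpoints: list[Endpoint], rng: random.Random) -> Endpoint:
    """Sample two distinct endpoints and return the one with fewer in-flight
    requests (ties go to the first sample)."""
    if not endpoints:
        raise ValueError("no endpoints available")
    if len(endpoints) == 1:
        return endpoints[0]
    a, b = rng.sample(endpoints, 2)
    return a if a.in_flight <= b.in_flight else b


def simulate(n_endpoints: int, n_requests: int, seed: int = 0) -> list[int]:
    """Toy model: requests never complete, so in_flight only grows."""
    rng = random.Random(seed)
    endpoints = [Endpoint(f"10.0.0.{i}:8080") for i in range(n_endpoints)]
    for _ in range(n_requests):
        pick_p2c(endpoints, rng).in_flight += 1
    return [e.in_flight for e in endpoints]
```

Two random samples are enough to avoid most hot spots while staying cheap: the client never has to scan every endpoint, and in a real setup the endpoint list itself would be refreshed from the control plane (xDS in Databricks' case).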

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Top 10 Kubernetes Deployment Errors: Causes and Fixes (And Tips)

Misconfigured YAML. Broken image refs. Botched resource settings. Most Kubernetes deploys don't fail mysteriously; they fail predictably. This guide breaks down the top 10 culprits: things like `CrashLoopBackOff`, bad image pulls, and `OOMKilled`. More importantly, it shows how to dodge them with bet…
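Several of these failures can be spotted mechanically in a Pod's status. The sketch below reads one entry of `status.containerStatuses` (for example, parsed from `kubectl get pod <pod> -o json`) and reports known failure reasons. It does not reproduce the guide's top 10 list; the `HINTS` table and function names are hypothetical, and the hint texts are common remedies rather than the article's.

```python
from typing import Optional

# Reason strings as Kubernetes reports them; hints are common first steps.
HINTS = {
    "CrashLoopBackOff": "container keeps exiting; inspect `kubectl logs <pod> --previous`",
    "ErrImagePull": "image pull failed; check image name, tag and registry credentials",
    "ImagePullBackOff": "kubelet is backing off after failed pulls; fix the image reference",
    "OOMKilled": "container exceeded its memory limit; raise the limit or cut usage",
}


def diagnose(status: dict) -> list[str]:
    """Collect known failure reasons from one containerStatuses entry."""
    found = []
    waiting = status.get("state", {}).get("waiting") or {}
    if waiting.get("reason") in HINTS:
        found.append(waiting["reason"])
    # OOMKilled usually shows up under lastState once the container restarts.
    for key in ("state", "lastState"):
        terminated = status.get(key, {}).get("terminated") or {}
        if terminated.get("reason") in HINTS:
            found.append(terminated["reason"])
    return found


def explain(status: dict) -> Optional[str]:
    """Human-readable summary, or None if nothing known is wrong."""
    return "; ".join(f"{r}: {HINTS[r]}" for r in diagnose(status)) or None
```

A container that was OOM-killed and is now crash-looping reports both reasons, which points straight at the memory limit rather than the application code.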

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

v1.34: Pod Level Resources Graduated to Beta

Kubernetes v1.34 bumps **Pod Level Resources** to Beta and turns them on by default. Now you can set CPU, memory, and hugepages requests and limits for the whole Pod, not just per container. That means smoother scheduling, stricter resource caps, and less sidecar thrashing. **Why it matters:** This shifts Kuber…
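The change is visible in the manifest: `resources` moves up to `spec`, and containers may omit their own. The sketch below builds such a Pod as a Python dict and serializes it to JSON (which `kubectl apply -f -` accepts as well as YAML). The Pod name, container names, and image references are hypothetical placeholders.

```python
import json


def pod_with_shared_budget(name: str, cpu: str, memory: str) -> dict:
    """Pod that declares resources once, at the Pod level (spec.resources).

    The two containers set no resources of their own, so they share the
    Pod's budget instead of each needing a hand-tuned slice.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "resources": {
                "requests": {"cpu": cpu, "memory": memory},
                "limits": {"cpu": cpu, "memory": memory},
            },
            "containers": [
                # Hypothetical images, for illustration only.
                {"name": "app", "image": "registry.example.com/app:1.0"},
                {"name": "log-shipper", "image": "registry.example.com/shipper:1.0"},
            ],
        },
    }


manifest_json = json.dumps(pod_with_shared_budget("demo", "1", "512Mi"), indent=2)
```

Printing `manifest_json` and piping it to `kubectl apply -f -` would create the Pod on a v1.34 cluster with the feature enabled.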

@faun shared a link, 7 months, 1 week ago
FAUN.dev()

Why Rancher's Founders Pivoted From Kubernetes to Agentic AI

Obot.ai just dropped out of stealth with $35M in seed funding and a big swing: it's building a control plane for agentic AI, anchored on the now-standard **Model Context Protocol (MCP)**. Its **MCP Gateway** handles registry, secure proxying, RBAC, and observability for MCP servers. Think API gateway, but…

@eon01 shared a post, 7 months, 1 week ago
Founder, FAUN.dev

Data-Driven Developer Journalism: Announcing FAUN.dev News, a Smarter Way to Read Developer News

We launched a new news experience at FAUN.dev that uses advanced retrieval to deliver context-rich, insightful news for developers.

@varbear shared an update, 7 months, 1 week ago
FAUN.dev()

Perplexity AI's Comet Browser Launches Globally, Free for All Users

Perplexity AI has launched its Comet browser globally and made it free for all users. It ships with features like the Comet Assistant and Background Assistants, aiming to foster curiosity and productivity.

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (`slurmctld`) to track cluster state and assign work, while lightweight daemons (`slurmd`) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like `slurmdbd` and `slurmrestd` extend Slurm with accounting and REST APIs. A rich set of commands, such as `srun`, `squeue`, `scancel`, and `sinfo`, gives users and administrators full visibility and control.
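Because the commands print plain text, they are easy to script against. The sketch below asks `squeue` for a delimited, header-less listing (`-h` drops the header, `-u` filters by user, `-o` sets the format) and parses it. The `Job` type and function names are hypothetical; `list_jobs` needs Slurm's `squeue` on `PATH`, while `parse_squeue` works on any captured output.

```python
import subprocess
from dataclasses import dataclass

# squeue format specifiers: %i job ID, %P partition, %j job name,
# %u user, %T job state in long form (e.g. RUNNING, PENDING).
SQUEUE_FORMAT = "%i|%P|%j|%u|%T"


@dataclass
class Job:
    job_id: str
    partition: str
    name: str
    user: str
    state: str


def parse_squeue(output: str) -> list[Job]:
    jobs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Only the job name can contain "|", so split off the two fields on
        # each side of it and keep whatever is left as the name.
        job_id, partition, rest = line.split("|", 2)
        name, user, state = rest.rsplit("|", 2)
        jobs.append(Job(job_id, partition, name, user, state))
    return jobs


def list_jobs(user: str) -> list[Job]:
    out = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", SQUEUE_FORMAT],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_squeue(out)
```

Filtering the result, e.g. `[j for j in list_jobs("alice") if j.state == "PENDING"]`, gives a quick view of what is still waiting in the queue.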

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
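Partitions and GRES surface directly in a batch script's `#SBATCH` directives. The sketch below renders one such script: `--partition` selects the partition and `--gres=gpu:N` requests N GPUs per node. It assumes the cluster defines a GRES named `gpu`; the function name, partition names, and training command are hypothetical.

```python
def sbatch_script(job_name: str, partition: str, nodes: int,
                  ntasks_per_node: int, gpus_per_node: int,
                  time_limit: str, command: str) -> str:
    """Render a batch script for `sbatch`; time_limit uses HH:MM:SS."""
    if nodes < 1 or ntasks_per_node < 1:
        raise ValueError("nodes and ntasks_per_node must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus_per_node > 0:
        # GPUs are requested as a Generic Resource; the count is per node.
        # Assumes the cluster's gres.conf defines a GRES named "gpu".
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    # srun launches the tasks inside the allocation sbatch obtains.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"
```

Writing the result to `job.sh` and running `sbatch job.sh` submits it; the partition's own policies (limits, priority, preemption) then decide when and where it runs.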

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.