Updates and recent posts about Slurm.

Posts
Link

@faun shared a link, 7 months, 3 weeks ago

FAUN.dev()

v1.34: Recovery From Volume Expansion Failure (GA)

Kubernetes v1.34 bumps **automated recovery from botched PVC expansions** to GA. Users can now fix bad volume size requests: no admin, no drama. It cleans up unused quota, slows down retry spam, and surfaces progress with new PVC status fields...

Link

@faun shared a link, 7 months, 3 weeks ago

FAUN.dev()

Kubernetes Security: Best Practices to Protect Your Cluster

A new JetBrains IDE plugin throws Kubernetes security best practices straight into your deployment manifests, right where they belong. Think: checks for `runAsRoot`, privileged mode, `hostPath`, host ports, and sketchy sysctls. No hand-waving. It enforces stuff like: - Default `runAsNonRoot` - Drop ..

Link

@faun shared a link, 7 months, 3 weeks ago

FAUN.dev()

v1.34: DRA Consumable Capacity

Kubernetes 1.34 rolls in **consumable capacity** for Dynamic Resource Allocation (DRA). That means device plugins can now carve up resources (GPU memory, NIC bandwidth, etc.) into precise slices for Pods, ResourceClaims, and namespaces. The scheduler tracks it all, so nothing spills over...

Link

@faun shared a link, 7 months, 3 weeks ago

FAUN.dev()

v1.34: Decoupled Taint Manager Is Now Stable

Kubernetes 1.34 graduates the taint eviction controller to GA. Now, the node lifecycle controller only applies taints, while a dedicated taint eviction controller manages pod eviction. First split in 1.29, now stable in 1.34...

Link

@faun shared a link, 7 months, 3 weeks ago

FAUN.dev()

v1.34: Pods Report DRA Resource Health

Kubernetes v1.34 lands with an alpha upgrade to **[KEP-4680](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4680-add-resource-health-to-pod-status)**, pushing **Dynamic Resource Allocation (DRA)** into smarter territory: health-aware Pods. DRA drivers can now stream device heal..

Story

@laura_garcia shared a post, 7 months, 4 weeks ago

Software Developer, RELIANOID

Secure Boot Advanced Targeting (SBAT): Scaling Boot Security 🔐

Discover how SBAT enhances Secure Boot by introducing a smarter way to handle vulnerabilities, reducing overhead, and ensuring your system's boot process stays secure. Learn how it works, how it addresses scalability, and why it's a game-changer for modern boot security across Linux and Windows envi..

Story

@laura_garcia shared a post, 8 months ago

Software Developer, RELIANOID

Cyber Security & Cloud Expo Europe in Amsterdam

🔐 On 24–25 September 2025, RELIANOID will be at Cyber Security & Cloud Expo Europe in Amsterdam! Join us to explore how we enable secure, scalable, and Zero Trust–ready application delivery. 👉 https://www.relianoid.com/about-us/events/cyber-security-cloud-expo-2025/ #CyberSecurity #Cloud #ZeroTrust #De..

Story

@laura_garcia shared a post, 8 months ago

Software Developer, RELIANOID

🔐 Industrial networks face increasing complexity and evolving cyber threats.

To strengthen defenses, many organizations are moving beyond traditional segmentation and adopting microsegmentation — a strategy that creates independent, secure zones to better protect critical assets. We’ve prepared a clear diagram to illustrate how defense-in-depth and microsegmentation can be a..

Link

@anjali shared a link, 8 months ago

Customer Marketing Manager, Last9

What is Asynchronous Job Monitoring?

Know how asynchronous job monitoring tracks background tasks, ensuring they finish reliably, perform well, and stay visible at scale.

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (`slurmctld`) to track cluster state and assign work, while lightweight daemons (`slurmd`) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like `slurmdbd` and `slurmrestd` extend Slurm with accounting and REST APIs. A rich set of commands, such as `srun`, `squeue`, `scancel`, and `sinfo`, gives users and administrators full visibility and control.
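To make the command-line side concrete, here is a minimal Python sketch that asks `squeue` for pipe-delimited output and parses it into dictionaries. The helper names (`parse_squeue`, `list_jobs`), the chosen field list, and the sample rows are assumptions made for illustration; only the `squeue` options `-h`, `-o`, and `-u` and the `%i`, `%P`, `%j`, `%u`, `%T` format specifiers come from Slurm itself.

```python
import subprocess

# Pipe-delimited squeue format: job id, partition, job name, user, state.
SQUEUE_FORMAT = "%i|%P|%j|%u|%T"
FIELDS = ("job_id", "partition", "name", "user", "state")


def parse_squeue(text):
    """Turn pipe-delimited squeue output into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split("|")
        if len(parts) != len(FIELDS):
            # Skip lines that do not match the expected layout
            # (for example, a job name that itself contains "|").
            continue
        jobs.append(dict(zip(FIELDS, parts)))
    return jobs


def list_jobs(user=None):
    """Run squeue and parse its output. Requires a Slurm installation."""
    cmd = ["squeue", "-h", "-o", SQUEUE_FORMAT]
    if user:
        cmd += ["-u", user]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)


# Hypothetical sample output, so the parser can be tried without a cluster.
SAMPLE = """\
1001|gpu|train|alice|RUNNING
1002|cpu|prep|bob|PENDING
"""

if __name__ == "__main__":
    for job in parse_squeue(SAMPLE):
        print(job["job_id"], job["state"])
```

Keeping the parser separate from the subprocess call means it can be exercised on captured text, while `list_jobs` is only useful on a machine where `squeue` is on the `PATH`.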

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
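As a rough illustration of how partitions and GRES show up in a job request, the Python sketch below renders a batch script with `#SBATCH` directives. The function name `render_batch_script`, the partition name `gpu`, and the example command are hypothetical; the directives used (`--job-name`, `--partition`, `--nodes`, `--time`, `--gres`) are standard `sbatch` options, though which partitions and GRES types exist depends entirely on the site's configuration.

```python
def render_batch_script(job_name, partition, command,
                        gpus=0, nodes=1, time_limit="01:00:00"):
    """Build the text of a Slurm batch script as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Request GPUs through Generic Resources (GRES).
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append("")
    # srun launches the command inside the allocation.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: two GPUs on a partition assumed to be named "gpu".
    print(render_batch_script("train", "gpu", "python train.py", gpus=2))
```

The resulting text could be written to a file and submitted with `sbatch`; whether the request is accepted depends on the limits and policies attached to the target partition.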

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.