Updates and recent posts about Slurm.
Link
@faun shared a link, 1 year ago
FAUN.dev()

GenAI Meets SLMs: A New Era for Edge Computing

SLMs power up edge computing with speed and privacy finesse. They master real-time decisions and steal the spotlight in cramped settings like telemedicine and smart cities. On personal devices, they outdo LLMs, trimming the fat with model distillation and quantization. Equipped with ONNX and MediaPipe, the.. read more


Automate Models Training: An MLOps Pipeline with Tekton and Buildpacks

Tekton plus Buildpacks: your secret weapon for training GPT-2 without Dockerfile headaches. They wrap your code in containers, ensuring both security and performance. Tekton Pipelines lean on Kubernetes tasks to deliver isolation and reproducibility. Together, they transform CI/CD for ML into something.. read more


God is hungry for Context: First thoughts on o3 pro

OpenAI just took an axe to o3 pricing, down 80%. Enter o3-pro with its $20/$80 show. They boast a star-studded 64% win rate against o3. Forget Opus; o3-pro nails picking the right tools and reading the room, flipping task-specific LLM apps on their heads... read more


How we’re responding to The New York Times’ data demands in order to protect user privacy

OpenAI is challenging a court order stemming from The New York Times' copyright lawsuit, which mandates the indefinite retention of user data from ChatGPT and API services. OpenAI contends this requirement violates user privacy commitments and sets a concerning precedent. While the company complies .. read more  


FinOps X 2025 Cloud Announcements: AI Agents and Increased FOCUS™ Support

AWS just decreed its new AI-infused Cost Optimization Hub. This gizmo tackles the chaos of tracking overlapping opportunities among millions of resources. Meanwhile, Google Cloud unleashed Forecasting Enhancements. They claim their AI now wrangles pesky outliers and wild trends, turning financial crystal.. read more


Are You Over-Engineering Your Tests? – Think Like a Tester

Over-engineering alert: Automating every last thing? Recipe for disaster. Flaky tests galore! Stick to manual edge cases and sharp, atomic checks instead of drowning in script spaghetti. Abstraction overload ahead! Chasing too much abstraction makes maintenance a headache. Keep tests clean and clear. St.. read more


DevOps Tools Targeted for Cryptojacking

JINX-0132 takes a sneaky approach. It exploits Nomad's initial slip-ups to secretly mine crypto. How? By leveraging GitHub for downloads and dodging those pesky Indicators of Compromise (IOCs). Even big players using Nomad to juggle hundreds of clients aren't safe. A simple misconfiguration and poof.. read more


What I’ve Learned from Designing Landing Zones On Google Cloud

Cloud Foundation Fabric and FAST make Google Cloud feel more like a well-oiled machine than a hair-pulling puzzle. They slice through the setup with killer precision, laying down a rock-solid, enterprise-grade foundation. No IAM madness. No network disasters waiting to explode. Just scalable, secure co.. read more


Exploiting CI/CD with Style(lint): LOTP Guide

CI/CD is vulnerable to Living Off the Pipeline (LOTP) attacks via tools like linters, formatters, build, and test tools, with no need to modify workflows. Hacking depends on unexpected code execution, context files, plugins, environment variables... read more


You’re not a platform team if you’re just managing infrastructure

Platform engineering? It's not just gift-wrapping infrastructure as a service. It's about handing devs the reins and saying, "Go wild." Think of it like an Internal Developer Platform (IDP), similar to the Google Cloud Platform. Here, users truly own their services. The result? Scalability soars, bot.. read more

Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and scheduling system widely used in high-performance computing (HPC). Designed to operate without kernel modifications, Slurm coordinates thousands of compute nodes by allocating resources, launching and monitoring jobs, and managing contention through its flexible scheduling queue.

At its core, Slurm uses a centralized controller (slurmctld) to track cluster state and assign work, while lightweight daemons (slurmd) on each node execute tasks and communicate hierarchically for fault tolerance. Optional components like slurmdbd and slurmrestd extend Slurm with accounting and REST APIs. A rich set of commands—such as srun, squeue, scancel, and sinfo—gives users and administrators full visibility and control.
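
As a rough illustration of how the user-facing commands can be scripted, the sketch below builds `squeue` and `scancel` argument lists and parses `squeue` output into records. It assumes the standard `squeue` options `-h` (no header), `-u` (user filter), and `-o` (output format) with the format fields `%i`, `%P`, `%j`, and `%T`; the `Job` class, the helper names, and the sample job names are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Format string handed to `squeue -o`:
# %i = job ID, %P = partition, %j = job name, %T = job state (extended form).
SQUEUE_FORMAT = "%i|%P|%j|%T"


@dataclass(frozen=True)
class Job:
    """Hypothetical record type for one line of squeue output."""
    job_id: str
    partition: str
    name: str
    state: str


def squeue_command(user: str) -> list[str]:
    """Argument list that lists one user's jobs without a header row."""
    return ["squeue", "-h", "-u", user, "-o", SQUEUE_FORMAT]


def scancel_command(job_id: str) -> list[str]:
    """Argument list that cancels a single job by ID."""
    return ["scancel", job_id]


def parse_squeue(output: str) -> list[Job]:
    """Parse text produced with SQUEUE_FORMAT into Job records."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Job names may contain '|', so peel the fixed fields off both ends.
        job_id, partition, rest = line.split("|", 2)
        name, state = rest.rsplit("|", 1)
        jobs.append(Job(job_id, partition, name, state))
    return jobs


# Made-up sample output, not taken from a real cluster.
SAMPLE = "1001|gpu|train-model|RUNNING\n1002|batch|preprocess|PENDING\n"
pending_ids = [job.job_id for job in parse_squeue(SAMPLE) if job.state == "PENDING"]
```

On a machine with Slurm installed, the argument lists could be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` fed to `parse_squeue`. Keeping command construction separate from parsing lets the parsing logic be tested without a cluster.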

Slurm’s modular plugin architecture supports nearly every aspect of cluster operation, including authentication, MPI integration, container runtimes, resource limits, energy accounting, topology-aware scheduling, preemption, and GPU management via Generic Resources (GRES). Nodes are organized into partitions, enabling sophisticated policies for job size, priority, fairness, oversubscription, reservation, and resource exclusivity.
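
To make partitions, GRES, and exclusivity more concrete, here is a minimal sketch that renders a batch script using the `sbatch` directives `--partition`, `--nodes`, `--time`, `--gres`, and `--exclusive`. The function name, the partition name `gpu`, and the job parameters are assumptions for illustration; real partition and GRES names depend on how a given cluster is configured.

```python
def render_batch_script(
    job_name: str,
    partition: str,
    nodes: int,
    gpus_per_node: int,
    time_limit: str,
    command: str,
    exclusive: bool = False,
) -> str:
    """Return the text of a batch script suitable for `sbatch`.

    Hypothetical helper: it only formats #SBATCH directives and does not
    check that the partition or GRES type exists on the target cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus_per_node > 0:
        # GRES requests are per node; "gpu" is the conventional GRES name,
        # but it must be defined in the cluster's configuration.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    if exclusive:
        # Ask for whole nodes that are not shared with other jobs.
        lines.append("#SBATCH --exclusive")
    lines.append("")
    # srun launches the command as a job step inside the allocation.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


# Example values are made up for this sketch.
script = render_batch_script(
    job_name="train",
    partition="gpu",
    nodes=2,
    gpus_per_node=4,
    time_limit="01:00:00",
    command="python train.py",
    exclusive=True,
)
```

The returned text could be written to a file such as `job.sh` and submitted with `sbatch job.sh`. Whether the job is accepted, queued, or rejected then depends on the limits and policies the administrators have attached to the chosen partition.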

Widely adopted across academia, research labs, and enterprise HPC environments, Slurm serves as the backbone for many of the world’s top supercomputers, offering a battle-tested, flexible, and highly configurable framework for large-scale distributed computing.