vLLM

vLLM is a high-performance open-source inference and serving engine for large language models (LLMs), designed to maximize throughput and efficiency through optimized memory management and scheduling.

Updates and recent posts about vLLM.

@varbear shared a link, 1 month, 1 week ago

FAUN.dev()

Slop Creep: The Great Enshittification of Software

The argument is that coding agents accelerate codebase decay by removing the natural speed limit on bad architectural decisions, compressing months of compounding mistakes into days. The defense is to invest ten times more in the planning phase, with concrete code snippets for the data models and ab.. read more

@kaptain shared a link, 1 month, 1 week ago

CNCF Project Antrea Compromised in Daring GitHub Attack

A throwaway GitHub account compromised CNCF project Antrea's Jenkins infrastructure on May 2 by opening a malicious PR and firing /test-* slash-commands that detonated the workflow against PR-fork code with credentials in scope. The same operator ran parallel campaigns against at least seven other proj.. read more

@kaptain shared a link, 1 month, 1 week ago

How Cloud Native Infrastructure Powers AI on Kubernetes

A vendor piece from Mirantis arguing that GPU multi-tenancy on Kubernetes is widely misrepresented, with most platforms shipping namespace-based isolation while production GPU clouds require hardware-enforced separation through MIG partitioning, cluster-per-tenant architecture, and DPU-based network.. read more

@kaptain shared a link, 1 month, 1 week ago

v1.36: Moving Volume Group Snapshots to GA

Volume group snapshots reached GA in Kubernetes v1.36, with the API promoted to groupsnapshot.storage.k8s.io/v1. The feature lets a VolumeGroupSnapshot object take crash-consistent snapshots across multiple PVCs selected by label, removing the need to quiesce applications that span separate data and log v.. read more

@kaptain shared a link, 1 month, 1 week ago

v1.36: Declarative Validation Graduates to GA

Declarative validation graduated to GA in Kubernetes v1.36, replacing handwritten Go validation with +k8s: marker tags on field definitions... read more

@kaptain shared a link, 1 month, 1 week ago

v1.36: Server-Side Sharded List and Watch

Alpha in v1.36, server-side sharded list and watch adds a shardSelector field to ListOptions so the API server uses an FNV-1a hash on metadata.uid or metadata.namespace to send each controller replica only its slice of the resource collection. This eliminates the cost of every replica deserializing the full .. read more

@kala shared a link, 1 month, 1 week ago

Orchestrating AI Code Review at scale

Cloudflare engineers built an AI code review platform on OpenCode. They split GitLab integration, model providers, prompts, and policy into separate plugins. A coordinator assigns up to seven domain reviewers across security, performance, code quality, documentation, release checks, and AGENTS.md co.. read more

@kala shared a link, 1 month, 1 week ago

How We Built an AI Second Brain for 60K Knowledge Workers

Meta built an AI agent system internally called the AI Second Brain that now has over 63,000 installs and ~10,000 daily active users across engineering, PM, design, legal, finance, comms, and sales, growing from zero in roughly three months after a non-technical PM's adoption post. The architecture .. read more

@kala shared a link, 1 month, 1 week ago

Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Netflix's Saish Sali, Nipun Kumar, and Sura Elamurugu describe the Metadata Service (MDS), a graph layer built to connect siloed ML tooling (model registry, pipeline orchestrator, experimentation platform, feature store, dataset platform, identity) across personalization, studio, payments, and ads. .. read more

@kala shared a link, 1 month, 1 week ago

The AWS MCP Server is now generally available

AWS now offers AWS MCP Server as a managed remote MCP server in US East (N. Virginia) and Europe (Frankfurt). MCP-compatible clients can use existing IAM credentials to access more than 15,000 AWS API operations. For GA, AWS added IAM context keys, documentation retrieval without authentication, low.. read more

vLLM is an open-source framework for serving and running large language models efficiently at scale. Originally developed by researchers and engineers at UC Berkeley and now widely adopted across the AI industry, vLLM optimizes inference performance through its PagedAttention mechanism, a memory management scheme that stores the attention key-value (KV) cache in fixed-size blocks and keeps wasted GPU memory near zero. It supports model parallelism across GPUs, including tensor parallelism, and continuous batching of incoming requests, which suits it to production deployment of foundation models. vLLM integrates with Hugging Face Transformers models, exposes an OpenAI-compatible API, and works with orchestration tools such as Ray Serve and Kubernetes. This design lets developers and enterprises host LLMs with lower latency, lower hardware costs, and higher throughput, for workloads ranging from chatbots to enterprise-scale AI services.
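The block-based memory management behind PagedAttention can be illustrated with a toy allocator. This is a minimal sketch, not vLLM's code or API: the class `PagedKVCache`, its method names, and the block size are hypothetical, chosen only for illustration. It assumes cache memory is split into fixed-size blocks that are handed to a sequence on demand, so each sequence wastes at most one partly filled block.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator (hypothetical; not vLLM's actual implementation)."""

    num_blocks: int
    block_size: int = 16  # tokens per block; an assumed value for illustration
    free_blocks: list = field(init=False, default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> list of block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored

    def __post_init__(self) -> None:
        # Every block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> None:
        """Reserve space for one more token, taking a new block only when needed."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.get(seq_id, [])
        if length == len(table) * self.block_size:
            # All blocks owned by this sequence are full (or it owns none yet).
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop())
            self.block_tables[seq_id] = table
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

    def wasted_slots(self) -> int:
        """Count token slots that are allocated but unused."""
        return sum(
            len(table) * self.block_size - self.seq_lens[seq_id]
            for seq_id, table in self.block_tables.items()
        )


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=4)
    for _ in range(5):
        cache.append_token("a")  # 5 tokens need 2 blocks of 4 slots
    print(cache.block_tables["a"], cache.wasted_slots())
```

Because blocks are allocated only when the previous one fills up, unused space is bounded by one block per sequence rather than by a maximum sequence length reserved up front.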