Posts from @faun
@faun shared a link, 1 month, 2 weeks ago

LLM Evaluation: Practical Tips at Booking.com

Booking.com built Judge-LLM, a framework where strong LLMs evaluate other models against a carefully curated golden dataset. Clear metric definitions, rigorous annotation, and iterative prompt engineering make evaluations more scalable and consistent than relying solely on humans. **The takeaway**:..
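One way to picture the "judge versus golden dataset" check is to measure how often a judge model's labels match the human-annotated ones. The sketch below is illustrative only: the label values, the toy data, and the function names are assumptions, not Booking.com's actual framework or metrics.

```python
from collections import Counter


def agreement(golden, judged):
    """Fraction of examples where the judge label equals the golden label."""
    if not golden or len(golden) != len(judged):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(g == j for g, j in zip(golden, judged)) / len(golden)


def cohen_kappa(golden, judged):
    """Agreement corrected for the agreement expected by chance."""
    n = len(golden)
    p_observed = agreement(golden, judged)
    golden_counts = Counter(golden)
    judged_counts = Counter(judged)
    labels = set(golden_counts) | set(judged_counts)
    p_expected = sum(golden_counts[l] * judged_counts[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sides used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical golden annotations and judge outputs for six responses.
golden_labels = ["pass", "pass", "fail", "pass", "fail", "fail"]
judge_labels = ["pass", "fail", "fail", "pass", "fail", "pass"]

raw_agreement = agreement(golden_labels, judge_labels)
kappa = cohen_kappa(golden_labels, judge_labels)
print(f"agreement={raw_agreement:.3f} kappa={kappa:.3f}")
```

Tracking a chance-corrected score next to raw agreement is a common choice when labels are imbalanced; whether Judge-LLM does so is not stated in the teaser.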

Guardians of the Agents 

A new static verification framework wants to make runtime safeguards look lazy. It slaps **mathematical safety proofs** onto LLM-generated workflows *before* they run—no more crossing fingers at execution time. The setup decouples **code from data**, then runs checks with tools like **CodeQL** and ..

PostgreSQL maintenance without superuser

PostgreSQL’s moving in on superusers. As of recent releases—starting way back in v9.6 and maturing through PostgreSQL 18 (coming 2025)—there are now **15+ built-in admin roles**. No need to hand out superuser just to get things done. These roles cover the ops spectrum: monitoring, backups, fil..

Magical systems thinking

AI now writes over **25% of Google’s** and as much as **90% of Anthropic’s** code. That’s not a trend—it’s a regime change. Still, the mess in large public systems reminds us: clever analysis isn’t enough. Complex systems don’t behave; they misbehave. When the machines are churning out code, the ..

Writing an operating system kernel from scratch

A barebones time-sharing OS kernel, written in Zig, running on RISC-V. It leans on OpenSBI for console I/O and timer interrupts. Threads? Statically allocated, each running in user mode (U-mode). The kernel stays in supervisor mode (S-mode), where it catches system calls and context switches via timer ticks. ..

Scaling Prometheus: Managing 80M Metrics Smoothly

Flipkart ditched its creaky StatsD + InfluxDB stack for a federated Prometheus setup, built to handle 80M+ time-series metrics without choking. The move leaned into pull-based collection, PromQL's firepower, and hierarchical federation for smarter aggregation and long-haul queries. Why it matters: Prometheus..
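For a rough feel of hierarchical federation: a higher-level Prometheus scrapes the `/federate` endpoint of a lower-level one, passing `match[]` selectors for the series it wants. The sketch below only builds such a scrape URL. The hostname and selectors are hypothetical, and nothing here reflects Flipkart's actual configuration.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL with one match[] parameter per series selector."""
    query = urlencode([("match[]", selector) for selector in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical leaf server and selectors: one job, plus pre-aggregated
# recording rules that follow a "job:" naming prefix.
selectors = ['{job="api"}', '{__name__=~"job:.*"}']
url = federate_url("http://prometheus-leaf.example:9090/", selectors)
print(url)
```

Pulling only pre-aggregated series upward is the usual way to keep the top-level server small, which fits the teaser's "smarter aggregation" framing.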

Accelerate serverless testing with LocalStack integration in VS Code IDE

The AWS Toolkit for VS Code now hooks straight into **LocalStack**. Run full end-to-end tests for **serverless workflows**—Lambda, SQS, EventBridge, the whole crew—without bouncing between tools or writing boilerplate. Just deploy to LocalStack from the IDE using the **AWS SAM CLI**. It feels like ..

Introducing Budget Controls for AWS: Automatically Manage Your Cloud Costs

**Budget Controls for AWS** just got better. The open-source tool now reins in more than just EC2. It wrangles **RDS Aurora**, **SageMaker**, and **OpenSearch** too. Under the hood, it taps **AWS Budgets**, **AWS Config**, and **custom tags** to watch spend like a hawk. Hit a budget threshold? It c..

SLI Evolution Stages

A new SLI evolution model lays out a maturity roadmap—from rebranded latency/error metrics to ones that actually track business impact. It replaces shallow signals and pulls in the stuff that matters: how service failures hit user goals, tasks, and bottom lines...
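To make the gap between a rebranded error metric and a user-goal metric concrete, here is a toy comparison. It is a sketch under assumptions: the `Request` record, the task IDs, and the rule that a task succeeds only if all its requests succeed are illustrative and not taken from the linked model.

```python
from dataclasses import dataclass


@dataclass
class Request:
    task_id: str  # the user task this request belongs to (hypothetical field)
    ok: bool      # whether the request succeeded


def request_sli(requests):
    """Share of successful requests: the classic error-rate style SLI."""
    return sum(r.ok for r in requests) / len(requests)


def task_sli(requests):
    """Share of user tasks in which every request succeeded."""
    task_ok = {}
    for r in requests:
        task_ok[r.task_id] = task_ok.get(r.task_id, True) and r.ok
    return sum(task_ok.values()) / len(task_ok)


# Hypothetical traffic: three checkout attempts, one of which hits a failure.
requests = [
    Request("checkout-1", True), Request("checkout-1", True), Request("checkout-1", False),
    Request("checkout-2", True), Request("checkout-2", True), Request("checkout-2", True),
    Request("checkout-3", True), Request("checkout-3", True),
]

print(f"request SLI: {request_sli(requests):.3f}")
print(f"task SLI:    {task_sli(requests):.3f}")
```

On this toy data the request-level number looks healthier than the task-level one, which is the kind of blind spot a user-goal SLI is meant to expose.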

%CPU Utilization Is A Lie

Stress tests on the Ryzen 9 5900X uncovered a big gap between **reported CPU utilization** and what the chip actually pushes. Around 50% on paper? Could mean close to full throttle in reality—thanks to sneaky behaviors from **SMT resource sharing** and **Turbo frequency scaling**. **Takeaway:** Raw..
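A back-of-the-envelope model shows why the reported number can mislead. Assume the scheduler fills one hardware thread per core first, and that a second SMT thread on a busy core adds only a fraction of extra throughput. The `smt_gain` value of 0.3 below is a made-up illustration, not a measurement from the article, and turbo frequency scaling is not modeled at all.

```python
def reported_utilization(busy_threads, cores):
    """Utilization as the OS sees it: busy logical CPUs over all logical CPUs."""
    return busy_threads / (2 * cores)


def throughput_fraction(busy_threads, cores, smt_gain):
    """Fraction of peak throughput in use, under the assumptions stated above."""
    if not 0 <= busy_threads <= 2 * cores:
        raise ValueError("busy_threads must be between 0 and 2 * cores")
    first_threads = min(busy_threads, cores)        # one thread per core, full speed
    second_threads = max(busy_threads - cores, 0)   # SMT siblings, reduced benefit
    work = first_threads + second_threads * smt_gain
    peak = cores * (1 + smt_gain)
    return work / peak


CORES = 12       # the Ryzen 9 5900X has 12 cores and 24 hardware threads
SMT_GAIN = 0.3   # hypothetical extra throughput from a core's second thread

reported = reported_utilization(12, CORES)
actual = throughput_fraction(12, CORES, SMT_GAIN)
print(f"reported={reported:.0%} modeled share of peak throughput={actual:.0%}")
```

Under these assumptions, a machine showing 50% utilization is already well past half of its real capacity; adding a clock-speed penalty at high load would push the modeled figure higher still.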
