Join us
@devopslinks ・ Feb 10, 2026
Anthropic researcher Nicholas Carlini orchestrated 16 autonomous Claude agents working in parallel to build a 100,000-line C compiler in Rust. Using a custom harness for task coordination, testing, and conflict resolution, the agent team produced a compiler capable of building Linux 6.9 on x86, ARM, and RISC-V.
The project surfaced challenges including task synchronization, frequent merge conflicts, quality assurance issues, and functional limitations.
The work describes the use of agent teams to execute tasks in parallel, coordinated through a simple file-based locking mechanism.
The agent team approach demonstrates that complex software projects can be produced autonomously with limited human supervision.
The authors emphasize the importance of high-quality tests and strict continuous integration to keep autonomous agents on track.
A recent project developed a Rust-based C compiler using a novel method called "agent teams." This approach involves running 16 instances of the language model Claude in parallel, without active human intervention. Over nearly 2,000 Claude Code sessions and approximately two weeks of continuous execution, the agent team produced a 100,000-line compiler capable of building Linux 6.9 across x86, ARM, and RISC-V architectures. In total, the project consumed roughly 2 billion input tokens, generated 140 million output tokens, and cost just under $20,000 in API usage. The primary focus of the work was not only the compiler itself but also the design of a framework that allows autonomous agent teams to make sustained progress while executing tasks in parallel. Challenges such as task synchronization, merge conflicts, and regression control were central to the experiment.
The "agent teams" concept allows multiple Claude instances to autonomously work on a shared codebase. Each agent runs inside its own isolated container with a local copy of the repository, while a bare upstream Git repository is used for synchronization. To avoid agents duplicating work, the framework implements a simple file-based locking mechanism, where an agent claims a task by creating a lock file before starting work. If a conflict occurs, Git forces the agent to rebase and choose a different task. Once a task is completed, the agent merges changes back into the upstream repository and releases the lock. A continuous execution loop keeps spawning fresh agent sessions, enabling long-running autonomous development without manual supervision.
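The claim-a-task step can be sketched in a few lines. The sketch below is a minimal illustration, not the project's actual harness: it assumes a shared locks/ directory, and the try_claim and release names are hypothetical. (In the real setup the lock files live in the Git repository, so a rejected push, rather than a failed file creation, is what signals a lost race.)

```python
import os

LOCK_DIR = "locks"  # hypothetical shared directory of task lock files

def try_claim(task_id: str, agent_id: str) -> bool:
    """Claim a task by creating its lock file atomically.

    O_CREAT | O_EXCL makes creation fail if the file already exists,
    so two agents can never both claim the same task.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    path = os.path.join(LOCK_DIR, f"{task_id}.lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent already holds this task
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record the owner, useful for debugging
    return True

def release(task_id: str) -> None:
    """Drop the lock after the task's changes are merged upstream."""
    os.remove(os.path.join(LOCK_DIR, f"{task_id}.lock"))
```

An agent that fails to claim a lock simply moves on to a different task, mirroring the rebase-and-retry behavior described above.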
Maintaining correctness and preventing regressions proved to be one of the most difficult aspects of the project. As the compiler grew in scope, agents frequently introduced changes that broke existing functionality. To address this, the project incorporated high-quality test suites, strict continuous integration pipelines, and known-good compiler oracles. In particular, GCC was used as a reference compiler to compare outputs and isolate failing files when compiling large codebases such as the Linux kernel. This technique enabled agents to split a monolithic task into smaller, parallelizable units. Using this approach, the compiler was able to build not only the Linux kernel but also large real-world projects including QEMU, FFmpeg, SQLite, Postgres, Redis, Lua, and libjpeg, achieving a ~99% pass rate on most compiler test suites, including the GCC torture tests.
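The oracle technique can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: builds_and_runs_ok is a hypothetical callback that rebuilds the target compiling only the suspect files with the new compiler and everything else with GCC, then reports whether the result behaves correctly.

```python
def isolate_failing_files(files, builds_and_runs_ok):
    """Return the files the new compiler appears to miscompile.

    `builds_and_runs_ok(suspects)` is a hypothetical callback: it
    compiles the files in `suspects` with the new compiler, the rest
    with the known-good GCC, links, and returns True if the binary
    works. Checking one file at a time turns a monolithic "the kernel
    doesn't boot" failure into many small, parallelizable checks.
    """
    failing = []
    for f in files:
        if not builds_and_runs_ok({f}):
            failing.append(f)
    return failing

# Toy oracle: pretend the new compiler only miscompiles sched.c.
ok = lambda suspects: "sched.c" not in suspects
print(isolate_failing_files(["init.c", "sched.c", "mm.c"], ok))  # prints ['sched.c']
```

Because each per-file check is independent, the agents can run them in parallel, which is exactly the kind of task splitting the harness was designed to exploit.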
Despite these successes, the compiler has clear limitations. It lacks a native 16-bit x86 code generator, which is required to boot Linux from real mode, and instead relies on GCC for that stage.
The project also does not yet have a fully reliable in-house assembler or linker, again falling back to GCC tooling in some cases. Even with optimization passes enabled, the generated machine code is significantly less efficient than GCC’s output, sometimes performing worse than GCC with optimizations disabled. While the Rust codebase is functional and maintainable, it does not yet match the quality or performance of production-grade compilers written by expert human teams.
Overall, the project serves as a stress test of the current limits of autonomous agent teams. It demonstrated both their surprising capabilities and the constraints that still require careful human oversight.
By the numbers:
16: instances of the language model Claude run in parallel for the project.
100,000: total lines in the Rust-based C compiler.
Linux 6.9: the kernel version the compiler is capable of building.
~2,000: Claude Code sessions involved in the project.
~2 billion: input tokens consumed during the project.
140 million: output tokens generated during the project.
~$20,000: total cost of the project.
~99%: pass rate of the compiler on most compiler test suites.
~2 weeks: approximate duration of the project.
3: CPU architectures supported by the compiler (x86, ARM, and RISC-V).
Key people and tools:
Nicholas Carlini: researcher on Anthropic’s Safeguards team who designed and ran the experiment using parallel Claude agents to build a C compiler.
Anthropic: AI research organization that develops Claude and where the agent team compiler experiment was conducted.
Claude: the large language model instantiated in multiple parallel agents to autonomously write, test, and debug a Rust-based C compiler.
Claude Code: the development environment used to run Claude agents in continuous loops for autonomous coding and testing.
GCC: a known-good C compiler used as an oracle to validate correctness and isolate failures during kernel compilation.
Git: used for task locking, synchronization, merging changes, and coordinating work between parallel Claude agents.
Containers: each Claude agent ran inside its own isolated container with a private workspace and controlled environment.
The experiment ran 16 parallel instances of the language model Claude to develop a Rust-based C compiler, resulting in a 100,000-line compiler capable of building Linux 6.9 on x86, ARM, and RISC-V.
Opus 4.5 was the first model version able to produce a functional compiler that could pass large test suites, although it was still incapable of compiling large real-world projects.
Over nearly two weeks, Opus 4.6 was tested across approximately 2,000 Claude Code sessions, consuming 2 billion input tokens and generating 140 million output tokens at a cost just under $20,000. The resulting compiler could build a bootable Linux 6.9 on x86, ARM, and RISC-V, and compile projects such as QEMU, FFmpeg, SQLite, Postgres, and Redis.
Subscribe to our weekly newsletter DevOpsLinks to receive similar updates for free!
Join other developers and claim your FAUN.dev() account now!
FAUN.dev() is a developer-first platform built with a simple goal: help engineers stay sharp without wasting their time.
