Join us

Building a web search engine from scratch in two months with 3 billion neural embeddings

An indie dev just went full mad scientist and built a full-stack, transformer-powered search engine—solo. They indexed 280 million pages from scratch with hundreds of crawlers, a fully sharded backend, and serious metal: 64 RocksDB nodes, 200 CPU cores, and 82 TB of SSD.

Under the hood: custom HTML parsers, sentence-level chunking labeled by a context-aware DistilBERT, and HNSW vector search sharded across in-memory nodes. The real kicker? A custom vector DB called CoreNN that runs live graph updates from disk—over 3 billion embeddings and counting.

Big shift: Forget bloated full-text indexes. This stack shows where things are headed—lean, LLM-native search on vector DBs tuned for disk, not RAM.


Let's keep in touch!

Stay updated with my latest posts and news. I share insights, updates, and exclusive content.

Unsubscribe anytime. By subscribing, you share your email with @faun and accept our Terms & Privacy.

Give a Pawfive to this post!


Only registered users can post comments. Please, login or signup.

Start writing about what excites you in tech — connect with developers, grow your voice, and get rewarded.

Join other developers and claim your FAUN.dev() account now!

Avatar

The FAUN

@faun
A worldwide community of developers and DevOps enthusiasts!
Developer Influence
3k

Influence

302k

Total Hits

3712

Posts