An indie dev just went full mad scientist and built a full-stack, transformer-powered search engine—solo. They indexed 280 million pages from scratch with hundreds of crawlers, a fully sharded backend, and serious metal: 64 RocksDB nodes, 200 CPU cores, and 82 TB of SSD.
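The post doesn't say how pages get routed across those 64 RocksDB nodes, but hash partitioning is the standard move for a sharded backend. A minimal sketch, assuming stable hash routing (the `shard_for` helper and the SHA-1 choice are illustrative, not the dev's actual code):

```python
import hashlib

NUM_SHARDS = 64  # matches the 64 RocksDB nodes mentioned above

def shard_for(url: str) -> int:
    """Map a URL to one of NUM_SHARDS storage shards via a stable hash.

    Hypothetical routing scheme: SHA-1 the key, take the first 8 bytes,
    mod by the shard count. Any stable hash works; the point is that
    reads and writes for the same URL always land on the same node.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Same URL, same shard, every time -- no central lookup table needed.
assert shard_for("https://example.com/a") == shard_for("https://example.com/a")
```

The tradeoff of pure hash routing is that resharding (say, going from 64 to 128 nodes) remaps most keys; consistent hashing is the usual fix when shard counts change often.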
Under the hood: custom HTML parsers, sentence-level chunking labeled by a context-aware DistilBERT, and HNSW vector search sharded across in-memory nodes. The real kicker? A custom vector DB called CoreNN that runs live graph updates from disk—over 3 billion embeddings and counting.
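To make the scatter-gather shape of sharded vector search concrete, here's a toy sketch: embeddings spread across in-memory shards, each shard answers top-k locally, and the coordinator merges the partial results. Brute-force cosine scoring stands in for a real HNSW graph, and every name here (`Shard`, `search`) is hypothetical, not the dev's API:

```python
import heapq
import math
import random

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class Shard:
    """One in-memory node. The real system runs an HNSW graph per shard;
    brute-force scoring stands in here to keep the sketch self-contained."""
    def __init__(self):
        self.items = []  # list of (doc_id, embedding)

    def add(self, doc_id, emb):
        self.items.append((doc_id, emb))

    def top_k(self, query, k):
        scored = [(cosine(query, emb), doc_id) for doc_id, emb in self.items]
        return heapq.nlargest(k, scored)

def search(shards, query, k):
    """Scatter the query to every shard, then merge the partial top-k lists."""
    partial = [hit for s in shards for hit in s.top_k(query, k)]
    return heapq.nlargest(k, partial)

random.seed(0)
shards = [Shard() for _ in range(4)]
for doc_id in range(100):
    emb = [random.gauss(0, 1) for _ in range(16)]
    shards[doc_id % len(shards)].add(doc_id, emb)  # round-robin placement

query = [random.gauss(0, 1) for _ in range(16)]
hits = search(shards, query, k=5)  # list of (score, doc_id), best first
```

Because each shard returns its own full top-k, the merged result is guaranteed to contain the global top-k; the cost is k results fetched per shard per query.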
Big shift: Forget bloated full-text indexes. This stack shows where things are headed—lean, LLM-native search on vector DBs tuned for disk, not RAM.
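"Tuned for disk, not RAM" can be as simple as fixed-size records in a flat file, fetched by offset instead of held in memory. A minimal illustration of that disk-first idea (this is not CoreNN's actual on-disk format, which the post doesn't describe):

```python
import os
import struct
import tempfile

DIM = 8                 # toy embedding dimension
REC_SIZE = DIM * 4      # one float32 per dimension

def write_embeddings(path, vectors):
    """Append fixed-size float32 records; row id N lives at byte offset N * REC_SIZE."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))

def read_embedding(path, row_id):
    """Fetch a single vector straight from disk -- nothing resident in RAM."""
    with open(path, "rb") as f:
        f.seek(row_id * REC_SIZE)
        return list(struct.unpack(f"<{DIM}f", f.read(REC_SIZE)))

fd, path = tempfile.mkstemp(suffix=".vec")
os.close(fd)
write_embeddings(path, [[float(i)] * DIM for i in range(10)])
v = read_embedding(path, 7)  # one seek + one small read, regardless of corpus size
```

At 3 billion embeddings, per-vector lookups like this stay O(1) seeks while the working set far exceeds RAM; that's the economics full-text indexes struggle to match.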