Replacing a 3 GB SQLite database with a 10 MB FST (finite state transducer) binary

Andrew Quinn shipped Taskusanakirja (tsk), a Finnish-English pocket dictionary with search-as-you-type. It was originally backed by a trie for ~400k base words, plus a 3 GB SQLite FTS database to cover the 40-60M inflected forms that Finnish's agglutinative morphology demands. Drawing on BurntSushi's post "Index 1,600,000,000 Keys with Automata and Rust", he rewrote the index as a finite state transducer using the Rust fst crate. Where a trie shares only prefixes, an FST compresses both prefixes and suffixes, which is exactly what you want when 100k words all end in the same dozen inflection patterns. The 3 GB SQLite blob collapsed to about 10 MB, a roughly 300x reduction. The broader lesson he leans on is that this clean second pass was possible only because, nine months earlier, he shipped the ugly SQLite hack instead of waiting for the right answer.
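To make the prefix-versus-suffix point concrete, here is a minimal sketch that uses only the Rust standard library. It is not the fst crate's algorithm and not Quinn's code: it builds a plain trie and then merges identical subtrees bottom-up, which is one simple way to get the suffix sharing that an FST's underlying automaton provides. The three stems, the four case endings, and all function names (`sample_words`, `node_counts`, and so on) are assumptions chosen for illustration.

```rust
use std::collections::{BTreeMap, HashMap};

/// A plain trie node: shares prefixes only.
#[derive(Default)]
struct TrieNode {
    is_final: bool,
    children: BTreeMap<char, TrieNode>,
}

fn insert(root: &mut TrieNode, word: &str) {
    let mut node = root;
    for ch in word.chars() {
        node = node.children.entry(ch).or_default();
    }
    node.is_final = true;
}

fn count_trie_nodes(node: &TrieNode) -> usize {
    1 + node.children.values().map(count_trie_nodes).sum::<usize>()
}

/// Canonical signature of a state: finality plus its (label, target id) edges.
type Signature = (bool, Vec<(char, usize)>);

/// Bottom-up merge: subtrees with the same signature collapse to one state id.
fn minimize(node: &TrieNode, registry: &mut HashMap<Signature, usize>) -> usize {
    let edges: Vec<(char, usize)> = node
        .children
        .iter()
        .map(|(&ch, child)| (ch, minimize(child, registry)))
        .collect();
    let next_id = registry.len();
    *registry.entry((node.is_final, edges)).or_insert(next_id)
}

/// Hypothetical sample: three stems, each with the same handful of endings.
fn sample_words() -> Vec<String> {
    let stems = ["talo", "auto", "kala"];
    let endings = ["", "ssa", "sta", "lla", "lta"];
    let mut words = Vec::new();
    for stem in stems {
        for ending in endings {
            words.push(format!("{stem}{ending}"));
        }
    }
    words
}

/// Returns (trie node count, state count after merging identical subtrees).
fn node_counts(words: &[String]) -> (usize, usize) {
    let mut root = TrieNode::default();
    for word in words {
        insert(&mut root, word);
    }
    let mut registry = HashMap::new();
    minimize(&root, &mut registry);
    (count_trie_nodes(&root), registry.len())
}

fn main() {
    let (trie, merged) = node_counts(&sample_words());
    println!("trie nodes: {trie}, merged states: {merged}");
}
```

On this 15-word sample the trie needs 43 nodes while the merged automaton needs 14, because every stem ends up pointing at the same shared block of endings. The gap widens as more stems reuse the same inflection patterns, which is the effect the summary above credits for the size reduction. A real FST also attaches output values to transitions, and the fst crate builds its structure incrementally from keys supplied in sorted order rather than from a fully materialized trie.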



VarBear #SoftwareEngineering

FAUN.dev()

@varbear