Quinn's 10 MB FST beats a 3 GB SQLite database because Finnish word forms share affixes
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Quoting Andrew Quinn simonwillison.net
One could say in the first quarter-century of my life, that while I was always fascinated by programming, I could never overcome the guilt of not really knowing whether the tool I am building right now isn’t already superceded by some much better implementation someone else has already written 30 or 40 years ago; I could write a TSV-aware search and replace, or I could find out about awk and solve that entire class of problems in one fell swoop, for example. My central conceit is that this is a…
References
serverdigital.net deep-dive patchwindow.serverdigital.net
Finnish is agglutinative — a single root word can produce millions of inflected forms… SQLite’s B-trees are excellent for general-purpose querying, but they store rows mostly redundantly and do not exploit the structural regularity of natural language data.
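The redundancy argument in the excerpt above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the three stems and six case endings are toy data (picked because they inflect without consonant gradation), the function names are hypothetical, and the structure built is a minimized acyclic automaton that stores keys only, a simpler relative of an FST rather than Quinn's actual implementation. It compares the characters needed to store every form separately with the node count of a trie (shared prefixes) and the state count after identical subtrees are merged (shared suffixes as well).

```python
# Toy data (assumed for illustration): stems that take case endings unchanged.
STEMS = ["talo", "auto", "kala"]
ENDINGS = ["", "ssa", "sta", "lla", "lta", "lle"]


def inflect(stems, endings):
    """Return every stem + ending combination as a flat word list."""
    return [stem + ending for stem in stems for ending in endings]


def new_node():
    return {"final": False, "edges": {}}


def build_trie(words):
    """Build a plain trie: words that share a prefix share nodes."""
    root = new_node()
    for word in words:
        node = root
        for ch in word:
            node = node["edges"].setdefault(ch, new_node())
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(child) for child in node["edges"].values())


def count_minimal_states(root):
    """Count states after merging structurally identical subtrees.

    Two nodes are equivalent when they agree on finality and on every
    (label, child state) pair, so a shared ending is stored only once.
    """
    registry = {}

    def state_id(node):
        edges = tuple(
            sorted((ch, state_id(child)) for ch, child in node["edges"].items())
        )
        signature = (node["final"], edges)
        return registry.setdefault(signature, len(registry))

    state_id(root)
    return len(registry)


if __name__ == "__main__":
    words = inflect(STEMS, ENDINGS)
    root = build_trie(words)
    print("word forms:          ", len(words))
    print("raw characters:      ", sum(len(w) for w in words))
    print("trie nodes:          ", count_trie_nodes(root))
    print("minimized automaton: ", count_minimal_states(root))
```

In the trie, the subtree of case endings is repeated under every stem. After minimization it exists once, so adding another stem of this type costs only a few new states. A row-per-form table gets no such sharing.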
betacat.io HN aggregator hackernews.betacat.io
Quinn admitted that the SQLite version was a ‘bad easy thing’ chosen over ‘doing nothing,’ which provided the necessary bridge to understand the problem’s real constraints… starting with a ‘stupid’ but correct solution allows for easier verification when the final optimized version is eventually built.
awesome-rust / rust-fst ecosystem notes crates.io
BurntSushi’s rust-fst has indexed over 1.6 billion URLs (134 GB raw) from Common Crawl into a 157 MB memory-mapped index; it is the foundation of Tantivy and Quickwit, which forked it as tantivy-fst for multi-output support.
Jeff Atwood, Coding Horror blog.codinghorror.com
Don’t reinvent the wheel, unless you plan on learning more about wheels.
Matthias Endler, endler.dev (2025) endler.dev
Reinvent for insight, reuse for impact… no wheel is perfectly suited for every possible context; the goal is not to produce more wheels, but to produce more inventors.
github.com/hiAndrewQuinn/tsk github.com
Taskusanakirja (tsk) — a terminal-based Finnish-English dictionary in Go using a custom trie for real-time search-as-you-type, drawing data from Wiktionary; recently pivoted to a closed-source commercial model.
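As a rough illustration of the trie-backed search-as-you-type described in the entry above, here is a minimal Python sketch. It is not tsk's code (tsk is written in Go and its trie is custom). The class names, the `complete` method, the `limit` parameter, and the four dictionary entries are all hypothetical, chosen only to show how a prefix lookup can return matches on every keystroke.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.gloss = None   # English gloss if a headword ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, gloss):
        """Add a headword and its gloss, creating nodes as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.gloss = gloss

    def complete(self, prefix, limit=5):
        """Return up to `limit` (word, gloss) pairs that start with `prefix`,
        in alphabetical order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no headword starts with this prefix

        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, word = stack.pop()
            if current.gloss is not None:
                results.append((word, current.gloss))
            # Push children in reverse order so the smallest letter pops first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], word + ch))
        return results


def build_demo_trie():
    """Toy entries, assumed for illustration only."""
    trie = Trie()
    for word, gloss in [
        ("talo", "house"),
        ("talvi", "winter"),
        ("takki", "coat"),
        ("auto", "car"),
    ]:
        trie.insert(word, gloss)
    return trie


if __name__ == "__main__":
    trie = build_demo_trie()
    # Simulate the user typing "talv" one character at a time.
    typed = ""
    for ch in "talv":
        typed += ch
        print(typed, "->", trie.complete(typed))
```

The work per keystroke is proportional to the prefix length plus the number of nodes visited to collect up to `limit` matches, not to the size of the dictionary. That is what makes this structure a natural fit for interactive lookup.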