Wei (Jack) Sun

Quinn's 10 MB FST beats a 3 GB SQLite because Finnish shares affixes

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Quoting Andrew Quinn simonwillison.net

One could say in the first quarter-century of my life, that while I was always fascinated by programming, I could never overcome the guilt of not really knowing whether the tool I am building right now isn’t already superseded by some much better implementation someone else has already written 30 or 40 years ago; I could write a TSV-aware search and replace, or I could find out about awk and solve that entire class of problems in one fell swoop, for example. My central conceit is that this is a…

References

serverdigital.net deep-dive patchwindow.serverdigital.net

Finnish is agglutinative — a single root word can produce millions of inflected forms… SQLite’s B-trees are excellent for general-purpose querying, but they store rows mostly redundantly and do not exploit the structural regularity of natural language data.

betacat.io HN aggregator hackernews.betacat.io

Quinn admitted that the SQLite version was a ‘bad easy thing’ chosen over ‘doing nothing,’ which provided the necessary bridge to understand the problem’s real constraints… starting with a ‘stupid’ but correct solution allows for easier verification when the final optimized version is eventually built.

awesome-rust / rust-fst ecosystem notes crates.io

BurntSushi’s rust-fst has indexed over 1.6 billion URLs (134 GB raw) from Common Crawl into a 157 MB memory-mapped index; it is the foundation of Tantivy and Quickwit, which forked it as tantivy-fst for multi-output support.

Jeff Atwood, Coding Horror blog.codinghorror.com

Don’t reinvent the wheel, unless you plan on learning more about wheels.

Matthias Endler, endler.dev (2025) endler.dev

Reinvent for insight, reuse for impact… no wheel is perfectly suited for every possible context; the goal is not to produce more wheels, but to produce more inventors.

github.com/hiAndrewQuinn/tsk github.com

Taskusanakirja (tsk) — a terminal-based Finnish-English dictionary in Go using a custom trie for real-time search-as-you-type, drawing data from Wiktionary; recently pivoted to a closed-source commercial model.
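
The prefix-sharing idea behind the entries above can be sketched in a few lines. This is a minimal illustration, not Quinn's implementation and not an FST: it is a plain trie, the class and method names (`Trie`, `with_prefix`) are hypothetical, and the five-word list is a hand-picked sample of inflected forms of the Finnish word "talo" (house). It counts how many trie nodes are needed compared with storing every form in full, and runs the kind of prefix lookup a search-as-you-type dictionary relies on.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    """Hypothetical minimal trie; shares common prefixes between words."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 0  # nodes created, excluding the root

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.is_word = True

    def with_prefix(self, prefix: str) -> List[str]:
        # Walk down to the node for the prefix, if it exists.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every complete word below that node.
        results: List[str] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Illustrative sample only: a few inflected forms of "talo".
WORDS = ["talo", "talon", "talossa", "talosta", "taloissa"]

trie = Trie()
for w in WORDS:
    trie.insert(w)

total_chars = sum(len(w) for w in WORDS)

print(total_chars)                # 31 characters if each form is stored whole
print(trie.node_count)            # 14 nodes once prefixes are shared
print(trie.with_prefix("talos"))  # ['talossa', 'talosta']
```

A trie only merges shared prefixes. A finite state transducer also merges shared suffixes (here, the repeated "ssa" ending), so on heavily inflected vocabulary it can compress further than this sketch suggests.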

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare