Bun ports to Rust, Granite tops sub-100M MTEB, Transformers ships async batching
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Not so locked in any more simonwillison.net
This Mitchell Hashimoto quote about Bun migrating from Zig to Rust reminded me of a similar conversation I had at a conference last week. I was talking to someone who worked for a medium-sized technology company with a pair of legacy/legendary iPhone and Android apps. They told me they had just completed a coding-agent-driven rewrite of both apps to React Native. I asked why they chose that, given that coding agents presumably drive down the cost of maintaining separate iPhone and Android apps…
Quoting Mitchell Hashimoto simonwillison.net
[…] On the interesting side is how fungible programming languages are nowadays. Programming languages used to be LOCK IN, and they’re increasingly not so. You think the Bun rewrite in Rust is good for Rust? Bun has shown they can be in probably any language they want in roughly a week or two. Rust is expendable. It’s useful until it’s not, then it can be thrown out. That’s interesting! — Mitchell Hashimoto, on Bun porting from Zig to Rust. Tags: zig, ai, mitchell-hashimoto, llms, rust, gene…
Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality huggingface.co
Unlocking asynchronicity in continuous batching huggingface.co
datasette-ip-rate-limit 0.1a0 simonwillison.net
Datasette.io was getting hammered by misbehaving crawlers, so Simon Willison had Codex (GPT-5.5 xhigh) build datasette-ip-rate-limit 0.1a0. The plugin reads a configurable header like Fly-Client-IP and blocks abusive IPs per-path — production rules cap demo databases at 60 requests per 60 seconds.
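The description implies a per-IP, per-path counter keyed off a trusted proxy header. Here is a minimal sliding-window sketch of that idea — hypothetical code, not datasette-ip-rate-limit’s actual implementation; the `/demo` prefix and `allow()` helper are illustrative, only the 60-requests-per-60-seconds rule and the Fly-Client-IP header come from the post:

```python
import time
from collections import defaultdict, deque

WINDOW = 60      # seconds, matching the demo-database rule from the post
LIMIT = 60       # requests allowed per window

hits = defaultdict(deque)  # (ip, prefix) -> timestamps of recent requests

def allow(headers, path, prefix="/demo", header="Fly-Client-IP"):
    """Return False when this client IP has exceeded LIMIT hits on prefix."""
    if not path.startswith(prefix):
        return True
    ip = headers.get(header, "unknown")
    now = time.monotonic()
    q = hits[(ip, prefix)]
    while q and now - q[0] > WINDOW:   # drop timestamps outside the window
        q.popleft()
    if len(q) >= LIMIT:
        return False                   # block: over 60 requests in 60 seconds
    q.append(now)
    return True

print(allow({"Fly-Client-IP": "203.0.113.7"}, "/demo/fixtures"))  # True
```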
Quoting Boris Mann simonwillison.net
Counting agents tells you nothing about the work, Boris Mann argues on Bluesky. Saying you have 11 AI agents, he writes, conveys about as much as saying you have 11 spreadsheets or 11 browser tabs open — a jab at agent-count marketing.
Quoting Mo Bitar simonwillison.net
Want a promotion? Mo Bitar’s TikTok bit advises walking into the CEO’s office, asking for $18,000 in API credits to demo a ‘Ralph Loop,’ and tagging coworkers you’ve ‘automated’ in Slack. The joke lands because nobody, he notes, actually knows what they’re doing.
References
byteiota.com — Bun Zig→Rust rewrite analysis byteiota.com
Bun’s claude/phase-a-port branch translated roughly 960,000 lines of Zig to Rust in about six days, hitting a 99.8% pass rate on Linux x64 — but reviewers counted over 13,000 unsafe blocks compared to fewer than 100 in mature Rust projects like uv, prompting critics to label the output ‘vibecoded’.
Techzine — ‘Bun takes a surprising step from Zig to Rust’ techzine.eu
The pivot followed the Zig core team’s rejection of Bun’s parallelized semantic-analysis contributions under a strict ‘no AI code’ policy for the compiler — a governance conflict, not just a technical preference, drove the language switch.
Hacker News discussion (item 44740693) news.ycombinator.com
Skeptics describe an ‘ontological difference between using code that is no longer understood by humans and deploying code that was never understood by any human in the first place,’ warning of reviewer fatigue when engineers ‘LGTM’ agent output they cannot fully audit.
arXiv 2508.11468 — class-level code translation benchmark arxiv.org
Method-level benchmarks like HumanEval show >80% pass@k, but class-level benchmarks (ClassEval-T, TRACY) reveal sharp drops once cross-method dependencies are required, and translations that pass functional tests are often orders of magnitude slower than the source.
Infosecurity Europe — LLM cybersecurity benchmarks summary infosecurityeurope.com
Studies using CyberSecEval 2 indicate 28.6%–45% of LLM-based code translations introduce new vulnerabilities, particularly in input-validation paths — syntactic equivalence is not security equivalence.
Block Engineering — Base Web → Fluent UI migration engineering.block.xyz
Block describes an agent-driven UI framework migration completed ‘without a flag day’, showing that the same playbook Simon’s React Native anecdote points to is already running inside large engineering orgs — not just startups.
PremAI blog — ‘Best Embedding Models for RAG 2026’ blog.premai.io
Jina Embeddings v5-small (677M) achieves a state-of-the-art score of 67.7 on the Multilingual MTEB… IBM’s Granite Multilingual R2 (May 2026) focuses on extreme efficiency; its 311M-parameter version scores 65.2.
AIMultiple — Open-Source Embedding Models review aimultiple.com
Independent testing on the Massive Legal Embedding Benchmark (MLEB) placed the Granite English R2 model at #18 overall, outperforming several larger models such as BGE-M3 and E5 Large in retrieval tasks.
llama.cpp GitHub issue #11282 github.com
Loss of strip_accents preprocessing during GGUF conversion… causes non-ASCII characters (common in the R2’s 200+ supported languages) to be mapped to the [UNK] token, effectively breaking embeddings for multilingual text.
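For context, BERT-style tokenizers implement strip_accents as NFD normalization followed by dropping combining marks. A small sketch of that preprocessing step (illustrative; the issue concerns the GGUF conversion losing this step, and the exact fallback behavior depends on the tokenizer):

```python
import unicodedata

def strip_accents(text: str) -> str:
    """NFD-decompose, then drop combining marks ('Mn'), as BERT tokenizers do."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("café"))    # -> "cafe", a form the vocab likely contains
print(strip_accents("Müller"))  # -> "Muller"
# A converted model that skips this step looks up "café"/"Müller" directly;
# per the issue report, the unmatched forms end up mapped to [UNK].
```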
r/LocalLLaMA discussion on production embedding models reddit.com
Users reported that the granite-embedding-reranker-english-r2 model produced scores dangerously close to 1.0 even for clearly non-relevant documents, a behavior IBM researchers later attributed to a faulty ‘merged model’ component.
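A failure mode like this is cheap to catch before deployment. Below is a hypothetical saturation probe (not IBM’s test; the `score` callable is a placeholder to wire to whatever reranker you run) that compares a known-relevant pair against random documents:

```python
def saturation_probe(score, query, relevant_doc, random_docs, margin=0.2):
    """Fail if relevant and irrelevant scores are not separated by `margin`."""
    rel = score(query, relevant_doc)
    worst = max(score(query, d) for d in random_docs)
    assert rel - worst >= margin, (
        f"possible saturation: relevant={rel:.3f}, best-irrelevant={worst:.3f}"
    )
    return rel, worst

# toy scorer standing in for a real model call: pinned near 1.0 for everything
fake = lambda q, d: 0.99
try:
    saturation_probe(fake, "query", "relevant text", ["random a", "random b"])
except AssertionError as e:
    print(e)  # the probe flags the pinned scores
```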
Atal Upadhyay — ‘Embedding Models for RAG: Top 5 Ranked’ atalupadhyay.wordpress.com
Qwen3-Embedding-8B currently leads major multilingual leaderboards, including MTEB-v2, with scores frequently exceeding 70.5… BGE-M3 remains the only ‘all-in-one’ workhorse that natively produces dense, sparse, and ColBERT-style representations in a single pass.
arXiv — GneissWeb dataset paper arxiv.org
Some researchers initially questioned whether GneissWeb offered more than an ‘incremental improvement’ over existing datasets like FineWeb, noting its continued reliance on Common Crawl as a foundational source.
vLLM blog (perf update, Sept 2024) vllm.ai
Profiling of early vLLM versions indicated that on a single H100 GPU, only 38% of wall-clock time was spent on actual GPU computation… approximately 33% consumed by the HTTP API server and 29% by the scheduler.
turion.ai — vLLM vs SGLang 2026 turion.ai
SGLang’s Zero-Overhead Batch Scheduler… uses an OverlapThread that prepares the metadata for the next batch while the GPU executes the current forward pass, pushing GPU utilization toward 95–98%.
NVIDIA TensorRT-LLM architecture docs nvidia.github.io
The Overlap Scheduler launches the GPU forward pass for iteration n+1 immediately after pass n completes, without waiting for the CPU to process the results… it introduces a one-token latency lag in the pipeline as a trade-off for the massive throughput gain.
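Both entries describe the same overlap pattern: CPU-side batch preparation for iteration n+1 runs while the GPU executes iteration n. A minimal threading sketch of the idea — illustrative only, not SGLang’s OverlapThread or TensorRT-LLM’s scheduler, with stub functions standing in for real work:

```python
import threading, queue, time

def prepare_batch(step):
    time.sleep(0.002)                      # stand-in for tokenize/schedule work
    return {"step": step, "requests": ["req-a", "req-b"]}

def forward_pass(meta):
    time.sleep(0.010)                      # stand-in for the GPU forward pass
    return [f"tok{meta['step']}-{r}" for r in meta["requests"]]

def run(n_iters=4):
    ready = queue.Queue(maxsize=1)         # one-batch-ahead prefetch
    t = threading.Thread(
        target=lambda: [ready.put(prepare_batch(s)) for s in range(n_iters)],
        daemon=True,
    )
    t.start()
    prev_out = None
    for _ in range(n_iters):
        meta = ready.get()                 # already built during the last pass
        out = forward_pass(meta)           # GPU never stalls on scheduling
        # results of pass n are consumed while pass n+1 is queued: this is the
        # one-token latency lag the TensorRT-LLM docs call out as the trade-off
        if prev_out:
            print("consumed:", prev_out)
        prev_out = out
    print("consumed:", prev_out)
    t.join()

run()
```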
Prem AI — LLM inference servers compared 2026 blog.premai.io
Hugging Face officially placed Text Generation Inference (TGI) into maintenance mode on December 11, 2025… For production workloads, Hugging Face now explicitly recommends migrating to vLLM or SGLang.
Hugging Face — async RL training landscape huggingface.co
Independent benchmarks of the Async-GRPO architecture show efficiency gains of up to 42% over veRL and 11x over standard TRL by overlapping rollout, reference logprob inference, and actor training.
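The gain comes from keeping generation, reference inference, and the optimizer step concurrent instead of sequential. A toy asyncio sketch of that overlap — placeholder names throughout, not Async-GRPO’s actual API — where the rollout for step n+1 starts before training on step n finishes:

```python
import asyncio

async def rollout(step):
    await asyncio.sleep(0.05)            # stand-in for vLLM-style generation
    return f"trajectories-{step}"

async def ref_logprobs(traj):
    await asyncio.sleep(0.02)            # stand-in for reference-model inference
    return f"logprobs-for-{traj}"

async def train_step(traj, logprobs):
    await asyncio.sleep(0.04)            # stand-in for the actor's optimizer step
    print(f"trained on {traj} with {logprobs}")

async def main(n_steps=3):
    next_rollout = asyncio.create_task(rollout(0))
    for step in range(n_steps):
        traj = await next_rollout
        if step + 1 < n_steps:
            # launch the next rollout before training on the current batch,
            # so generation and training overlap instead of alternating
            next_rollout = asyncio.create_task(rollout(step + 1))
        lp = await ref_logprobs(traj)
        await train_step(traj, lp)

asyncio.run(main())
```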
Medium — throughput/latency tradeoff in LLM inference medium.com
Since the decode phase of LLMs is often memory-bandwidth bound rather than compute-bound, overlapping compute with communication may yield negligible gains if the GPU is already idling while waiting for weights to stream from VRAM.
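A back-of-envelope check of why that holds (all numbers are illustrative assumptions, not measurements): each decoded token must stream every weight from VRAM once, so weight bytes divided by memory bandwidth puts a floor on per-token latency no matter how fast the compute units are.

```python
params = 7e9                 # assumed 7B-parameter model
bytes_per_param = 2          # FP16 weights
hbm_bandwidth = 3.35e12      # H100 SXM HBM3 spec figure, ~3.35 TB/s

weight_bytes = params * bytes_per_param
t_per_token = weight_bytes / hbm_bandwidth
print(f"~{t_per_token * 1e3:.2f} ms/token floor -> ~{1 / t_per_token:.0f} tok/s")
# ~4.18 ms/token -> ~239 tok/s per sequence: if the GPU already idles waiting
# on HBM, overlapping compute with communication buys little, as argued above.
```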