Sources

Gemini 3.5 Flash: more expensive, but Google plan to use it for everything simonwillison.net

Today at Google I/O, Google released Gemini 3.5 Flash. This one skipped the -preview modifier and went straight to general availability, and Google appear to be using it for a whole lot of their key products:

3.5 Flash is available today to billions of people globally:
For everyone via the Gemini app and AI Mode in Google Search
For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio
For enterprises in Gemini Enterprise Ag…

Gemini 3.5 Flash might be fast enough for gen AI to make sense arstechnica.com

Google says its more efficient Gemini 3.5 Flash is the key to your agentic AI future.

llm-gemini 0.32 simonwillison.net

Release: llm-gemini 0.32. New model gemini-3.5-flash for Gemini 3.5 Flash. See also my notes on Gemini 3.5 Flash, and the pelican I drew using this upgrade to the plugin. Tags: gemini, llm

llm-gemini 0.32a0 simonwillison.net

Release: llm-gemini 0.32a0. Compatible with llm>=0.32a0 alpha; adds the ability to stream reasoning tokens. Tags: gemini, llm

Introducing the Ettin Reranker Family huggingface.co

Two AI-based science assistants succeed with drug-retargeting tasks arstechnica.com

Both tools generate hypotheses; one goes on to analyze some of the data.

OlmoEarth v1.1: A more efficient family of models huggingface.co

OlmoEarth v1.1 from Allen AI lands as a leaner family of geospatial foundation models, tuned for satellite and remote-sensing tasks. The update targets efficiency over its predecessor, aiming to make Earth observation pipelines cheaper to run without sacrificing the v1 accuracy baseline.

datasette-llm 0.1a8 simonwillison.net

Simon Willison’s datasette-llm 0.1a8 patches the llm_prompt_context() hook so it fully walks chains of prior responses, with a matching 0.1a4 release of the accountant plugin propagating the fix to usage tracking. The bug had truncated multi-turn context collection.
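As a rough illustration of what "fully walking a chain of prior responses" means, here is a minimal sketch. It is not the datasette-llm implementation: the `Response` class, its `prior` field, and `collect_chain()` are hypothetical names invented for this example, and the real hook's data model may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical stand-in for one turn in a conversation."""
    text: str
    prior: Optional["Response"] = None  # link to the previous turn, if any


def collect_chain(response: Response) -> List[Response]:
    """Follow `prior` links back to the first turn; return oldest first."""
    chain: List[Response] = []
    seen = set()  # guard against accidental cycles
    node: Optional[Response] = response
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        chain.append(node)
        node = node.prior
    chain.reverse()
    return chain


first = Response("first turn")
second = Response("second turn", prior=first)
third = Response("third turn", prior=second)
texts = [r.text for r in collect_chain(third)]
```

A walker that stops after one `prior` hop would return only the last two turns here, which is the kind of truncation the release note describes.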

datasette-llm-accountant 0.1a4 simonwillison.net

Release: datasette-llm-accountant 0.1a4. Fixed bug tracking chains of responses. Refs datasette-llm#7. Tags: llm, datasette

References

Hacker News thread 48196570 news.ycombinator.com

Napkin-math on the TPU 8i serving specs and 280 tok/s target puts 3.5 Flash at ~250–300B total / 10–16B active in an MoE — significantly larger than prior Flash models, which explains why Google had to raise the per-token price even as the inference architecture got more efficient.

VentureBeat venturebeat.com

Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year — Pichai framed the model as a ‘financial lifeline’ for organizations migrating workloads off Pro-tier endpoints.

Towards AI — ‘I tested Gemini 3.5 Flash on 18 agent tasks’ pub.towardsai.net

On a 14-step MCP chain, Gemini 3.5 Flash finished in 11.3 seconds vs 38.9s for Claude Opus 4.7 and 46.1s for GPT-5.5: 6% pricier than old Flash, but roughly 3.4x to 4.1x faster in wall-clock time than those two frontier models.

Digital Applied benchmark roundup digitalapplied.com

Gemini 3.5 Flash leads MCP Atlas at 83.6% (vs Opus 4.7 at 79.1% and GPT-5.5 at 75.3%), but still trails Opus on SWE-Bench Pro (55.1% vs 64.3%) — the ‘agentic’ framing hides a real coding-quality gap.

Sean Goedecke blog — Responses API analysis seangoedecke.com

Gemini’s Interactions API keeps interleaved thoughts and tool calls inspectable, where OpenAI’s Responses compaction replaces them with opaque encrypted items — the two camps have diverged on transparency vs token compression, and both create hard vendor lock-in for live conversations.

Vibe Coding Academy — coding-assistant comparison vibecodingacademy.ai

Antigravity 2.0’s agent-manager UI is ‘cluttered’ with steep orchestration overhead, and early adopters report agents acting with ‘eerie’ confidence that requires significant manual verification — Claude Code still wins multi-file SWE-bench at 80.8%.

Weller et al., ‘Seq vs Seq: An Open Suite of Paired Encoders and Decoders’ (arXiv 2507.11412) arxiv.org

The Ettin suite pairs encoder-only and decoder-only models from 17M to 1B parameters trained on identical 2T-token recipes, with batch-level training order and 200+ intermediate checkpoints released — the first apples-to-apples comparison showing encoders consistently beat decoders of much larger size on classification and retrieval.

ThinkingLoop, ‘10 Vector Rerankers Benchmarked on Cost vs Quality’ (Medium) medium.com

Cohere Rerank 4 and Voyage Rerank 2.5 remain the quality ceiling, often providing a 15–30% precision boost over embedding-only retrieval; open-weight Apache-2.0 options like BGE-v2-m3 are preferred where licensing matters, while Jina Reranker v2 weights are CC-BY-NC-4.0 and unsuitable for many commercial deployments.

Qwen3-Embedding/Reranker technical blog (qwenlm.github.io) qwenlm.github.io

Qwen3-Reranker spans 0.6B–8B with a CausalLM ‘yes/no’ logit scoring head, 32K native context, and >100-language support — capabilities Ettin’s English-only, 8K-context CrossEncoder design does not match, at the cost of substantially higher per-pair latency.
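To make the "yes/no" logit scoring idea concrete, here is a minimal sketch that turns a pair of logits into a relevance score with a two-way softmax. It assumes the score is the probability mass on "yes" relative to "no". The function names and the example logits are illustrative, not taken from the Qwen3 code, and no model is loaded.

```python
import math
from typing import List, Tuple


def yes_no_score(yes_logit: float, no_logit: float) -> float:
    """Probability of 'yes' under a softmax over just the yes/no logits."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def rerank(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Sort (doc_id, yes_logit, no_logit) triples by descending score."""
    scored = [(yes_no_score(y, n), doc_id) for doc_id, y, n in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Made-up logits for three candidate documents.
order = rerank([("doc-a", 0.2, 1.5), ("doc-b", 3.1, -0.4), ("doc-c", 1.0, 1.0)])
```

Because each query-document pair needs its own forward pass through a causal language model to produce these logits, per-pair cost grows with model size, which is the latency trade-off noted above.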

Portkey.ai summary of ModernBERT paper portkey.ai

ModernBERT’s 8K context and unpadding gains are gated on Flash Attention 2, which requires GPU + fp16/bf16; on CPU or without FA2 the architecture reverts to standard quadratic scaling, and Optimum/ONNX export paths have been slow to land — a real obstacle for production deployment.

John6666 activity posts on Hugging Face huggingface.co

The 8.3x speedup is a real result under specific H100 conditions (bf16 + FA2 vs. fp32 + SDPA) but should be read as a bounded observation, not a general platform claim; consumer GPUs like the RTX 3090 see materially smaller gains because they cannot exploit the same Hopper-specific kernels.

HubNextEra, ‘How Ettin Rerankers Boost Your Embedder Performance’ news.hubnextera.com

The 32M Ettin reranker outperforms the 568M BGE-reranker-v2-m3 by +0.025 NDCG@10 on MTEB despite being 17x smaller, and the 68M variant matches Qwen3-Reranker-0.6B at roughly one-ninth the parameter count.

The Scientist — coverage of Robin’s dAMD finding the-scientist.com

Robin independently hypothesized that enhancing the phagocytic activity of retinal pigment epithelium cells could mitigate disease progression… ripasudil increased debris clearance by 7.5 times and upregulated the lipid efflux pump ABCA1.

k-dense.ai — ‘AI Co-Scientist, Not AI Scientist’ k-dense.ai

Independent critics have labeled the results ‘underwhelming,’ noting that the ‘novel’ drugs identified by the AI were already well-established in existing literature… Co-Scientist may suffer from data leakage, essentially acting as a sophisticated search engine rather than an independent discoverer.

FutureHouse — LAB-Bench announcement futurehouse.org

FutureHouse agents (Crow and Falcon) achieved ~90% accuracy on the LitQA benchmark, significantly outperforming PhD-level researchers who averaged ~67%.

Bioengineer.org — commentary on Robin Nature paper bioengineer.org

The analytical agent Finch struggled with complex bioinformatics and statistics, frequently requiring human prompts to correct errors… human researchers had to override several of the AI’s experimental design suggestions, suggesting that ‘autonomous’ discovery still relies heavily on human ‘sense-checking’.

EurekAlert — RAND/biosecurity framing of agentic biology eurekalert.org

Such systems ‘raise the floor’ for non-experts to access complex biological knowledge and ‘raise the ceiling’ for experts to accelerate the design of dangerous pathogens… agentic systems can iterate on experimental protocols and interface with lab robotics, potentially bypassing traditional biosafety safeguards.

ETC Journal — response to Nature’s 25 March 2026 editorial on AI scientists etcjournal.com

The shift from AI tools to autonomous AI agents may amplify existing crises in research integrity, such as the production of ‘paper mill’ content and hallucinated data analyses.

Jack Sun, writing.