Around the weights: gating, silicon, and scaffolding take over
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
A pelican for GPT-5.5 via the semi-official Codex backdoor API simonwillison.net
GPT-5.5 is out. It’s available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I’ve had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it’s hard to put into words what’s good about it - I ask it to build things and it builds exactly what I ask for! There’s one notable omission from today’s release - the API: API deployments require different safeguards and we are working closely with partners and customers o…
GPT-5.5 prompting guide simonwillison.net
GPT-5.5 prompting guide Now that GPT-5.5 is available in the API, OpenAI have released a wealth of useful tips on how best to prompt the new model. Here’s a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response: Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences. I’ve already noticed their Codex app doing t…
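For a concrete picture of what that advice looks like in practice, here is a minimal sketch using the official openai npm client. The model name comes from the post; the user task and the omitted tool definitions are invented for illustration, so treat this as a shape, not a recipe.

```typescript
// Minimal sketch of the "preamble before tool calls" guidance wired into a
// system prompt. Assumes OPENAI_API_KEY is set; tool definitions omitted.
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [
    {
      role: "system",
      content:
        "Before any tool calls for a multi-step task, send a short " +
        "user-visible update that acknowledges the request and states " +
        "the first step. Keep it to one or two sentences.",
    },
    // Hypothetical multi-step task, purely for illustration.
    { role: "user", content: "Audit this repo for flaky tests and fix them." },
  ],
});

console.log(response.choices[0].message.content);
```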
llm 0.31 simonwillison.net
Release: llm 0.31 New GPT-5.5 OpenAI model: llm -m gpt-5.5. #1418 New option to set the text verbosity level for GPT-5+ OpenAI models: -o verbosity low. Values are low, medium, high. New option for setting the image detail level used for image attachments to OpenAI models: -o image_detail low - values are low, high and auto, and GPT-5.4 and 5.5 also accept original. Models listed in extra-openai-models.yaml are now also registered as asynchronous. #1395 Tags: gpt, openai, llm
llm-openai-via-codex 0.1a0 simonwillison.net
Release: llm-openai-via-codex 0.1a0 Hijacks your Codex CLI credentials to make API calls with LLM, as described in my post about GPT-5.5. Tags: openai, llm, codex-cli
Sign of the future: GPT-5.5 oneusefulthing.org
One impressive step on the curve
DeepSeek V4 - almost on the frontier, a fraction of the price simonwillison.net
Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with a 1 million token context. Pro is 1.6T total parameters, 49B active; Flash is 284B total, 13B active. They’re using the standard MIT license. I think this makes DeepSeek-V4-Pro the new largest open weights model. It’s larger than Kimi…
An update on recent Claude Code quality reports simonwillison.net
An update on recent Claude Code quality reports It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems. The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users. Anthropic’s postmortem describes these in detail. This one in particular stood out to me: On March 26, we shipped a change to clear Claud…
russellromney/honker simonwillison.net
Honker is a Rust SQLite extension that brings Postgres-style NOTIFY/LISTEN, durable Kafka-like streams and the transactional outbox pattern to SQLite. It adds 20+ custom SQL functions and lets workers poll the .db-wal file every 1ms for near-real-time delivery without running full queries.
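The description maps naturally onto a few SQL calls. A hedged sketch, assuming better-sqlite3 as the host driver: the honker_notify and honker_listen function names below are guesses at the API shape based on the description, not confirmed names from the repo.

```typescript
// Hypothetical sketch only: the honker_* SQL function names are guesses from
// the NOTIFY/LISTEN description; check the repo for the real API surface.
import Database from "better-sqlite3";

const db = new Database("app.db");
db.loadExtension("./honker"); // load the Rust extension (path is an assumption)

// Publisher: emit a notification on a channel, Postgres NOTIFY style.
db.prepare("SELECT honker_notify(?, ?)").get("orders", JSON.stringify({ id: 42 }));

// Worker: fetch pending messages for a channel. Per the description, delivery
// is driven by watching the .db-wal file rather than re-running full queries.
const pending = db.prepare("SELECT honker_listen(?)").all("orders");
console.log(pending);
```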
Extract PDF text in your browser with LiteParse for the web simonwillison.net
Simon Willison ported LlamaIndex’s LiteParse PDF extractor — which uses spatial heuristics and Tesseract OCR rather than AI — to run entirely in the browser via PDF.js. He vibe-coded the port with Claude Code in a 59-minute session, deploying to GitHub Pages without reading a line of the generated TypeScript.
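The port layers LiteParse’s spatial heuristics on top of PDF.js’s raw text extraction. For reference, that underlying PDF.js step looks roughly like this; the worker path and sample URL are placeholders, and this is not LiteParse’s actual code.

```typescript
// Plain in-browser text extraction with pdfjs-dist, the layer the LiteParse
// port builds on. Worker path and PDF URL are placeholder values.
import * as pdfjsLib from "pdfjs-dist";

pdfjsLib.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.mjs";

async function extractText(url: string): Promise<string> {
  const pdf = await pdfjsLib.getDocument(url).promise;
  const pages: string[] = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    // Each item is a positioned text run; LiteParse applies its spatial
    // heuristics (and optional OCR) on top of runs like these.
    pages.push(
      content.items.map((item: any) => ("str" in item ? item.str : "")).join(" ")
    );
  }
  return pages.join("\n\n");
}

console.log(await extractText("/sample.pdf"));
```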
Serving the For You feed simonwillison.net
Bluesky’s For You custom feed, used by roughly 72,000 people, runs on a single Go process with SQLite on a 16-core, 96GB gaming PC in spacecowboy’s living room, fronted by a $7/month OVH VPS over Tailscale. Total operating cost: $30/month, including $20 of electricity.
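The shape of the service is easy to sketch: one process, one SQLite file, one HTTP handler. A toy version follows, in TypeScript rather than the actual Go, with an invented table schema; the real feed generator is far more involved.

```typescript
// Toy sketch of the single-process-plus-SQLite shape described above.
// Table, column, and endpoint details are invented for illustration.
import Database from "better-sqlite3";
import http from "node:http";

const db = new Database("feed.db", { readonly: true });
const topPosts = db.prepare(
  "SELECT uri FROM ranked_posts WHERE user_did = ? ORDER BY score DESC LIMIT 30"
);

http
  .createServer((req, res) => {
    const did = new URL(req.url ?? "/", "http://localhost").searchParams.get("did");
    const feed = topPosts.all(did).map((row: any) => ({ post: row.uri }));
    res.setHeader("content-type", "application/json");
    res.end(JSON.stringify({ feed }));
  })
  .listen(8080);
```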
WHY ARE YOU LIKE THIS simonwillison.net
ChatGPT Images 2.0, prompted to depict a horse riding an astronaut riding a pelican on a bicycle, spontaneously added a roadside sign reading ‘WHY ARE YOU LIKE THIS’ — a layered variant of Simon Willison’s pelican-on-a-bicycle benchmark stacking multiple absurd subjects.
Here’s how our TPUs power increasingly demanding AI workloads. blog.google
Google published an explainer video on how its Tensor Processing Units handle increasingly demanding AI workloads, walking through the TPU architecture and its role in Google Cloud’s AI infrastructure stack.
It’s a big one simonwillison.net
Simon Willison’s weekly newsletter rounds up coverage of GPT-5.5, ChatGPT Images 2.0 and Qwen3.6-27B, packaging 5 blog posts, 8 links, 3 quotes and a new chapter of his Agentic Engineering Patterns guide alongside benchmark images of pelicans, a possum and raccoons.
Millisecond Converter simonwillison.net
Willison shipped a small browser tool that converts millisecond values into seconds and minutes, built to scratch the itch of reading prompt durations reported by his LLM command-line utility.
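The underlying arithmetic is tiny; a minimal sketch of the conversion:

```typescript
// Milliseconds to a human-readable duration, the core of such a tool.
function formatMs(ms: number): string {
  const seconds = ms / 1000;
  if (seconds < 60) return `${seconds.toFixed(2)}s`;
  const minutes = Math.floor(seconds / 60);
  const rest = seconds - minutes * 60;
  return `${minutes}m ${rest.toFixed(1)}s`;
}

console.log(formatMs(4832));   // "4.83s"
console.log(formatMs(245000)); // "4m 5.0s"
```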
References
Artificial Analysis artificialanalysis.ai
DeepSeek V4 Pro… Intelligence Index score of 52, a significant jump from V3.2’s 42… GDPval-AA index ranks V4 Pro as the current leader among open models for real-world professional tasks (e.g., legal and finance work), scoring 1554—ahead of rivals like GLM-5.1 and MiniMax-M2.7
Digital Applied (Kimi K2 comparison) digitalapplied.com
Kimi K2.6 currently leads the open-weights category with a score of 54 on the Artificial Analysis Intelligence Index, followed closely by DeepSeek V4 Pro at 52 and GLM-5.1 at approximately 50… Kimi K2.6 is natively multimodal, handling text, images, and video within a single architecture, while DeepSeek V4 remains text-only at launch
BiggoFinance / 36kr (Huawei Ascend pivot) finance.biggo.com
V4 was optimized to run on Huawei Ascend processors and ‘Supernode’ technology… the final V4 release achieved a reported 85% hardware utilization rate on Huawei NPUs, which is estimated to provide a 40% reduction in hardware costs compared to comparable NVIDIA A100 deployments
SemiAnalysis newsletter newsletter.semianalysis.com
V4-Pro requires approximately 9.6 GiB of KV cache per sequence [at 1M tokens], roughly a ninth of the estimated 83.9 GiB required by a V3.2-style architecture… makes DeepSeek V4 one of the few models capable of serving million-token windows on a single node (e.g., 8x B200 GPUs) without catastrophic memory overhead
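To see why those numbers matter, here is a back-of-the-envelope KV cache estimator. Every architecture constant below is an assumed placeholder rather than DeepSeek’s published config; the point is that a vanilla GQA layout lands in the hundreds of GiB at 1M tokens, which is the problem the compression above is solving.

```typescript
// Back-of-the-envelope KV cache sizing. Every constant is an assumed
// placeholder, not DeepSeek's actual configuration.
function kvCacheGiB(opts: {
  layers: number;       // transformer layers
  kvHeads: number;      // key/value heads (after GQA-style reduction)
  headDim: number;      // dimension per head
  seqLen: number;       // tokens in the sequence
  bytesPerElem: number; // 2 for fp16/bf16, 1 for fp8
}): number {
  // 2x for storing both keys and values, linear in every other factor.
  const bytes =
    2 * opts.layers * opts.kvHeads * opts.headDim * opts.seqLen * opts.bytesPerElem;
  return bytes / 1024 ** 3;
}

// Hypothetical GQA-style config at a 1M-token window: ≈ 229 GiB.
console.log(
  kvCacheGiB({ layers: 60, kvHeads: 8, headDim: 128, seqLen: 1_000_000, bytesPerElem: 2 })
);
```

Against even this naive estimate, the 9.6 GiB figure quoted above is more than an order of magnitude smaller, which is what makes single-node million-token serving plausible.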
r/DeepSeek technical deep-dive thread reddit.com
the ‘active parameter’ count (e.g., 13B for V4-Flash) is misleading for local deployment; since it is an MoE model, the full 284B or 1.6T parameter weights must still be loaded into memory or high-speed storage, making local hosting inaccessible for most consumers
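The arithmetic behind that point is easy to check: weight storage scales with total parameters, not active ones. The quantization widths are the only assumption here; the parameter counts come from the release.

```typescript
// Why active-parameter counts don't help with storage: all expert weights
// must be resident. Only the quantization widths below are assumptions.
const GiB = 1024 ** 3;
const bytesPerParam: Record<string, number> = { fp16: 2, fp8: 1, int4: 0.5 };

for (const [fmt, b] of Object.entries(bytesPerParam)) {
  const flash = (284e9 * b) / GiB; // V4-Flash: 284B total params
  const pro = (1.6e12 * b) / GiB;  // V4-Pro: 1.6T total params
  console.log(`${fmt}: Flash ≈ ${flash.toFixed(0)} GiB, Pro ≈ ${pro.toFixed(0)} GiB`);
}
// Even at int4, Flash needs ≈ 132 GiB of weights and Pro ≈ 745 GiB,
// regardless of the 13B / 49B active counts.
```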