Wei (Jack) Sun

Scaffolding and subsidies, not weights, are carrying AI's headline numbers

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Quoting Bobby Holley simonwillison.net

As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation. […] Our experience is a hopeful one for teams who shake off the vertigo and get to work. You may need to reprioritize everything else to bring relentless and single-minded focus to the task, but there is light at the end of the tunnel. We…

Mozilla: Anthropic’s Mythos found 271 security vulnerabilities in Firefox 150 arstechnica.com

CTO says new AI model is “every bit as capable” as world’s best security researchers.

Introducing OpenAI Privacy Filter openai.com

OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text with state-of-the-art accuracy

Changes to GitHub Copilot Individual plans simonwillison.net

On the same day as Claude Code’s temporary will-they-won’t-they $100/month kerfuffle (for the moment, they won’t), here’s the latest on GitHub Copilot pricing. Unlike Anthropic, GitHub put up an official announcement about their changes, which include tightening usage limits, pausing signups for individual plans (!), restricting Claude Opus 4.7 to the more expensive $39/month “Pro+” plan, and dropping the previous Opus models entirely. The key paragra…

That’s my designer - Claude bensbites.com

Ben’s Bites flags a Claude design-focused update that ships alongside a new model, Opus 4.7. The snippet is thin on specifics, but the headline framing positions Claude as a designer-replacement tool rather than just a chat or coding assistant.

Announcing the Anthropic Economic Index Survey anthropic.com

Anthropic opens its Economic Index Survey, soliciting first-party data from workers and businesses on how they use AI. The effort extends the Economic Index beyond Claude.ai conversation logs, aiming to capture adoption and task patterns the telemetry alone misses.

Is Claude Code going to cost $100/month? Probably not - it’s all very confusing simonwillison.net

Anthropic silently updated its pricing page to make Claude Code exclusive to the $100/month Max plan, then reverted within hours after Reddit, HN and Twitter erupted. Growth lead Amol Avasare called it a test on 2% of new prosumer signups; OpenAI’s Codex team pounced, reaffirming a free tier.

Introducing ChatGPT Images 2.0 openai.com

OpenAI rolled out ChatGPT Images 2.0, powered by GPT-Image-2, pitching gains in text rendering, multilingual prompts and visual reasoning. The release pairs a consumer-facing ChatGPT update with the underlying API model, targeting the gap with Google’s Imagen and Midjourney on typography-heavy generations.

(AINews) OpenAI launches GPT-Image-2 latent.space

with Cursor getting a $10B contract with xAI and a right to acquire for $60B.

(AINews) Moonshot Kimi K2.6: the world’s leading Open Model refreshes to catch up to Opus 4.6 (ahead of DeepSeek v4?) latent.space

Moonshot refreshed its open-weights Kimi K2.6, which Latent Space’s AINews pegs as roughly matching Anthropic’s Claude Opus 4.6 on benchmarks. The update lands ahead of an anticipated DeepSeek v4 release and reclaims the top open-model slot.

Reading today’s open-closed performance gap interconnects.ai

Nathan Lambert breaks down what actually drives the headline benchmark gap between open and closed frontier models, arguing the single eval number obscures training-compute, data and post-training factors, and sketches how the gap is likely to evolve as open labs catch up.

Scaling Codex to enterprises worldwide openai.com

OpenAI launched Codex Labs and named Accenture, PwC and Infosys as deployment partners for enterprise Codex rollouts across the software development lifecycle. The company also disclosed Codex has reached 4 million weekly active users, framing the product as a serious challenger to Claude Code in the enterprise.

Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4 importai.substack.com

At what point do the financial markets price in the singularity?

Partnering with industry leaders to accelerate AI transformation deepmind.google

Google DeepMind partners with global consultancies to bring the power of frontier AI to organizations around the world.

3 new ways Ads Advisor is making Google Ads safer and faster blog.google

Pentagon wants $54B for drones, more than most nations’ military budgets arstechnica.com

The proposed Pentagon drone investment rivals Ukraine’s entire military budget.

(AINews) RIP Pull Requests (2005-2026) latent.space

a quiet day lets us report on the death of the pull requests

10 Things That Matter in AI Right Now technologyreview.com

LLMs+ technologyreview.com

When ChatGPT launched as an experimental prototype in late 2022, OpenAI’s chatbot became an everyday everything app for hundreds of millions of people. LLMs like ChatGPT were the new future: The entire tech industry was consumed by the inferno, with companies racing to spin up rival products. The ashes of the old tech world still…

Supercharged scams technologyreview.com

When ChatGPT was released to the public in late 2022, it opened people’s eyes to how easily generative AI could churn out vast amounts of human-seeming text from simple prompts. This quickly caught the attention of criminals, who soon began using large language models to produce malicious emails—both the untargeted spam kind and more sophisticated,…

World models technologyreview.com

AI systems have already gained impressive mastery over the digital world, but the physical world is still humanity’s domain. As it turns out, building an AI system that can compose a novel or code an app is far easier than developing one that can fold laundry or navigate a city street. To get there, many…

Weaponized deepfakes technologyreview.com

For years, experts have warned that deepfakes—AI-generated videos, images, or audio recordings of people doing or saying things they haven’t actually done in real life—could be deployed in malicious ways. These dangers are now here. Improvements in deepfake technology, and the widespread availability of easy-to-use and cheap (or free) generative models, have made it easier…

References

flyingpenguin.com — ‘Mythos Mystery in Mozilla Numbers’ flyingpenguin.com

271 vulnerabilities… the canonical security advisory (MFSA 2026-30) credited Anthropic with only three specific CVEs… a ‘90x difference’… the flagship Firefox demonstration may have been conducted against a ‘stripped out’ test harness with modern sandboxing and defenses removed… exploit success rate reportedly plummeted from over 70% to just 4.4%

AISLE — ‘AI Cybersecurity After Mythos: The Jagged Frontier’ aisle.com

a 5.1B-parameter open model was able to recover the logic for a 27-year-old OpenBSD bug that Anthropic had used as a marquee example… the true ‘moat’… is the discovery scaffolding, rather than the raw intelligence of the model itself

The Hacker News — community discussion writeup thehackernews.com

only three issues (CVE-2026-6746, CVE-2026-6757, and CVE-2026-6758) were officially credited to Claude in the high-severity advisory, suggesting many of the 271 were likely lower-severity flaws, ‘defense-in-depth’ issues, or non-exploitable code paths… unauthorized users on Discord managed to access the model by guessing naming conventions and exploiting a third-party contractor’s breach

ArmorCode blog — token economics analysis armorcode.com

identifying the FFmpeg vulnerabilities required several hundred runs at an estimated cost of $10,000… the cost per valid finding has dropped to roughly $50, making it cheaper to burn compute than to hire human researchers… Anthropic committed $100 million in usage credits to its Project Glasswing partners

TechRadar — Mozilla quotes techradar.com

Mythos is ‘every bit as capable’ as elite human researchers, [but] it has not yet found a new class of ‘AI-exclusive’ bugs; it simply finds existing flaws at an unprecedented scale

HackRead — Cisco open-weight research hackread.com

open-weight models… can also be modified to bypass safety guardrails or used in multi-turn attacks that are significantly more effective at generating malicious exploits than single-turn prompts

Tonic.ai benchmark blog tonic.ai

Out-of-the-box F1 scores on four real-world datasets ranged from 0.18 to 0.65… recall dropped to 10% on web crawls and 38% on EHR notes, but with 2,000 labeled documents the fine-tuned OPF matched Tonic Textual at 0.95 F1.

Security Boulevard (Tonic syndication) securityboulevard.com

OPF maintained respectable precision of 0.77–0.85 but its conservative default operating point — designed to avoid over-redaction — leaves substantial PII exposed in noisy unstructured text.

Hacker News discussion news.ycombinator.com

A 4% failure rate is unacceptable for security-critical redaction… and a tool optimized to find and mask PII can just as easily be repurposed to efficiently extract sensitive data from large datasets.

MarkTechPost technical writeup marktechpost.com

Sparse MoE with 128 experts and top-4 routing per token, eight pre-norm transformer encoder blocks at width 640, grouped-query attention (14 query / 2 KV heads), and a constrained Viterbi decoder enforcing valid BIOES transitions.
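
The "constrained Viterbi decoder enforcing valid BIOES transitions" in the writeup above can be illustrated with a small sketch. This is a generic textbook version, not OpenAI's implementation: the single `PII` label, the tag names, the `allowed` and `viterbi_bioes` helpers, and the scores are all assumptions made for illustration.

```python
import math

# Hypothetical tag set with a single entity type; real label sets are larger.
TAGS = ["O", "B-PII", "I-PII", "E-PII", "S-PII"]


def allowed(prev: str, curr: str) -> bool:
    """BIOES rule: after B or I the entity is still open, so only I or E may
    follow; after O, E or S no entity is open, so only O, B or S may follow."""
    inside = prev.startswith(("B-", "I-"))
    continues = curr.startswith(("I-", "E-"))
    return continues if inside else not continues


def viterbi_bioes(scores):
    """scores: one dict per token mapping tag -> log-score (illustrative values).
    Returns the highest-scoring tag path that respects the BIOES constraints."""
    neg_inf = -math.inf
    # The first token cannot continue or end an entity.
    best = {t: (scores[0][t] if not t.startswith(("I-", "E-")) else neg_inf)
            for t in TAGS}
    backpointers = []
    for token_scores in scores[1:]:
        nxt, ptr = {}, {}
        for curr in TAGS:
            # Only consider predecessors that form a valid transition.
            score, prev = max((best[p], p) for p in TAGS if allowed(p, curr))
            nxt[curr] = score + token_scores[curr]
            ptr[curr] = prev
        backpointers.append(ptr)
        best = nxt
    # The last token cannot leave an entity open.
    _, tag = max((s, t) for t, s in best.items()
                 if not t.startswith(("B-", "I-")))
    path = [tag]
    for ptr in reversed(backpointers):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


# Made-up scores: picking the best tag per token independently would give
# B-PII, O, E-PII, which is not a valid BIOES sequence.
example_scores = [
    {"O": -2.0, "B-PII": -0.1, "I-PII": -5.0, "E-PII": -5.0, "S-PII": -1.5},
    {"O": -0.5, "B-PII": -4.0, "I-PII": -0.9, "E-PII": -3.0, "S-PII": -4.0},
    {"O": -3.0, "B-PII": -5.0, "I-PII": -4.0, "E-PII": -0.2, "S-PII": -2.5},
]
decoded = viterbi_bioes(example_scores)
print(decoded)
```

In this example the per-token best tags do not form a legal sequence, while the constrained decoder returns a complete begin-inside-end span. That is the practical point of constraining the decoder: a redaction span can never be left half-open.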

OpenAI Privacy Filter Model Card cdn.openai.com

Japanese F1 88.1% (recall 86.6%, precision 89.7%) and Modern Standard Arabic F1 87.8% — performance significantly degrades on non-Latin characters and notations that differ from the training distribution.
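
As a quick consistency check on the model-card figures quoted above, F1 is the harmonic mean of precision and recall. The sketch below assumes that standard definition; the helper name `f1` is ours, and the inputs are the Japanese precision and recall from the snippet.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Japanese figures quoted from the model card snippet.
japanese_f1 = f1(precision=0.897, recall=0.866)
print(f"{japanese_f1:.3f}")  # prints 0.881
```

The result agrees with the reported 88.1%, so the three quoted Japanese numbers are internally consistent.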

Startup Fortune analysis startupfortune.com

A tactical move to capture the enterprise deployment layer and neutralize the data sovereignty objections that have slowed adoption of GPT-5.5 and other cloud-based frontier models in regulated industries.

Visual Studio Magazine — ‘Devs Sound Off on Usage-Based Copilot Pricing’ visualstudiomagazine.com

You will get less but pay the same price… each extended session carries a cost

GitHub gh-aw discussion #15139 github.com

The money-burning party is coming to a close… using Claude Opus now carries a 27x cost multiplier compared to previous tiers

Ed Zitron, Where’s Your Ed At — ‘Anthropic Removes Pro CC’ wheresyoured.at

Anthropic later admitted that its ‘Max’ plans were specifically designed to handle these ‘heavy’ usage patterns, which did not exist when the Pro tier was first conceived

GitHub Changelog — Changes to Copilot plans for individuals (Apr 20, 2026) github.blog

Until May 20, 2026, both monthly and annual users can cancel their subscriptions and receive a full refund for the time remaining on their current term

CIO.com — ‘The inference bill nobody budgeted for’ cio.com

Uber reportedly exhausted its entire annual AI budget by April due to deep integration of agentic workflows… AI labs are spending an estimated $1.35 for every $1 in revenue to subsidize compute

r/windsurf — ‘Windsurf is simply destroying its reputation’ reddit.com

costs for the same level of output exploded from a few dollars to nearly $80 in a single day due to the new overage calculations

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day: building frameworks, reading what industry ships, occasionally writing it down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare