What Google, OpenAI, and Anthropic aren't saying out loud
A TPU redesign, a brittle privacy filter, and a reversed Claude Code price hike all hide the real economics under launch-day framing.
TL;DR
- Google’s 8th-gen TPU splits training and inference silicon, conceding per-socket parity with Nvidia Rubin and pitching systems-level wins instead.
- OpenAI’s open-weight PII redactor scores 96% F1 on the benchmark but drops to 10% recall on real web crawl data.
- Anthropic quietly pushed Claude Code from Pro to Max, then reverted; its own docs imply $20/month covers two days of heavy use.
- Mozilla credits Anthropic’s Mythos model with surfacing 271 zero-days in Firefox 150 ahead of release.
- GitHub pauses Copilot Individual signups and gates Claude Opus 4.7 behind a $39 Pro+ tier as agentic usage breaks per-request pricing.
Today’s three biggest stories share an awkward shape: a confident launch announcement, and a number underneath it that the announcement would rather you didn’t focus on. Google reframed its 8th-gen TPU as purpose-built for the “agentic era” — a framing that conveniently glosses over a roughly 3:1 per-socket compute gap to Nvidia’s Rubin. OpenAI shipped an open-weight PII redactor with strong benchmark F1, and an independent eval immediately showed recall collapsing on the kind of messy data the model is meant to protect. Anthropic floated, then yanked, a quiet move of Claude Code out of the $20 Pro tier — and the postmortem showed the move wasn’t a mistake so much as an early read of broken unit economics on heavy users.
The briefs orbit the same axis. GitHub is rewriting Copilot pricing because agentic workflows blew through per-request assumptions. Shopify is buying its way past the same problem with an unlimited Opus token budget. The marketing layer and the cost layer are drifting apart, and today is one of the days you can see the gap.
Google splits the TPU in two — and quietly admits the per-chip race is lost
Source: google-ai-blog · published 2026-04-22
TL;DR
- Google’s 8th-gen TPU drops the unified design for separate training (8t) and inference (8i) silicon, pitched as “agentic era” hardware.
- Per-socket compute (~12.6 FP4 PFLOPs) trails Nvidia’s Rubin roughly 3:1; the wins Google claims are systems-level: 2.7x training price-performance, 80% better inference value vs Ironwood.
- Broadcom’s exclusivity ends — MediaTek co-designs the 8i — and the inference chip gets a new Dragonfly-style fabric and 384MB on-chip SRAM for KV cache.
- Critics flag two real costs: lost fungibility between training and serving, and a PyTorch/XLA stack still rough enough that the new TorchTPU backend reads as a do-over.
A specialization bet, not a benchmark win
The “two TPUs for the agentic era” framing is marketing wrapped around a more conservative engineering call. The 8t carries 12.5% more HBM than Ironwood but 11.5% less per-chip bandwidth — a deliberate yield and cost trade-off, not a performance ceiling push [1]. A single 8t lands at roughly 12.6 FP4 PFLOPs, against Nvidia Vera Rubin’s ~35; Google does not contest the gap and instead leans on superpod economics, claiming 2.7x training price-performance and an 80% inference value-for-money lift on the 8i [2].
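The 3:1 shorthand is easy to verify from the reported figures; a quick sanity check, using the article's numbers rather than independently measured ones:

```python
# Sanity-checking the "roughly 3:1" per-socket gap from the reported figures.
TPU_8T_PFLOPS = 12.6   # FP4 PFLOPs per 8t socket, as reported
RUBIN_PFLOPS = 35.0    # FP4 PFLOPs per Nvidia Vera Rubin socket, as reported

ratio = RUBIN_PFLOPS / TPU_8T_PFLOPS
print(f"per-socket gap: {ratio:.2f}x")  # per-socket gap: 2.78x
```

Which is why Google's counterclaims are all denominated in price-performance at the pod level, not raw per-socket compute.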
Quieter but more strategically interesting: Broadcom’s decade-long exclusivity on TPU co-design has ended, with MediaTek brought in for the 8i [1]. That alone is a bigger supply-chain story than the chip names.
| | TPU 8t (training) | TPU 8i (inference) |
|---|---|---|
| Workload target | Pre-training, large pools of shared memory | Low-latency agentic decode |
| Memory move | +12.5% capacity, −11.5% bandwidth vs Ironwood [1] | 3x on-chip SRAM to 384MB for local KV cache [3] |
| Interconnect | 3D Torus (continued) | “Boardfly” Dragonfly-inspired, −56% network diameter [3] |
| Co-designer | Broadcom | MediaTek (new) [1] |
| Headline claim | 2.7x price-perf vs Ironwood [2] | 80% better inference value [2] |
The 8i swap is the more aggressive piece of engineering: out goes SparseCore, in comes a Collectives Acceleration Engine tuned for autoregressive decode, with a network topology designed to shorten the all-reduce paths that dominate token-by-token serving [3]. These are the concrete agentic optimizations the launch post elides.
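To see why 384MB of on-chip SRAM is a meaningful KV-cache budget for decode, a back-of-envelope sketch helps. The model dimensions below are illustrative assumptions, not published 8i or model specs:

```python
# Back-of-envelope KV-cache sizing: how much autoregressive decode state
# fits in a 384 MB on-chip SRAM? All model dimensions are illustrative
# assumptions, not TPU 8i or any real model's specs.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    # K and V each store n_layers * n_kv_heads * head_dim values per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

SRAM_BYTES = 384 * 1024 * 1024  # 384 MB of on-chip SRAM

# A hypothetical grouped-query model: 60 layers, 8 KV heads of dim 128,
# FP8 (1-byte) cache entries.
per_token = kv_cache_bytes(60, 8, 128, 1, 1)
max_context = SRAM_BYTES // per_token
print(f"{per_token} bytes/token -> ~{max_context} tokens fit on-chip")
```

Under those assumptions, a few thousand tokens of context stay on-chip, which is exactly the regime where skipping an HBM round-trip per decoded token pays off for agentic serving.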
The fungibility problem nobody at Google wants to discuss
The sharpest critique is structural. Aragon Research argues that splitting training and inference forfeits the underappreciated moat of Nvidia’s general-purpose GPUs: a fleet of H- or B-series parts can be reallocated between pre-training and serving as demand swings, while committed 8t or 8i capacity risks stranding when the mix shifts [4]. For Google’s own workloads — and Anthropic’s reported multi-gigawatt commitment — that’s tolerable. For Google Cloud renters with less predictable load, it’s a real lock-in to a workload profile.
Jensen Huang’s reply on Dwarkesh was unusually direct: he framed the Anthropic deal as “a unique instance, not a trend,” and publicly challenged Google to submit the 8-series to MLPerf Inference Max — a benchmark Google has not accepted [5]. Independent benchmark verification is, for now, not on the table.
“No rival has so far accepted the public challenge.” — Jensen Huang [5]
The software story is still the story
Practitioner reaction on HN treats the new native TorchTPU backend less as a feature and more as an admission that prior PyTorch/XLA paths had “undocumented behavior and bugs” too brittle for production [6]. Realistic CUDA-to-TPU migrations are still pegged at 6–8 weeks for expert teams, and the recurring complaint isn’t the silicon — it’s GCP-side friction: quota provisioning, opaque errors on large pods [6].
Takeaway
The 8t/8i split is a credible cost-performance play anchored by one very large customer, not a Nvidia displacement event. Google is conceding the per-chip race and betting that full-stack control, cheaper silicon, and workload-specific architecture beat fungible general-purpose GPUs at hyperscaler economics. Whether that bet holds for Google Cloud’s smaller tenants — who can’t amortize a stranded-capacity risk across an Anthropic — is the open question the launch post carefully avoids.
Further reading
- Google unveils two new TPUs designed for the “agentic era” — ars-technica-ai
Anthropic’s stealth Claude Code price hike was a preview, not a mistake
Source: simon-willison · published 2026-04-22
TL;DR
- Anthropic silently moved Claude Code from the $20 Pro plan to $100+ Max tiers, then reverted within hours after backlash.
- Independent reporting reframes the “2% test” as broken unit economics: Anthropic may lose 200–3000% on heavy Pro users.
- Anthropic’s own docs now estimate $13/day for active developers — a $20/month plan covers about two days of real use.
- A parallel postmortem revealed silent quality regressions in the same window, sharpening the “stealth changes” pattern critics keep flagging.
A pricing “test” nobody was supposed to see
On April 22, Anthropic edited claude.com/pricing to make Claude Code exclusive to its $100/month Max 5x and $200/month Max 20x tiers — a 5x jump from the $20 Pro plan it had shipped with. There was no blog post, no email, no changelog. Head of Growth Amol Avasare eventually tweeted that this was “a small test on ~2% of new prosumer signups,” but the change was sitting on the public landing page and had already been snapshotted by the Internet Archive. Within hours, after Reddit, HN and Twitter caught fire, the checkbox was back. Avasare later conceded that updating the public docs was “a mistake” — though the underlying experiment apparently continues server-side.
Simon Willison, who has written 105 posts teaching Claude Code, framed it as a trust failure: he can’t build curricula for data journalists on a $1,200/year subscription, and a tweet from a growth lead is not how you announce a 5x price floor.
The economics the rollback didn’t fix
The more interesting read comes from outside Willison’s piece. The Register reported the move was driven by “untenable demand” as Anthropic rations compute [7]. Ed Zitron went further, calling flat-rate agent subscriptions a “con” and estimating Anthropic loses 200–3000% on its most active Pro users [8]. Days later, Anthropic itself quietly more than doubled its documented spend estimate for active developers from $6 to $13/day, with 90th-percentile users at $30/day [9]. Do the math: at $13/day, a $20/month Pro plan is underwater after about a day and a half of real agentic coding.
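The arithmetic behind that claim is short enough to write down. Plan price and daily-cost figures come from the article; the 22 working days per month is my assumption:

```python
# How far does a flat-rate plan stretch at Anthropic's own documented costs?
PLAN_PRICE = 20.0    # Pro plan, $/month
ACTIVE_COST = 13.0   # documented estimate for an active developer, $/day
P90_COST = 30.0      # 90th-percentile heavy user, $/day
WORK_DAYS = 22       # assumed working days per month

def days_covered(plan_price, daily_cost):
    # Days of use before the subscription price is exhausted.
    return plan_price / daily_cost

print(f"active dev: {days_covered(PLAN_PRICE, ACTIVE_COST):.1f} days")  # 1.5 days
print(f"p90 user:   {days_covered(PLAN_PRICE, P90_COST):.2f} days")     # 0.67 days
print(f"vendor shortfall at $13/day: ${WORK_DAYS * ACTIVE_COST - PLAN_PRICE:.0f}/month")
```

A full working month at the documented rate costs $286 against a $20 subscription, which is the gap the pulled "test" was evidently probing.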
Power users assume the vendor will keep tightening — and they’re building infrastructure to outlast the next squeeze.
Evidence: the openclaw-billing-proxy project on GitHub already injects Claude Code identifiers into third-party harness traffic so subscribers can route open-source agents through their plan credits [10]. Circumvention tooling this fast is a tell about where users think this is heading.
Competitors pounce, and the stealth pattern hardens
OpenAI didn’t miss the opening. Codex engineering lead Thibault Sottiaux pledged Codex would stay on the Free and $20 Plus plans; Sam Altman publicly mocked Anthropic’s handling [11]. The “transparent coding-agent vendor” positioning is now OpenAI’s to lose.
What makes the episode worse is what landed alongside it. On April 23, Anthropic published a postmortem admitting three regressions had degraded Claude Code between March and April — including a system prompt capping responses at 25 words that “tanked coding quality” on Opus 4.6 and 4.7 [12]. So in the same window users were asked to swallow a 5x price floor, the product had silently gotten worse via undisclosed changes. Silent pricing edits, silent prompt changes, silent reasoning-effort downgrades — the through-line is the same.
What’s actually at stake
The reversal bought Anthropic a news cycle, not a fix. Flat-rate agentic coding looks economically dead at current model costs, and the vendors who survive will be the ones that ship versioned, announced pricing — not the ones that A/B test rug-pulls and apologize when Hacker News notices. Until Anthropic commits in writing to keeping Claude Code on the $20 tier, the rational developer bet is to hedge toward Codex.
Further reading
- Anthropic tested removing Claude Code from the Pro plan — ars-technica-ai
OpenAI’s Privacy Filter ships strong on paper, fragile in the wild
Source: openai-blog · published 2026-04-22
TL;DR
- OpenAI released an Apache 2.0, 1.5B-parameter PII redactor claiming 96–97% F1 on PII-Masking-300k.
- Tonic.ai’s independent eval shows default recall collapsing to 10% on web crawl and 38% on EHR notes.
- Open weights make the model symmetric: strip the redaction step and it’s a frontier-grade PII extractor.
- Treat the headline F1 as a synthetic-data ceiling; production needs regex for secrets and a rehydration layer for context.
The headline number doesn’t survive real corpora
OpenAI’s pitch for the Privacy Filter (OPF) is a single bidirectional token classifier, 1.5B params with 50M active, 128K context, decoded with constrained Viterbi over BIOES spans — and 97.43% F1 on a corrected PII-Masking-300k. That benchmark is largely synthetic. Tonic.ai ran the same-week comparison on real data and the gap is brutal:
| Corpus | OPF default recall | Tonic redactor F1 |
|---|---|---|
| PII-Masking-300k (synthetic) | ~98% | — |
| Web-crawl text | 10% | 0.92–0.99 |
| Electronic health records | 38% | 0.92–0.99 |
Tonic attributes the collapse to OPF’s precision-tuned default — the model “stays quiet when unsure” to preserve downstream utility, but in noisy real domains that silence leaks names, addresses, and medical IDs [13]. OpenAI does flag that operating points are tunable and that fine-tuning lifts domain F1 from ~54% to ~96%, but the out-of-the-box number is not what ships in most pipelines.
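The operating-point issue is easy to see on a toy example. The span scores below are invented; the point is that a precision-tuned threshold and a recall-tuned threshold are effectively different products:

```python
# "Stays quiet when unsure" in miniature: sweeping a confidence threshold
# over toy span scores shows how a precision-tuned default forfeits recall.
# Scores are invented for illustration, not drawn from OPF's calibration.

# (classifier confidence, span is actually PII)
spans = [(0.95, True), (0.90, True), (0.70, True), (0.55, True),
         (0.50, False), (0.35, True), (0.30, False), (0.10, False)]

def recall_at(threshold):
    # Fraction of true PII spans the model flags at this operating point.
    flagged_pii = sum(1 for score, is_pii in spans if score >= threshold and is_pii)
    total_pii = sum(1 for _, is_pii in spans if is_pii)
    return flagged_pii / total_pii

print(f"precision-tuned (t=0.85): recall={recall_at(0.85):.2f}")  # recall=0.40
print(f"recall-tuned   (t=0.30): recall={recall_at(0.30):.2f}")  # recall=1.00
```

Lowering the threshold catches every PII span here at the cost of flagging non-PII too; a redaction deployment usually wants that trade, while the shipped default leans the other way.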
A separate comparison against Microsoft Presidio and Piiranha makes the structural point: deterministic regex still wins on SSNs, credit cards, and other structured identifiers, and transformer-only redactors generalize poorly outside their training distribution [14]. OPF belongs next to a regex layer, not in place of one.
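A minimal sketch of what that deterministic layer looks like, run before (or alongside) any ML redactor. The patterns are deliberately simplified for illustration (no checksums, no context validation) and are not production recognizers:

```python
# A deterministic regex pass for structured identifiers, applied before
# any probabilistic redactor. Patterns here are simplified placeholders;
# real recognizers (e.g. Presidio's) add checksums and context words.
import re

STRUCTURED_PII = {
    "US_SSN":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "API_KEY":     re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),  # "sk-" prefix is an assumed convention
}

def redact_structured(text):
    # Replace every match with a bracketed label; order follows dict insertion.
    for label, pattern in STRUCTURED_PII.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "SSN 123-45-6789, key sk-abcdefghijklmnopqrstuv"
print(redact_structured(sample))  # SSN [US_SSN], key [API_KEY]
```

Because these strings are deterministic, "nearly correct" regexes can be iterated to exactly correct, which a probabilistic span classifier never guarantees.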
Two problems the announcement elides
The secret category — passwords, API keys, bearer tokens — is the most obvious mismatch. These are deterministic strings where “nearly correct” is operationally useless; a probabilistic span classifier is the wrong tool, and HN practitioners flagged this within hours [15].
The deeper issue is dual-use. A high-recall PII detector with open weights is trivially invertible:
```mermaid
flowchart LR
    A[Unstructured corpus<br/>breach dump, scrape] --> B[OPF weights]
    B -->|redaction mode| C[Sanitized text]
    B -->|inverted| D[Extracted PII<br/>by OpenAI taxonomy]
```
The same model that masks private_person and account_number will, with the output flipped, surface them — letting an attacker harvest identifiers from a breach dump using OpenAI’s own eight-category schema [15]. The UK AI Security Institute’s recent Boundary Point Jailbreaking work compounds this: single-bit blocked/unblocked feedback was enough to defeat OpenAI’s frontier input classifiers, the same architectural family OPF sits in [16]. Once weights are public, adaptive probing is cheap.
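The inversion really is as small as it sounds. In the sketch below, `detect_spans` is a hard-coded stand-in for running the open weights on one demo sentence; the masking consumer and the harvesting consumer of the same detector differ by a single function:

```python
# Dual-use in miniature: a span-level PII detector feeds both a redactor
# and, trivially "inverted", an extractor. detect_spans stands in for
# model inference and is hard-coded for the demo sentence.

def detect_spans(text):
    # Stand-in for the model: returns (start, end, label) spans.
    return [(0, 8, "private_person"), (18, 30, "account_number")]

def redact(text):
    # Masking path: replace each detected span with its label.
    out, prev = [], 0
    for start, end, label in detect_spans(text):
        out.append(text[prev:start] + f"[{label}]")
        prev = end
    out.append(text[prev:])
    return "".join(out)

def extract(text):
    # "Inverted" path: keep exactly what redaction throws away.
    return [(label, text[start:end]) for start, end, label in detect_spans(text)]

doc = "Jane Roe, account 4417-932-001"
print(redact(doc))   # [private_person], account [account_number]
print(extract(doc))  # [('private_person', 'Jane Roe'), ('account_number', '4417-932-001')]
```

Nothing about the weights changes between the two modes; only the consumer of the spans does, which is why open-weight release makes the extractor free.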
The workflow tax nobody benchmarks
Redaction is only half the loop. If you sanitize a prompt to [PERSON_1] lives at [ADDRESS_2] before sending it to GPT-5, the downstream model loses the grounding it needs to reason — and the client has to maintain a de-sanitization map to rehydrate placeholders in the response [17]. That plumbing lives in your application, not OpenAI’s release, and it degrades answer quality on any task that turns on named entities.
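A minimal sketch of that client-side plumbing, assuming the entity list comes from an upstream redactor (here it is hand-rolled) and with the LLM call stubbed out:

```python
# Sanitize before the API call, keep a reverse map, rehydrate the response.
# Entity lists are hand-rolled here; in practice a redactor emits them.

def sanitize(text, entities):
    # Replace each (label, value) with a numbered placeholder and record
    # the reverse mapping for later rehydration.
    mapping = {}
    for i, (label, value) in enumerate(entities, start=1):
        placeholder = f"[{label}_{i}]"
        mapping[placeholder] = value
        text = text.replace(value, placeholder)
    return text, mapping

def rehydrate(text, mapping):
    # Restore original values in the model's response.
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text

prompt = "Draft a letter: Ada Smith lives at 12 Elm St."
safe, mapping = sanitize(prompt, [("PERSON", "Ada Smith"), ("ADDRESS", "12 Elm St")])
print(safe)  # Draft a letter: [PERSON_1] lives at [ADDRESS_2].

# model_reply = call_llm(safe)  # hypothetical API call on the sanitized prompt
model_reply = "Dear [PERSON_1], regarding [ADDRESS_2]..."
print(rehydrate(model_reply, mapping))
```

The map has to survive for the round trip and stay consistent across multi-turn conversations, which is exactly the state management the workflow-tax argument is about.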
What actually shipped
Strategically, the Apache 2.0 license is the story: boosters frame it as “killing the last excuse” against enterprise GPT-5 adoption in healthcare and legal [18]. Technically, the consensus is narrower — OPF is a solid base for unstructured prose that needs domain fine-tuning, a regex layer for deterministic secrets, and rehydration logic before it earns production trust. The 97.43% is a ceiling on synthetic data, not a floor on yours.
Round-ups
Mozilla: Anthropic’s Mythos found 271 security vulnerabilities in Firefox 150
Source: ars-technica-ai
Mozilla used Anthropic’s new Mythos security model to audit Firefox 150, surfacing 271 zero-day vulnerabilities ahead of release. Mozilla’s CTO calls the model “every bit as capable” as top human security researchers, marking one of the largest disclosed AI-driven vulnerability hauls in a shipping browser.
Changes to GitHub Copilot Individual plans
Source: simon-willison
GitHub tightens Copilot Individual plans, pausing new signups, restricting Claude Opus 4.7 to the $39/month Pro+ tier, and dropping older Opus models, citing agentic workflows that consume far more compute than the original per-request pricing assumed. New token-based session and weekly limits replace the request-count model.
Shopify’s AI Phase Transition: 2026 Usage Explosion, Unlimited Opus-4.6 Token Budget, Tangle, Tangent, SimGym — with Mikhail Parakhin, Shopify CTO
Source: latent-space
Shopify CTO Mikhail Parakhin details the company’s 2026 AI adoption surge in a Latent Space interview, including an unlimited Claude Opus 4.6 token budget for engineers and internal tools Tangle, Tangent, and SimGym used to scale AI across the merchant platform.
Introducing workspace agents in ChatGPT
Source: openai-blog
OpenAI launches workspace agents in ChatGPT, Codex-powered cloud agents that execute multi-step workflows across connected enterprise tools. Pitched at teams rather than individual developers, the agents run asynchronously in sandboxed cloud environments and inherit organization-level permissions for secure access to company data.
ChatGPT Images 2.0 / GPT-Image-2 launch
Source: openai-blog
OpenAI ships its second-generation image model under two names — ChatGPT Images 2.0 in the consumer app and GPT-Image-2 via API — touting improved text rendering, multilingual prompts, and visual reasoning. Latent Space’s coverage frames it as OpenAI’s response to recent Google and Black Forest Labs image releases.
Further reading
- [AINews] OpenAI launches GPT-Image-2 — latent-space
GPT-5.5 Bio Bug Bounty
Source: openai-blog
OpenAI opens a red-teaming challenge against GPT-5.5’s bio-risk safeguards, paying up to $25,000 for universal jailbreaks that bypass biosecurity refusals. The bounty narrows scope to bio-specific harms rather than general jailbreaks, marking OpenAI’s first domain-targeted bounty tied to a frontier model release.
Making ChatGPT better for clinicians
Source: openai-blog
OpenAI offers ChatGPT for Clinicians free to verified U.S. physicians, nurse practitioners, and pharmacists, with HIPAA-aligned handling for clinical documentation, decision support, and research workflows. Verification gates access, distinguishing the tier from consumer ChatGPT and from OpenAI’s enterprise health pilots with hospital systems.
Footnotes

1. Nand Research analysis — https://nand-research.com/google-cloud-8th-generation-tpu-family-splits-training-and-inference/ — “TPU 8t increases memory capacity by 12.5% over Ironwood, [but] its memory bandwidth per chip is actually 11.5% lower, a deliberate trade-off likely aimed at improving silicon yield and reducing costs… the 8th generation marks the end of Broadcom’s exclusivity, with MediaTek joining as a co-designer for the inference-focused 8i.”
2. Tom’s Hardware — https://www.tomshardware.com/tech-industry/semiconductors/google-splits-its-tpu-into-two-chips-for-the-first-time-with-training-and-inference-variants — “A single TPU 8t chip delivers approximately 12.6 PFlops (FP4)—trailing Nvidia’s R200 in pure compute… Google claims a 2.7x improvement in price-performance over Ironwood and an 80% boost in inference value-for-money on the 8i.”
3. Futurum Group — https://futurumgroup.com/insights/google-splits-its-tpu-line-to-enter-the-era-of-agentic-silicon/ — “The TPU 8i abandons the longstanding 3D Torus interconnect for ‘Boardfly,’ a Dragonfly-inspired hierarchical topology that reduces network diameter by 56%, paired with a Collectives Acceleration Engine replacing SparseCore to speed autoregressive decoding… triples on-chip SRAM to 384 MB to hold KV cache locally.”
4. Aragon Research — https://aragonresearch.com/google-dual-tpus-prepare-for-inference-agents/ — “Unlike Nvidia’s general-purpose GPUs, which can be dynamically reallocated from training to inference based on demand, Google’s split forces providers to commit to a specific workload profile, potentially leading to stranded capacity if market needs shift.”
5. Pasquale Pillitteri (Jensen Huang on Dwarkesh Podcast) — https://pasqualepillitteri.it/en/news/1441/nvidia-vs-google-tpu-anthropic-ai-chip-2026 — “Huang dismissed the narrative of a ‘customer defection,’ specifically labeling Anthropic’s massive 3.5 GW commitment to Google silicon as a ‘unique instance, not a trend’… and challenged Google to submit the TPU 8 series to public benchmarks like MLPerf Inference Max, stating that ‘no rival has so far accepted the public challenge’.”
6. Hacker News / Moomoo developer thread — https://www.moomoo.com/403 — “Veteran developers recall ‘undocumented behavior and bugs’ in older PyTorch/XLA implementations, with some viewing the new native TorchTPU backend as a necessary admission that previous migration paths were too brittle for production… the real complaint is not the silicon but ‘GCP friction’—quota provisioning and generic error messages on large pods.”
7. The Register — https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/ — “the move was driven by ‘untenable demand,’ as Anthropic sought new methods to ration its services amid tightening compute capacity”
8. Ed Zitron, Where’s Your Ed At — https://www.wheresyoured.at/news-anthropic-removes-pro-cc/ — “Anthropic and OpenAI ‘conned’ users by offering flat-rate subscriptions that were mathematically impossible to sustain… Anthropic may be losing 200% to 3000% on its most active power users.”
9. India Today (citing Anthropic’s revised docs) — https://www.indiatoday.in/technology/news/story/ai-costs-more-than-humans-companies-spending-big-on-models-over-employee-salaries-2902204-2026-04-27 — “Anthropic more than doubled its estimated spending for active developers from $6 to $13 per day, with the 90th percentile of heavy users projected to spend $30 per day”
10. GitHub: zacdcook/openclaw-billing-proxy — https://github.com/zacdcook/openclaw-billing-proxy — “the community developed ‘openclaw-billing-proxy’ on GitHub, a working tool that uses 7-layer bidirectional processing to inject Claude Code identifiers into requests, attempting to bypass Anthropic’s detection and maintain subscription-based billing”
11. Business Insider — https://www.businessinsider.com/anthropic-claude-code-price-confusion-sam-altman-2026-4 — “CEO Sam Altman publicly mocked Anthropic’s move, emphasizing that OpenAI’s coding tools would remain accessible on lower-tier plans.”
12. Anthropic postmortem — “Anthropic identified three specific errors… a system prompt instruction designed to curb verbosity—capping responses to 25 words—tanked coding quality across Opus 4.6 and 4.7”
13. Tonic.ai benchmark report — https://www.tonic.ai/blog/benchmarking-openai-privacy-filter-pii-detection — “OPF’s default recall plummeted to just 10% for web-crawl data and 38% for Electronic Health Record (EHR) notes, while Tonic’s own production redactor maintained F1 scores between 0.92 and 0.99 on the same data.”
14. Grepture comparison (Presidio/Piiranha/OPF) — https://grepture.com/blog/best-open-source-models-pii-redaction — “Microsoft Presidio still leads in structured PII (like SSNs and credit cards) due to its regex-based recognizers, whereas transformer-only models like Piiranha often fail to generalize beyond their synthetic training distributions.”
15. Hacker News discussion — https://news.ycombinator.com/item?id=44577019 — “Because the model is probabilistic, it is unsuitable for deterministic secrets like passwords… a high-accuracy PII redactor can be trivially inverted to act as a PII extractor, allowing attackers to find sensitive data in unstructured text with minimal effort using the model’s own taxonomy.”
16. UK AISI — Boundary Point Jailbreaking post — https://www.aisi.gov.uk/blog/boundary-point-jailbreaking-a-new-way-to-break-the-strongest-ai-defences — “BPJ was the first automated method to successfully defeat OpenAI’s input classifiers for frontier models without requiring human-led seed attacks, using single-bit blocked/unblocked feedback to iteratively bypass classifiers.”
17. AI Engineer Weekly (Substack), ‘Privacy Filter is not an LLM’ — https://aiengineerweekly.substack.com/p/privacy-filter-is-not-an-llm — “Redacted text is often practically useless for complex tasks, as LLMs struggle to reason without the context provided by specific names or entities; de-sanitization mapping must happen client-side after the LLM responds.”
18. StartupFortune, ‘Kills the last excuse against enterprise AI’ — https://startupfortune.com/openais-open-weight-privacy-filter-kills-the-last-excuse-against-enterprise-ai/ — “By providing the compliance infrastructure needed to bypass data-sharing objections in healthcare and legal sectors, OpenAI effectively ‘kills the last excuse’ against adopting frontier models like GPT-5.”