Wei (Jack) Sun

Anthropic's Mega-Deal Day, and the Tiers Beneath It

Anthropic's $125B AWS pact and a wobbling Opus 4.7 release expose a frontier increasingly partitioned between partners, paying customers, and press.


TL;DR

  • Amazon commits up to $25B more to Anthropic; Anthropic commits $100B back to AWS over a decade in a circular financing structure.
  • Opus 4.7’s new tokenizer inflates code and JSON token counts up to 1.35x — an effective 35% price hike on developer workloads.
  • Anthropic’s actual frontier model, Claude Mythos, is gated to roughly 50 partners; Opus 4.7 is the deliberately capped public tier.
  • DeepMind’s Gemini Robotics-ER 1.6 jumps gauge reading from 23% to 98% — but only on internal evaluations.
  • Anthropic Labs ships Claude Design and Hyatt rolls out ChatGPT Enterprise as the OpenClaw product divergence continues.

Today is an Anthropic day, and the through-line cuts deeper than any single announcement. The $125 billion AWS pact, the public Opus 4.7 release, and even DeepMind’s robotics breakthrough all share a structural tell: the version on the press release isn’t quite the version that ships. Amazon’s $25B in fresh capital flows back through a $100B AWS commitment that critics are calling vendor financing, while Anthropic’s “primary training partner” framing sits next to a separate million-TPU bet on Google. Opus 4.7 was sold as strictly better at the same price, but a quiet tokenizer change and a reverted system prompt have already chipped at that claim — and the genuinely frontier model, Claude Mythos, is reserved for fifty partners. DeepMind’s 98% gauge-reading number, meanwhile, lives entirely inside its own lab. The frontier is increasingly tiered: one layer for partners, one for paying customers, one for the headline. Today’s coverage is about reading the gap.

Anthropic’s 5GW AWS deal: historic scale, circular money, hedged silicon

Source: anthropic-news · published 2026-04-20

TL;DR

  • Amazon commits up to $25B more to Anthropic; Anthropic commits $100B back to AWS over a decade — critics call it vendor financing.
  • Trainium2 looks great in Anthropic’s hands, but Cohere and Stability AI found it less competitive than Nvidia’s H100, and OpenAI’s own $38B AWS deal avoided it entirely.
  • “Primary training partner” framing sits awkwardly next to Anthropic’s separate million-TPU commitment to Google.
  • The 5GW is partly about frontier training, partly about paying down reliability debt from a 3x revenue surge.

The headline and the structure underneath

Amazon is putting another $5B into Anthropic immediately, with up to $20B more to follow, while Anthropic commits over $100B to AWS infrastructure over the next decade and secures up to 5 gigawatts of new capacity — roughly 1GW of it on Trainium2/Trainium3 by end of 2026. Run-rate revenue, Anthropic says, has climbed from $9B at the end of 2025 to over $30B.

That’s the pitch. The structure underneath is more interesting. Forbes’ Jon Markman calls the arrangement “vendor financing disguised as venture capital” — Amazon’s investment dollars round-trip back as AWS cloud revenue, inflating both companies’ growth optics 1. A separate Forbes piece flags an accounting wrinkle worth understanding before the IPO conversation gets serious: Anthropic books cloud-partnership revenue gross, including the share AWS and Google take, while OpenAI books Azure revenue net 2. The $30B run-rate is not apples-to-apples with the OpenAI numbers it’s implicitly being compared to.

```mermaid
flowchart LR
    A[Amazon] -->|$5B now, up to $25B total| B[Anthropic]
    B -->|$100B over 10 years| C[AWS compute]
    C -->|gross revenue| B
    B -.->|booked as run-rate| D[Investor narrative]
    A -->|owns| C
```
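
The gross-vs-net gap is easy to make concrete. A toy calculation; the 25% cloud-provider pass-through share below is an illustrative assumption, not a disclosed figure:

```python
def run_rate(customer_billings: float, cloud_share: float, basis: str) -> float:
    """Annualized run-rate ($B) under gross vs. net revenue recognition."""
    if basis == "gross":
        return customer_billings  # book full billings, provider share included
    if basis == "net":
        return customer_billings * (1 - cloud_share)  # book only the retained share
    raise ValueError(f"unknown basis: {basis}")

billings = 30.0   # the $30B headline run-rate
share = 0.25      # assumed cloud pass-through share, illustrative only
print(run_rate(billings, share, "gross"))  # 30.0, the gross-basis headline
print(run_rate(billings, share, "net"))    # 22.5, what a net-basis peer would report
```

Same business, same cash, a quarter of the headline gone purely on recognition policy. That is the comparison trap in the $30B-vs-OpenAI framing.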

Trainium works great — for Anthropic

SemiAnalysis describes Project Rainier as deep hardware-software co-design, with Anthropic effectively using Amazon’s Annapurna Labs as a captive silicon shop tuned for its memory-bandwidth-bound RL workloads 3. That’s why the chips look strong in Anthropic’s benchmarks. It’s also why they look weak elsewhere.

Business Insider obtained internal AWS documents showing other startups, including Cohere and Stability AI, found Trainium2 “less competitive” than Nvidia’s H100, citing higher latency and frequent service disruptions 4. OpenAI’s own $38B AWS deal pointedly avoided Trainium in favor of pure Nvidia clusters. The Neuron SDK remains a real software tax for anyone outside the Anthropic-AWS co-design loop.

The “primary partner” Anthropic is hedging against

Anthropic calls AWS its primary training partner in this announcement. In the same window, it has committed to deploying up to one million Google TPUs (Ironwood) — the largest external TPU deployment in Google’s history 5. The plausible reconciliation is a workload split, with training-critical RL on Trainium/Rainier and large chunks of inference and research on TPU. The less flattering reading: Anthropic is hedging hard against Trainium execution risk while still signing the $100B AWS commitment that justifies Amazon’s check.

Reliability debt, not just frontier ambition

The “addressing operational strain” framing in the announcement deserves to be taken at face value. VentureBeat documented that Anthropic quietly downgraded Claude Code’s default reasoning effort from High to Medium to control costs, while a prompt-caching bug inflated some users’ bills by 10–20x; $100/month Max subscribers were exhausting peak-hour limits in 15 minutes 6.

The 5GW is as much about paying down accumulated reliability debt from a roughly 3x revenue surge in a single year as it is about training the next frontier model.

The vendor narrative — historic scale, deep silicon partnership, $30B run-rate — is real. It is also materially incomplete in four specific ways, and the next earnings cycle is when the gross-vs-net accounting question stops being a footnote.


Claude Opus 4.7: the “strictly better” release that wasn’t

Source: anthropic-news · published 2026-04-16

TL;DR

  • Anthropic and Latent Space both framed Opus 4.7 as uniformly better than 4.6 at the same price — within a week that story partially collapsed.
  • A new tokenizer quietly inflates code/SQL/JSON token counts up to 1.35×, an effective ~35% price hike on developer workloads.
  • Anthropic reverted a verbosity-limiting system prompt on April 20 after admitting it “hurt coding quality.”
  • The genuinely frontier model, Claude Mythos, is gated to ~50 partners — Opus 4.7 is a deliberately capped public tier.

A tidy launch narrative

The April 16 drop landed with two reinforcing posts: Anthropic’s own announcement and a same-week Latent Space AINews recap whose headline read “literally one step better than 4.6 in every dimension.” The pitch: a new xhigh effort tier between high and max, tripled image resolution (up to ~3.75 MP), one-third fewer tool errors, “loop resistance,” and identical $5/$25-per-million pricing. Internal benchmarks back it up — CursorBench jumps from 58% to 70%, XBOW Visual Acuity from 54.5% to 98.5%, and Rakuten reports Opus 4.7 autonomously closing 13 production issues in a day with time-to-market collapsing from 24 working days to 5 7.

That’s the story. Then the week happened.

The silent price hike

“Pricing unchanged” is technically true and substantively misleading. Opus 4.7 ships with a new tokenizer that produces 1.0–1.35× as many tokens for the same raw text — and the high end of that range hits exactly the workloads developers actually buy Opus for: code, SQL, and JSON. Finout characterizes the net effect as a silent ~35% price increase wearing an unchanged rate card 8.
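
The arithmetic is simple enough to sketch: effective price scales linearly with the token-inflation factor. The 1.35× worst case below is Finout's figure for code-heavy text:

```python
def effective_price_per_mtok(list_price: float, inflation: float) -> float:
    """Effective price per million old-tokenizer tokens when the new
    tokenizer emits `inflation` times as many tokens for the same text."""
    return list_price * inflation

# Unchanged rate card: $5 input / $25 output per million tokens.
# 1.35x is Finout's worst case for code, SQL, and JSON.
print(effective_price_per_mtok(5.0, 1.35))   # about $6.75 effective input price
print(effective_price_per_mtok(25.0, 1.35))  # about $33.75 effective output price
```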

Compounding this, the API now rejects non-default temperature, top_p, and top_k with a 400 error. Practitioners read the forced temperature=1.0 as a “distillation defense” to poison outputs against rival labs cloning Claude’s reasoning 9 — useful for Anthropic’s moat, hostile to anyone who needs deterministic generation in production. Neither change is foregrounded in the launch post.
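
For teams hit by those 400s, the pragmatic stopgap is to strip the rejected knobs before the request goes out. A minimal sketch over a plain request dict; the helper, the request shape, and the model string are hypothetical illustrations, not part of any official SDK:

```python
# The three sampling knobs the Opus 4.7 endpoint now rejects with HTTP 400
# when set to non-default values (per the laozhang.ai report).
REJECTED_PARAMS = ("temperature", "top_p", "top_k")

def sanitize_request(params: dict) -> dict:
    """Drop rejected sampling knobs so existing call sites fail soft
    instead of erroring. Hypothetical helper, not part of any SDK;
    adapt to your client library's actual request shape."""
    return {k: v for k, v in params.items() if k not in REJECTED_PARAMS}

req = {"model": "claude-opus-4-7", "max_tokens": 1024, "temperature": 0.2}
print(sanitize_request(req))  # temperature removed; runs at the forced default
```

Failing soft keeps pipelines alive, but it silently abandons determinism — which is exactly the cost the complaint is about.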

Anthropic’s own post-mortem

The cleanest puncture came from Anthropic itself. On April 23 the company published a post-mortem conceding that an experimental system prompt shipped alongside Opus 4.7 — capping inter-tool text to 25 words and final responses to 100 — “hurt coding quality” and was fully reverted on April 20 10. Combined with an earlier default-effort drop and a March thinking-history caching bug, the user complaints about “lazy” Opus weren’t vibes; they were a real regression patched four days after launch.

Independent testing turned up worse. One benchmark pass clocked MRCR v2 long-context recall falling from 78.3% on Opus 4.6 to 32.2% on 4.7 11 — consistent with a tokenizer that inflates token counts and therefore shrinks the usable portion of a nominally unchanged context window.

“Leaderboard theater next to real-world flakiness.”
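
The context-shrink arithmetic is worth spelling out: a tokenizer that emits up to 1.35× the tokens leaves proportionally less room in a nominally unchanged window. A sketch, assuming an illustrative 200K-token window:

```python
def usable_context(nominal_tokens: int, inflation: float) -> int:
    """Old-tokenizer-equivalent text that still fits when the new
    tokenizer emits `inflation` times as many tokens."""
    return int(nominal_tokens / inflation)

# Assuming an illustrative 200K-token window and the 1.35x worst case:
print(usable_context(200_000, 1.35))  # 148148, roughly 26% less text fits
```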

The Mythos overhang

The most underplayed signal is what didn’t ship. Help Net Security reports that Claude Mythos Preview — capable of autonomously chaining zero-day exploits — was withheld from public release and gated to roughly 50 Project Glasswing partners including Microsoft, Apple, and AWS, plus a vetted Cyber Verification Program for security researchers 12. Opus 4.7 is, by design, the capability-capped tier sitting beneath it.

What the cluster actually says

Read together, the launch post and the AINews recap describe a refinement release. Read with the week’s follow-ups, the same event looks different: a tokenizer-driven cost increase, an API lockdown justified as anti-distillation, a self-admitted quality regression hot-patched on day four, a long-context regression nobody at Anthropic has addressed, and a two-tier ecosystem where the actually-frontier model lives behind a club door. “Strictly better at the same price” survives as marketing. As a description of what changed on April 16, it doesn’t.


Gemini Robotics-ER 1.6 hits 98% on gauges — in DeepMind’s own lab

Source: deepmind-blog · published 2026-04-13

TL;DR

  • DeepMind’s new embodied-reasoning model jumps analog gauge reading from 23% to 98% — but only on internal evals.
  • “Agentic vision” works by zooming, pointing at tick marks, then running Python to compute the value.
  • ER 1.6 is a cloud reasoning layer; on-device rivals like Figure’s Helix are solving an adjacent latency problem.
  • ASIMOV safety gains are real on paper but inherit the old critique that natural-language laws don’t ground in physics.

The 23% → 98% claim deserves an asterisk

The headline result of Gemini Robotics-ER 1.6 is a Boston Dynamics Spot reading analog instruments at a 98% success rate, up from 23% under ER 1.5 13. The mechanism is what DeepMind calls “agentic vision”: the model zooms into a gauge image, uses its improved pointing to mark needle and tick locations, then writes and executes Python to compute the reading.
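
The final "run Python" step reduces, in the simplest case, to linear interpolation between the detected tick angles. A sketch of that computation; the angle conventions and gauge range are illustrative assumptions, not DeepMind's actual tool code:

```python
def gauge_reading(needle_deg: float, min_deg: float, max_deg: float,
                  min_val: float, max_val: float) -> float:
    """Map a detected needle angle to a value by linear interpolation
    between the min and max tick marks (assumes a linear scale)."""
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A 0-10 bar gauge sweeping from -135 deg (min tick) to +135 deg (max tick),
# needle detected pointing straight up at 0 deg:
print(gauge_reading(0.0, -135.0, 135.0, 0.0, 10.0))  # 5.0
```

The interpolation itself is trivial; the hard 98%-vs-23% part is everything upstream of it, i.e. localizing the needle and ticks reliably in the image.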

That 4× jump comes entirely from DeepMind’s internal benchmark. Independent commentary has noted the test set likely under-represents the reflective glass, oblique angles, and shifting shadows that define real refinery and substation environments — and that a separate “93%” figure in Google’s own materials suggests the headline number describes a specific favorable subset 14. Until a third party reproduces it on plant-floor footage, treat 98% as a ceiling, not a deployment expectation.

Cloud brain, someone else’s hands

ER 1.6 is explicitly a high-level reasoner. It calls tools — Search, custom code, vision-language-action (VLA) models — and delegates motor control elsewhere. That architecture is worth comparing to what humanoid-focused labs are shipping:

| Stack | Planner | Controller | Where it runs |
| --- | --- | --- | --- |
| Gemini Robotics-ER 1.6 | Gemini ER (cloud) | External VLA | Cloud round-trip |
| Figure Helix 15 | 7B System-2 | 80M System-1 @ 200 Hz | On-device Jetson Orin |
```mermaid
flowchart LR
    Cam[Overhead + wrist cameras] --> ER[Gemini ER 1.6 cloud]
    ER -->|points, code, plans| VLA[VLA / motor policy]
    ER -->|tool calls| Tools[Search · Python · custom code]
    VLA --> Robot[Spot / arm]
    Robot --> Cam
```

The cost of that round-trip is the open question. HN discussion flagged Python-synthesis loops as “currently quite slow,” and corrected an early claim of a 2.5kW chip footprint — that figure was for a full server rack, not a single accelerator 16. Commenters made the obvious point: a $50 digital sensor reads a pressure gauge for free, forever. The case for an LLM-in-the-loop has to clear that bar on tasks where the value is genuinely in reasoning about what the reading means, not just transcribing it.
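
That bar can be made concrete with a back-of-envelope break-even; every number below (per-reading cost, reading frequency) is an assumption, not a published figure:

```python
def llm_inspection_cost(readings_per_day: int, cost_per_reading: float,
                        days: int) -> float:
    """Cumulative cost of cloud LLM gauge readings; all inputs assumed."""
    return readings_per_day * cost_per_reading * days

SENSOR_COST = 50.0  # the commenters' one-time digital-sensor price

# Assumed $0.01 per vision + code round-trip, hourly readings:
days = 0
while llm_inspection_cost(24, 0.01, days) < SENSOR_COST:
    days += 1
print(days)  # 209: past that point, the one-time sensor is cheaper
```

Under these assumptions the LLM loop loses to the sensor in about seven months of hourly readings, which is why the value has to come from interpretation, not transcription.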

The demos and the pitch are pulling apart

Boston Dynamics’ viral reel shows Spot reading a chalkboard to-do list, sorting shoes, checking mousetraps, and taking a real dog for a walk and fetch 17. It’s charming but oblique to the industrial-inspection use case the model is actually shipped for. A 30 kg quadruped improvising around a household pet is a different safety regime than monitoring a thermometer in a fenced facility.

DeepMind’s safety story leans on the ASIMOV benchmark — 10% better on video hazard detection vs. Gemini 3.0 Flash. The benchmark is methodologically substantial (2.9M instruction-evaluation pairs, NEISS injury data), but it inherits old criticism: Rodney Brooks and others have long argued Asimov-style natural-language laws are unimplementable as engineering constraints 18.

Models can recite a safety principle while failing to ground it in payload limits or gripper geometry.

That “embodiment gap” is acknowledged by the ASIMOV authors themselves. ER 1.6 is a credible jump for the cloud-reasoning layer of a robotics stack. The unresolved questions — replication, latency economics, and whether language-grounded safety is the right abstraction — are the ones that will decide whether it leaves the demo reel.

Round-ups

[AINews] The Two Sides of OpenClaw

Source: latent-space

Latent Space’s AI News uses a slow news day to reflect on ‘OpenClaw’ — the week’s running theme contrasting OpenAI and Anthropic’s diverging product strategies — rather than recapping a single launch.

Introducing Claude Design by Anthropic Labs

Source: anthropic-news

Anthropic Labs debuts Claude Design, a new offering from its experimental product group that applies Claude to design workflows. The launch marks another vertical-specific Claude variant from the Labs team, following recent specialized rollouts aimed at expanding Claude’s footprint beyond general chat.

OpenAI helps Hyatt advance AI among colleagues

Source: openai-blog

Hyatt rolls out ChatGPT Enterprise to its global workforce, using GPT-5.4 and Codex to support productivity, internal operations, and guest experience workflows across the hotel chain — OpenAI’s latest named hospitality deployment.

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Source: deepmind-blog

Google DeepMind’s Gemini 3.1 Flash TTS adds granular inline audio tags that let developers direct tone, emotion, and delivery within prompts, giving finer control over expressive speech generation than the prior Flash TTS release.

Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Source: import-ai

Jack Clark’s Import AI 453 covers new red-teaming work on breaking AI agents, a project called MirrorCode, and a roundup of ten perspectives on the ‘gradual disempowerment’ thesis about humans ceding control to AI systems.

My bets on open models, mid-2026

Source: interconnects

Nathan Lambert lays out his mid-2026 predictions for open-weights models, focusing on whether the gap to frontier closed models is widening or narrowing and where Chinese labs, Meta, and smaller open players are likely to push next.

What I’ve been building: ATOM Report, post-training course, finishing my book, and ongoing research

Source: interconnects

Nathan Lambert recaps his current projects: the ATOM Report on open models, a post-training course, the final stretch of his book, and ongoing research threads at Interconnects and Ai2.

Footnotes

  1. Forbes (Markman) — https://www.forbes.com/sites/jonmarkman/2026/04/22/amazon-33-billion-anthropic-deal-and-the-limits-of-ai-infrastructure/

    Amazon’s investment is contingent on Anthropic’s commitment to spend $100 billion on AWS infrastructure over the next decade… functions as vendor financing disguised as venture capital

  2. Forbes (Majic) — https://www.forbes.com/sites/josipamajic/2026/03/25/openai-and-anthropic-count-revenue-differently-and-investors-are-looking-into-it/

    Anthropic recognizes revenue from cloud partnerships on a gross basis, including the share taken by cloud providers like AWS and Google before expenses are deducted… OpenAI reports its Microsoft Azure revenue on a net basis

  3. SemiAnalysis — https://newsletter.semianalysis.com/p/amazons-ai-resurgence-aws-anthropics-multi-gigawatt-trainium-expansion

    Anthropic benefits from deep hardware-software co-design, essentially using Amazon’s Annapurna Labs as a custom silicon shop… benchmarks are cherry-picked for Anthropic’s specific memory-bandwidth-bound workloads

  4. Business Insider — https://www.businessinsider.com/startups-amazon-ai-chips-less-competitive-nvidia-gpus-trainium-aws-2025-11

    Internal documents… revealed that other startups, including Cohere and Stability AI, found Trainium2 ‘less competitive’ than Nvidia’s H100, citing higher latency and frequent service disruptions

  5. Medium (Estari) — https://medium.com/@UdaykiranEstari/googles-40b-anthropic-bet-the-deal-that-makes-no-sense-until-you-see-what-s-really-happening-9e0e71202caf

    Anthropic’s agreement to deploy up to one million Google Tensor Processing Units… represents the largest external deployment of TPUs in Google’s history, effectively turning Google into a merchant silicon provider for its own primary model competitor

  6. VentureBeat — https://venturebeat.com/technology/is-anthropic-nerfing-claude-users-increasingly-report-performance

    Anthropic executives later confirmed they had quietly changed the default reasoning ‘effort’ level from ‘High’ to ‘Medium’ to improve speed and reduce token costs… bugs in the prompt-caching system that reportedly inflated user costs by 10–20x

  7. Rakuten Today — production deployment account — https://rakuten.today/blog/rakuten-accelerates-development-with-claude-code%EF%BF%BC.html

    Rakuten reports Opus 4.7 autonomously closed 13 production issues and routed 12 more in a single day, with engineering time-to-market dropping from 24 working days to 5.

  8. Finout — ‘The real cost story behind the unchanged price tag’ — https://www.finout.io/blog/claude-opus-4.7-pricing-the-real-cost-story-behind-the-unchanged-price-tag

    The new tokenizer produces 1.0–1.35x more tokens for the same raw text, amounting to an effectively silent ~35% price increase on code, SQL and JSON workloads despite the unchanged $5/$25 rate card.

  9. laozhang.ai — Claude Code top_p deprecated — https://blog.laozhang.ai/en/posts/claude-code-top-p-deprecated

    Setting temperature, top_p or top_k to non-default values now returns a 400 error; analysts read the forced temperature=1.0 as a ‘distillation defense’ to poison outputs against rival labs distilling Claude.

  10. Anthropic engineering post-mortem (April 23, 2026) — https://www.anthropic.com/engineering/april-23-postmortem

    An experimental system prompt instructing the model to keep text between tool calls to ≤25 words and final responses to ≤100 words ‘hurt coding quality’ and was fully reverted on April 20, 2026.

  11. Medium — ‘Claude Opus 4.7 is bullshit’ (Mehulgupta) — https://medium.com/data-science-in-your-pocket/claude-opus-4-7-is-bullshit-67ed884e2feb

    Independent testers report a regression on MRCR v2 long-context recall from 78.3% (Opus 4.6) to 32.2% (Opus 4.7), and call Anthropic’s headline benchmark gains ‘leaderboard theater’ next to real-world flakiness.

  12. Help Net Security — Claude Opus 4.7 released — https://www.helpnetsecurity.com/2026/04/16/claude-opus-4-7-released/

    Anthropic withheld the more capable ‘Claude Mythos Preview’ from public release because it can autonomously chain zero-day exploits, restricting it to ~50 Project Glasswing partners (Microsoft, Apple, AWS) — a ‘club-based’ access model critics call structurally inequitable.

  13. MLQ.ai — https://mlq.ai/news/boston-dynamics-enhances-spot-robot-dog-with-google-gemini-ai-for-gauge-reading/

    Boston Dynamics’ Spot robot, equipped with Gemini Robotics-ER 1.6, achieved a 98% success rate reading analog instruments, up from 23% with version 1.5.

  14. Medium (creed_1732) — ‘98% Accuracy: What It Actually Means’ — https://medium.com/@creed_1732/boston-dynamics-spot-analog-gauge-reading-with-gemini-ai-what-98-accuracy-actually-means-for-c6ad4f07d861

    The staggering jump from 23% to 98% comes entirely from internal DeepMind benchmarks; reflective glass, shadows and oblique viewing angles in real plants are not represented in the eval set.

  15. Humanoid Foundation Model Report 2025 — https://foundationmodelreport.ai/2025.pdf

    Figure’s Helix uses a 7B System-2 model paired with an 80M System-1 controller running at 200 Hz on embedded Jetson Orin GPUs — a fully on-device stack contrasting with Gemini ER’s cloud-dependent reasoning layer.

  16. BigGo finance summary of HN discussion — https://finance.biggo.com/news/FlQ3lZ0BrdTHlKtC7wW1

    Commenters describe Python-synthesis loops as ‘currently quite slow,’ and an initial claim of a 2.5kW chip was corrected to apply to the entire server-rack deployment, not a single accelerator.

  17. eWeek — https://www.eweek.com/news/google-boston-dynamics-robot-dog-gemini-ai/

    In a flagship demo, Spot read a handwritten to-do list off a chalkboard, sorted shoes, checked mousetraps, and even took a real dog on a leash for a walk and a game of fetch.

  18. Kasenberg, AI Ethics & Asimov (course site) — https://dkasenberg.github.io/ai-ethics-asimov/

    Roboticists like Rodney Brooks have long dismissed Asimov’s laws as ‘bullshit’ because they rely on natural-language ambiguity that defies strict logical implementation.

Jack Sun

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing it down.


© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare