Compute, capital, and access: the real shape of the AI frontier today
Today's biggest launches and megadeals are less about model capability than about control — over chips, cash, and who gets to audit.
TL;DR
- GPT-5.5 leads agentic-coding benchmarks but the UK AISI was denied access to verify a critical jailbreak patch.
- The real GPT-5.5 story is the Superapp: ChatGPT, Codex, and Atlas merged into one cockpit at roughly 2× pricing.
- DeepSeek V4 cuts inference FLOPs to ~27% of V3.2 and was natively tuned for Huawei Ascend, sidestepping US chip controls.
- Google’s $40B Anthropic deal is $10B cash plus 5GW of Cloud capacity, tethering the lab to two rival silicon stacks.
- US weighs sanctions over alleged Chinese model distillation; Anthropic and OpenAI claim DeepSeek harvested 16M Claude exchanges.
Three headline moves today — two model launches and a megadeal — and all three turn out to be stories about who controls the compute, the capital, and the right to verify.
GPT-5.5 takes the agentic-coding crown, but the real product is a bundled Superapp at double the price, and the UK’s safety institute was locked out of confirming a critical patch. DeepSeek V4’s headline is benchmarks; its substance is a hybrid-attention architecture tuned natively for Huawei silicon, a deliberate route around the US chip perimeter. Google’s $40B Anthropic package is mostly milestones and 5 gigawatts of Cloud capacity — a compute-for-equity barter that lands four days after Amazon’s, and that the DOJ may try to unwind within six months.
The roundups echo the frame: alleged Chinese distillation of frontier models, Samsung executives priced out of their own memory by datacenter demand, and a Verge essay arguing the public never asked for any of this.
GPT-5.5 lands as the agentic-coding leader — and a jagged one
Source: openai-blog · published 2026-04-23
TL;DR
- GPT-5.5 takes the top spot on terminal and computer-use benchmarks, but loses ground to Claude Opus 4.7 on multi-file SWE-bench Pro.
- Independent testers report a LiveBench regression and worse “BullshitBench” pushback as reasoning effort goes up.
- Cyber capability is a real step-change (XBOW miss rate 40% → 10%), but UK AISI was denied access to verify patches on a critical jailbreak.
- The actual story is the Superapp: ChatGPT + Codex + Atlas collapsed into one cockpit, at roughly 2× GPT-5.4 pricing.
A leader on the benchmarks OpenAI picked
GPT-5.5 ships with the numbers OpenAI wanted in the headline: 82.7% on Terminal-Bench 2.0, 78.7% on OSWorld-Verified, 35.4% on FrontierMath Tier 4, and a “High” Preparedness rating in cyber and bio. Independent reviewers don’t seriously contest the agentic-coding lead — XBOW’s red-team work pegs GPT-5.5’s vulnerability miss rate at 10%, down from 40% on the original GPT-5, which MindFort calls a genuine “step change” 1.
The trouble starts on benchmarks the launch post doesn’t mention. MindStudio’s evaluation has GPT-5.5 (High Effort) at 56.67 on LiveBench, a regression from GPT-5.4’s 70.00, and an 86% hallucination rate on Omniscience 2. The Decoder summarizes the trade bluntly: better headline scores, more hallucinations, ~20% higher API cost 3. On harder repository reasoning, Claude Opus 4.7 still leads:
| Model | Terminal-Bench 2.0 | SWE-bench Pro | LiveBench (High) |
|---|---|---|---|
| GPT-5.5 | 82.7% | 58.6% | 56.67 |
| Claude Opus 4.7 | — | 64.3% | — |
| Gemini 3.1 Pro | — | 54.2% | — |
| GPT-5.4 | 75.1% | — | 70.00 |
GPT-5.5 wins tool orchestration. It does not win deep multi-file reasoning 4.
“More reasoning compute appears to make it rationalize nonsense rather than reject it.” — MindStudio, on the Pro variant pushing back on only 35% of nonsensical prompts vs 45% for the standard model 2
Cyber: real capability, contested oversight
The High cyber rating is the part of the announcement that should travel. The new Trusted Access for Cyber program — cyber-permissive variants for vetted defenders — is OpenAI’s answer to that capability. It is also where governance starts to fray. The UK AI Security Institute reports it found a critical jailbreak in pre-deployment testing and was then denied access to the final patched model, leaving the fix unverified 5. Outside researchers describe TAC as a private club where OpenAI grades its own homework 5, and independent verification of the patches is still unavailable.
The Superapp is the actual launch
The model is the visible piece; the structural move is the cockpit. Codex (reportedly rewritten in Rust), ChatGPT and the Atlas browser are being collapsed into a single execution surface, with the API context window pushed to 1M tokens to make long-horizon, cross-tool agents tractable.
flowchart LR
U[User] --> SA{GPT-5.5 Superapp}
SA --> CX[Codex / Fast mode]
SA --> CH[ChatGPT]
SA --> AT[Atlas browser]
CX & CH & AT --> CON[Connectors / tools]
CON -. one outage, one blast radius .-> U
Absolutely Agentic’s read: the design is a deliberate vendor lock-in play, and the concentration risk is no longer hypothetical after recent Claude Code outages 6. Pricing reinforces the bet — $5/$30 per million tokens standard, $30/$180 for Pro, roughly 2× GPT-5.4 — defended only by claimed token-efficiency gains that developers haven’t yet reproduced at scale 3.
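The pricing bet reduces to simple arithmetic: at roughly double the per-token rate, GPT-5.5 only matches GPT-5.4 on cost if it genuinely halves token usage. A minimal sketch, assuming GPT-5.4 rates at half the listed GPT-5.5 standard rates (inferred from the "roughly 2×" framing, not official GPT-5.4 list prices):

```python
def request_cost(in_tok, out_tok, in_rate, out_rate):
    """Cost of one request; rates are dollars per million tokens."""
    return (in_tok * in_rate + out_tok * out_rate) / 1e6

# GPT-5.5 standard list price from the launch: $5 in / $30 out per 1M tokens.
GPT55 = (5.0, 30.0)
# Assumed GPT-5.4 rates, inferred from the "roughly 2x" framing (not official).
GPT54 = (2.5, 15.0)

# Same job, same token counts: GPT-5.5 costs exactly twice as much.
c55 = request_cost(20_000, 4_000, *GPT55)
c54 = request_cost(20_000, 4_000, *GPT54)
assert c55 == 2 * c54

# Break-even token efficiency: GPT-5.5 must finish the same job in
# at most 50% of the tokens before the price hike is a wash -- the
# efficiency claim buyers are being asked to take on faith.
breakeven_ratio = c54 / c55  # 0.5
```

In other words, the unreproduced "token-efficiency gains" are not a nice-to-have; they are the entire cost case for the upgrade.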
What’s actually at stake
GPT-5.5 is the new agentic-coding default. It is also a model whose hallucination profile gets worse with more reasoning compute 2, whose cyber patches allied evaluators can’t audit 5, and whose price hike rides on efficiency claims OpenAI is asking buyers to take on faith. The Superapp makes all three matter more, not less.
Further reading
- GPT-5.5 System Card — openai-blog
- [AINews] GPT 5.5 and OpenAI Codex Superapp — latent-space
- Sign of the future: GPT-5.5 — one-useful-thing
DeepSeek V4: rerouted around the chokepoints, not caught up
Source: mit-tech-review-ai · published 2026-04-24
TL;DR
- V4’s real story is efficiency: hybrid attention cuts KV cache to ~10% and inference FLOPs to ~27% of V3.2.
- Independent benchmarks show 1M-token retrieval drops to 0.59 — and fails non-deterministically.
- V4 was natively tuned for Huawei Ascend 950PR, triggering a Chinese-chip scramble.
- Anthropic and OpenAI allege DeepSeek harvested ~16M Claude exchanges via fake accounts.
The architecture is the headline, not the leaderboard
MIT Tech Review’s writeup leans on “longer prompts, open weights, cheap.” The more interesting claim sits in the technical writeups: V4 interleaves Compressed Sparse Attention (4:1) with Heavily Compressed Attention (128:1), pushing KV-cache memory to roughly 10% and inference FLOPs to 27% of V3.2 at million-token contexts 7. Stack that on a 1.6T-parameter MoE stabilized by Manifold-Constrained Hyper-Connections and trained with a Muon optimizer, and you get VentureBeat’s headline number: “near-state-of-the-art intelligence at 1/6th the cost of Opus 4.7 / GPT-5.5” 8.
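The two compression ratios imply a blended KV-cache size that depends on how the layer types are interleaved. A back-of-envelope sketch — the layer mixes below are illustrative, not DeepSeek’s published schedule:

```python
def blended_kv_fraction(csa_layers, hca_layers, csa_ratio=4, hca_ratio=128):
    """KV-cache size relative to uncompressed attention, assuming each
    layer keeps 1/ratio of its keys/values and all layers are equal-width."""
    total = csa_layers + hca_layers
    return (csa_layers / csa_ratio + hca_layers / hca_ratio) / total

# Two hypothetical interleaves (the actual V4 layer plan is not public here):
even_mix = blended_kv_fraction(csa_layers=1, hca_layers=1)   # ~12.9%
hca_heavy = blended_kv_fraction(csa_layers=1, hca_layers=3)  # ~6.8%

# The reported ~10% cache figure sits between these two mixes, so the
# claim is arithmetically plausible without any exotic machinery.
```

The point of the sketch: because HCA at 128:1 contributes almost nothing to cache size, the blended fraction is dominated by how many 4:1 CSA layers the schedule keeps.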
That’s an efficiency win, not a capability win. The Council on Foreign Relations puts V4-Pro three to six months behind frontier U.S. models on ARC-AGI and complex agentic tasks 9. “China caught up” is the wrong frame; “China shipped frontier-adjacent at a sixth of the spend” is the right one.
The 1M-token claim doesn’t survive contact with benchmarks
DeepSeek’s million-token marketing degrades fast under independent testing. LMSYS’s MRCR 8-needle retrieval holds above 0.82 through 256K tokens but collapses to 0.59 at the 1M ceiling — and crucially, V4-Pro fails non-deterministically across context positions rather than at a predictable depth 10. That’s worse than a clean degradation curve, because you can’t engineer a verification layer around an unpredictable failure mode. CFR also flags that DeepSeek’s mobile app ships deprecated 3DES encryption and hard-coded keys 9, which matters if you were planning to pipe sensitive long-context workloads through it.
“MRCR 8-needle retrieval stays above 0.82 through 256K tokens but falls to 0.59 at the 1M limit; V4-Pro exhibits non-deterministic failures across context positions.” — LMSYS 10
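Why a predictable depth cliff is engineerable and a random one is not can be shown in a few lines. This is a toy model of the two failure modes, not LMSYS’s harness; the failure probability is a stand-in chosen to echo the 0.59 score:

```python
import random

def deterministic_cliff(depth_tokens, cliff=256_000):
    """Failure mode A: retrieval works up to a known depth, then fails.
    A guard is trivial -- cap the context at the cliff and you never hit it."""
    return depth_tokens <= cliff

def nondeterministic(depth_tokens, p_fail=0.41, rng=random):
    """Failure mode B: past 256K, each lookup fails with some probability
    regardless of position (toy stand-in for V4-Pro's reported behavior)."""
    if depth_tokens <= 256_000:
        return True
    return rng.random() > p_fail

# Guarding against mode A: truncate inputs to 256K and retrieval is reliable.
assert all(deterministic_cliff(d) for d in range(0, 256_001, 64_000))

# There is no equivalent guard for mode B short of avoiding long contexts
# entirely: any single 1M-token lookup may silently fail, so a verification
# layer cannot be positioned at a known boundary.
```

That asymmetry is why a clean degradation curve, even a steep one, is the better engineering property.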
The distillation fight MIT Tech Review skipped
The piece largely sidesteps the brewing IP fight. Anthropic says it identified ~24,000 fraudulent accounts that generated over 16 million Claude exchanges to harvest reasoning traces, and OpenAI told the House Select Committee on China that DeepSeek used “obfuscated methods” to scrape outputs 11. DeepSeek calls this “groundless suppression,” but the allegations have already produced a Frontier Model Forum anti-distillation pact and a “Stop AI Model Theft Act” in Congress. Any honest read of V4’s training cost has to account for the possibility that a non-trivial slice of its reasoning capability was lifted, not learned.
Hardware sovereignty is the durable shift
The most consequential second-order effect isn’t on the leaderboard — it’s in the supply chain. Reuters reports ByteDance, Alibaba and other Chinese majors are now scrambling to lock down Huawei Ascend 950PR inventory, because V4 was natively optimized for Ascend rather than NVIDIA’s H800 12.
flowchart LR
A[Hybrid attention<br/>+ MoE co-design] --> D[V4 frontier-adjacent model]
B[Huawei Ascend 950PR<br/>non-CUDA silicon] --> D
C[Alleged Claude/GPT<br/>distillation traces] -.-> D
D --> E[Chinese cloud demand<br/>shifts off NVIDIA]
That is the part of the story export controls were supposed to prevent. DeepSeek demonstrated that hardware–software co-design can land a 1.6T-parameter model on domestic silicon — and the market responded within days. The benchmark gap will close or it won’t; the chip-stack realignment is already happening.
Google’s $40B Anthropic bet is a compute-for-equity barter, not a check
Source: ars-technica-ai · published 2026-04-24
TL;DR
- Google’s $40B headline is $10B cash at a $350B pre-money plus $30B in milestones, bundled with 5GW of Cloud capacity.
- It lands four days after Amazon’s $33B package, leaving Anthropic uniquely tethered to two rival silicon stacks (TPU and Trainium).
- DOJ search-remedy proposals would ban Google AI investments and force divestiture of the existing stake within six months.
- Critics call the structure circular: Anthropic books gross revenue on Bedrock/Vertex while paying cloud partners 20–30% back.
The deal under the headline
The $40B compresses two very different things. Roughly $10B is upfront cash at a $350B pre-money valuation; the other $30B is milestone-contingent, paired with 5 gigawatts of Google Cloud capacity over five years 13. A Broadcom regulatory filing exposed the hardware spine — a three-way agreement to stand up 3.5GW of TPU-based capacity starting in 2027 13. As one Hacker News commenter put it:
Capital is fungible, but substations and transformers are not.
That is the most honest framing of what changed this week 14. With grid interconnects in key regions booked through 2028, guaranteed gigawatts and a reserved TPU allocation are worth more than the cash line item.
Two silicon stacks, one lab
The Google round landed four days after Amazon lifted its own commitment to $33B — $5B equity plus $20B in milestones — backed by Anthropic’s pledge to spend more than $100B on AWS over the next decade across Trainium2 through Trainium4 15. No other frontier lab is simultaneously locked into two competing proprietary accelerators.
| Partner | Headline | Cash equity | Silicon |
|---|---|---|---|
| Google (Apr 2026) | $40B | ~$10B @ $350B pre | TPU (3.5GW via Broadcom) 13 |
| Amazon (Apr 2026) | $33B | $5B + $20B milestones | Trainium2–4, $100B+ AWS spend 15 |
flowchart LR
G[Google] -->|$40B + 5GW TPU| A[Anthropic]
AMZ[Amazon] -->|$33B + Trainium| A
A -->|Bedrock / Vertex revenue<br/>booked gross| C[Customers]
C -->|20–30% rev share| G
C -->|20–30% rev share| AMZ
The circular-financing complaint
The loudest dissent treats this as a bookkeeping loop. A Forbes analysis flagged that Anthropic recognizes the gross transaction price on AWS Bedrock and Google Vertex deals before deducting the cloud provider’s 20–30% cut, while OpenAI reportedly uses net accounting 16. The same dollar can therefore appear as Anthropic revenue, cloud revenue, and — once recycled into TPU or Trainium spend — capex justification for the next funding round. Independent reconciliation of Anthropic’s lifetime revenue against its $30B-plus annualized narrative isn’t yet available in the research bundle, but the accounting divergence alone is enough to keep auditors interested.
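The gross-versus-net divergence is easy to make concrete. A toy example of the same $100 Bedrock transaction under each method, using illustrative figures rather than either company’s actual books:

```python
def gross_method(customer_pays, cloud_cut=0.25):
    """Recognize the full transaction price as revenue; the cloud
    provider's share shows up later as a cost, not a revenue deduction."""
    revenue = customer_pays
    kept = customer_pays * (1 - cloud_cut)
    return revenue, kept

def net_method(customer_pays, cloud_cut=0.25):
    """Recognize only the lab's share as revenue."""
    revenue = customer_pays * (1 - cloud_cut)
    return revenue, revenue

g_rev, g_kept = gross_method(100.0)  # $100 reported, $75 kept
n_rev, n_kept = net_method(100.0)    # $75 reported, $75 kept

# Identical economics, one-third larger top line under gross accounting.
assert g_kept == n_kept
assert g_rev / n_rev == 100 / 75
```

Run across every Bedrock and Vertex dollar, that one-third gap is exactly the kind of divergence that makes headline annualized-revenue comparisons between labs unreliable.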
Regulators and the Pentagon
The DOJ’s proposed final judgment in the Google search-remedies case includes a categorical ban on Google AI investments and forced divestiture of the existing Anthropic stake within six months; Anthropic filed to intervene, warning that a forced expedited sale would depress its valuation 17. Read against that backdrop, the $40B commitment is partly defensive — deeper operational entanglement is harder to unwind than a clean equity line.
Anthropic’s customer-side politics are no calmer. FedScoop reports a dispute with the Pentagon over usage guardrails escalated to a governmentwide ban before Anthropic was eventually re-included 18 — a contrast with Google’s more permissive defense posture, and a reminder that the company is buying compute faster than it is buying customers willing to live with its policies.
The headline number is real. The cash, the independence, and the revenue quality underneath it are all contested.
Round-ups
OpenAI Academy Codex 7-part curriculum drop
Source: openai-blog
OpenAI Academy rolled out a seven-part Codex curriculum repositioning the tool as a workplace agent for non-developers, covering setup, daily workflows, top use cases, plugins and skills, scheduled automations, and configuration settings rather than pure coding tasks.
Further reading:
- Top 10 uses for Codex at work — openai-blog
- Automations — openai-blog
- Plugins and skills — openai-blog
- How to get started with Codex — openai-blog
- Working with Codex — openai-blog
- Codex settings — openai-blog
US accuses China of “industrial-scale” AI theft. China says it’s “slander.”
Source: ars-technica-ai
The US is weighing major sanctions over claims of “industrial-scale” Chinese AI theft — including model distillation from OpenAI, Anthropic, and Google — ahead of a planned Trump–Xi summit. Beijing called the accusations “slander.”
Report: Samsung execs worried company could lose money on smartphones for the first time
Source: ars-technica-ai
Samsung executives reportedly fear the company’s smartphone division could post its first-ever annual loss, as AI datacenter demand drives a memory shortage that has spiked HBM and DRAM prices Samsung itself must now pay to source for handsets.
Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce
Source: anthropic-news
Anthropic partnered with NEC to train what the companies describe as Japan’s largest Claude-skilled engineering workforce, embedding the model into NEC’s enterprise software services and developer tooling for the Japanese market.
An update on our election safeguards
Source: anthropic-news
Anthropic updated its election-integrity measures, detailing how Claude’s policies, monitoring, and safeguards have been adjusted in response to misuse patterns observed during recent election cycles. Snippet-thin; details on specific enforcement numbers or policy changes are not provided.
The people do not yearn for automation
Source: simon-willison
Simon Willison highlights Nilay Patel’s Verge essay arguing that ChatGPT’s surging usage masks broad public hostility to AI, blamed on ‘software brain’ executives who model life as automatable information flows — citing the decade-long failure of smart-home automation as proof regular people don’t want it.
AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)
Source: latent-space
Latent Space’s crossover episode with Unsupervised Learning debriefs the AI Engineer Europe summit and lays out an ‘agent labs’ thesis on the emerging category of agent-native startups. Recorded just before the Cursor–xAI deal broke.
Footnotes
1. MindFort cybersecurity evaluation — https://www.mindfort.ai/blog/how-good-is-gpt-5-5-for-cybersecurity — Independent evaluations by XBOW indicate a ‘step change’ in performance, with GPT-5.5 achieving a vulnerability miss rate of only 10%, a significant improvement over the 40% miss rate of the original GPT-5.
2. MindStudio review (BullshitBench / LiveBench) — https://www.mindstudio.ai/blog/gpt-5-5-review-agentic-model — GPT-5.5 (High Effort) scored only 56.67 on LiveBench, well below GPT-5.4’s 70.00; on BullshitBench the Pro variant pushed back on just 35% of nonsensical prompts versus 45% for the standard model — more reasoning compute appears to make it ‘rationalize nonsense’ rather than reject it.
3. The Decoder — https://the-decoder.com/gpt-5-5-tops-benchmarks-but-still-hallucinates-frequently-at-a-20-percent-higher-api-cost/ — GPT-5.5 tops benchmarks but still hallucinates frequently at a 20 percent higher API cost.
4. Mohit / Medium model comparison — https://medium.com/@mohit15856/gpt-5-5-vs-claude-opus-4-7-vs-gemini-3-1-pro-vs-deepseek-v4-18dafdcf9b5e — Claude Opus 4.7 leads SWE-bench Pro at 64.3% vs GPT-5.5’s 58.6% and Gemini 3.1 Pro’s 54.2%; GPT-5.5 wins terminal/agentic benchmarks but trails on multi-file repository reasoning.
5. LetsDataScience / AISI reporting — https://letsdatascience.com/news/topic/cybersecurity — The UK AI Security Institute reported finding a critical jailbreak in GPT-5.5 during pre-deployment testing, but could not verify whether OpenAI’s subsequent patches were effective because it was denied access to the final version of the model.
6. Absolutely Agentic analysis of the Superapp — https://absolutelyagentic.com/p/gpt-5-5-inches-openai-closer-to-its-super-app-dream — The ‘Superapp’ strategy creates significant vendor lock-in and ‘concentration risk’: collapsing ChatGPT, Codex and the Atlas browser into one cockpit raises the cost of any single OpenAI outage.
7. Medium — ‘Cracking the Million-Token Barrier’ — https://medium.com/towardsdev/cracking-the-million-token-barrier-a-deep-dive-into-deepseek-v4s-architecture-3a11c6a87b40 — Hybrid attention interleaves Compressed Sparse Attention (4:1) and Heavily Compressed Attention (128:1), cutting KV cache to 10% and inference FLOPs to 27% of V3.2.
8. VentureBeat — ‘near-state-of-the-art intelligence at 1/6th the cost of Opus 4.7 / GPT-5.5’.
9. Council on Foreign Relations — https://www.cfr.org/articles/deepseek-v4-signals-a-new-phase-in-the-u-s-china-ai-rivalry — V4-Pro still trails frontier U.S. models by roughly three to six months on ARC-AGI and complex agentic tasks; the mobile app uses deprecated 3DES encryption and hard-coded keys.
10. LMSYS blog — https://www.lmsys.org/blog/2026-04-25-deepseek-v4/ — MRCR 8-needle retrieval stays above 0.82 through 256K tokens but falls to 0.59 at the 1M limit; V4-Pro exhibits non-deterministic failures across context positions.
11. Skywork analysis (Anthropic/OpenAI allegations) — https://skywork.ai/skypage/en/deepseek-v4-distillation-analysis/2047584065013108736 — Anthropic identified ~24,000 fraudulent accounts that generated over 16 million Claude exchanges to harvest reasoning traces; OpenAI told the House Select Committee on China that DeepSeek used ‘obfuscated methods’ to scrape outputs.
12. WKZO/Reuters — Huawei chip scramble — https://wkzo.com/2026/04/29/exclusive-big-chinese-tech-firms-scramble-to-secure-huawei-ai-chips-after-deepseek-v4-launch-sources-say/ — Big Chinese tech firms scramble to secure Huawei AI chips after the DeepSeek V4 launch; V4 was natively optimized for Huawei Ascend 950PR rather than NVIDIA’s H800.
13. Silicon Republic (Anthropic/Google/Broadcom 3.5GW TPU disclosure) — https://www.siliconrepublic.com/machines/anthropic-google-broadcom-announce-3-5gw-tpu-deal — A Broadcom regulatory filing revealed a three-way agreement to supply 3.5 GW of TPU-based capacity starting in 2027, part of a wider 5 GW commitment from Google Cloud over five years.
14. Hacker News discussion — https://news.ycombinator.com/item?id=47899449 — Capital is fungible, but substations and transformers are not; the $40B is less about cash than about guaranteed access to scarce TPUs and gigawatts in regions where power grids are booked through 2028.
15. Forbes (Markman) on Amazon’s $33B and infrastructure limits — https://www.forbes.com/sites/jonmarkman/2026/04/22/amazon-33-billion-anthropic-deal-and-the-limits-of-ai-infrastructure/ — Amazon’s package brought its total commitment to $33 billion ($5B equity + $20B milestones), with Anthropic pledging over $100 billion of AWS spend over the next decade across Trainium2–Trainium4.
16. Forbes (Majic) on revenue accounting divergence — https://www.forbes.com/sites/josipamajic/2026/03/25/openai-and-anthropic-count-revenue-differently-and-investors-are-looking-into-it/ — Anthropic recognizes the total transaction price paid by customers on platforms like AWS Bedrock and Google Vertex AI before deducting the cloud provider’s 20–30% share, while OpenAI reportedly uses a ‘net’ method.
17. AIFray on DOJ remedies / Anthropic amicus — https://aifray.com/anthropic-blindsided-by-proposed-u-s-google-ai-investment-ban-seeks-to-participate-in-doj-antitrust-suit/ — The DOJ proposed a final judgment that includes a categorical ban on Google investing in AI firms, alongside a requirement to divest its existing stake in Anthropic within six months; Anthropic warned a ‘forced, expedited sale’ would depress its market value.
18. FedScoop on Anthropic–Pentagon rift — https://fedscoop.com/radio/anthropics-inclusion-comes-after-a-disagreement-between-the-ai-company-and-the-pentagon-over-guardrails-for-using-its-technology-culminated-in-a-governmentwide-ban/ — Anthropic’s inclusion comes after a disagreement between the AI company and the Pentagon over guardrails for using its technology culminated in a governmentwide ban.