Wei (Jack) Sun

Oscars price AI out of acting, Grok cuts tokens 60%, o1 runs hit $2,800

Three corners of AI got repriced today: the Academy's eligibility rules, Grok and DeepSeek's token cuts, and Nvidia's inference compute math.

TL;DR

  • The Academy barred AI from acting and writing categories, leaving voice-cloning tools like Respeecher eligible.
  • Grok 4.3 cut output pricing 60%, running full evals at ~$395 versus GPT-5.5’s ~$4,000.
  • DeepSeek V4 Pro’s new attention cuts KV cache 90% and inference FLOPs to 27% at 1M context.
  • Nvidia pegs AI compute demand growth at 1,000,000× over 2 years, driven by reasoning loops.
  • Apple researchers document a total collapse of thinking models as problem complexity scales.

Three different markets put a new price on AI today, and each price came with a tradeoff the headline number didn’t carry. The Academy decided AI performances and screenplays don’t qualify for acting and writing Oscars, but voice-cloning tools like Respeecher still do — a line that The Brutalist and Emilia Pérez already crossed without consequence. Grok 4.3 and DeepSeek V4 Pro both pivoted to efficiency over raw IQ, with output pricing and KV cache slashed by 60% and 90% respectively, while independent testers flagged narcolepsy on long-horizon runs and reproducibility doubts on the stack.

Underneath all of it, Nvidia put a number on the demand curve — AI compute up 1,000,000× in 2 years — and Apple put a ceiling on what that compute is buying, documenting a total collapse of thinking models as problems get harder. The day’s news isn’t capability; it’s the bill, and who’s paying it.

Oscars bar AI actors and scripts, but Respeecher still qualifies

Source: techcrunch-ai · published 2026-05-02

TL;DR

  • The Academy barred AI performances and screenplays from acting and writing categories only.
  • Voice-cloning tools like Respeecher — used in The Brutalist and Emilia Pérez — remain “neither help nor harm.”
  • Cannes already barred generative AI from Official Competition on “personal vision” grounds.
  • Berlin went the other way, requiring disclosure on submission forms but no ban.
  • Enforcement rests on self-disclosure and branch judgment, leaving Brutalist-style hybrids uncaught.

A consensus, ratified late

AMPAS is the last big gatekeeper to arrive at the party. SAG-AFTRA set the rhetorical frame last autumn when it answered the Tilly Norwood unveiling with the line “creativity is, and should remain, human-centered” 1 — language the Academy has now effectively adopted. The WGA’s 2023 strike contract already excluded LLM output from “literary material,” and Cannes went further in early 2026, barring generative AI from Official Competition on the grounds that a film must be “a personal vision” rather than “an assembly of data” 2.

Against that backdrop, the Oscars rule is narrower than the headline suggests. It targets two categories — acting and writing — where a wholly synthetic submission would have been politically untenable anyway. Everywhere else, the Academy’s older “neither help nor harm” posture survives.

The Respeecher problem

The hard cases are not avatars like Norwood. They are the hybrid productions that already won.

| Film | AI use | Outcome under old rule | Outcome under new rule |
| --- | --- | --- | --- |
| The Brutalist | Respeecher refined Adrien Brody’s Hungarian vowels 3 | Eligible, won | Likely still eligible |
| Emilia Pérez | Respeecher extended Karla Sofía Gascón’s vocal range 3 | Eligible, won | Ambiguous — performance “demonstrably” human? |
| As Deep as the Grave | AI-rendered posthumous Val Kilmer, estate-approved 4 | Eligible | Acting category likely barred |

The new standard — that a performance be “demonstrably performed by humans” — does not obviously disqualify Brody’s Oscar-winning turn, because the underlying performance was his. It does likely catch Coerte Voorhees’s posthumous Kilmer work, which the director has defended as a legitimate artistic choice “regardless of its Oscar eligibility” 4. Where the line falls between vowel-correction and full resurrection is now a branch-by-branch judgment call, with the Academy reserving the right to demand documentation rather than running detection.

Fault lines elsewhere

The festival circuit is not converging on Cannes. The Berlinale has explicitly rejected a ban, instead requiring filmmakers to disclose AI use on submission forms and using the data to monitor budgeting, scheduling, and post-production tooling 5. Independent AI-native work is migrating to parallel circuits like the Runway and World AI film festivals.

And the creator at the center of this whole episode is unmoved. Eline Van der Velden, who has received death threats over Tilly Norwood 6, frames the project as a “deliberate provocation” and a “new paintbrush,” arguing synthetic characters should be judged “as part of their own genre” rather than against human actors. Her position will stay marginal in Hollywood. It is not marginal globally.

What’s actually at stake

The symbolic line — Oscars don’t go to robots — is now drawn. The operational line is fuzzier: voice cloning of consenting living actors is fine, posthumous reanimation probably isn’t, and everything in between depends on what producers choose to disclose. The interesting fights of the next cycle won’t be about Tilly Norwood. They’ll be about the next Brutalist.


Grok, DeepSeek, Google all pivot to efficiency over IQ

Source: latent-space · published 2026-05-02

TL;DR

  • Grok 4.3 drops output pricing 60% and runs full eval suites for ~$395 vs GPT-5.5’s ~$4,000.
  • DeepSeek V4 Pro’s CSA/HCA attention cuts KV cache 90% and inference FLOPs to 27% at 1M context.
  • Independent testers flag “narcolepsy” in Grok’s long-horizon agent runs and reproducibility doubts on DeepSeek’s stack.
  • Google’s TPU 8t still trails Nvidia Vera Rubin ~1:3 per socket despite the 180% cost-perf marketing line.

The headline isn’t the conference, it’s the price collapse

Latent Space’s roundup is dressed up as a Wave 2 call for speakers for the AI Engineer World’s Fair, but the actual signal across this week’s drops is uniform: the frontier is competing on dollars-per-token, not Intelligence Index points. Every release in the digest — Grok 4.3, DeepSeek V4 Pro, ReaLM-Retrieve, TPU 8t/8i — earns its headline through an efficiency multiplier rather than a capability ceiling. None of them beats Claude Opus 4.7 (57) or GPT-5.5 (60) on raw IQ.

Grok 4.3: Pareto-optimal, narcoleptic

xAI’s 53 on the Intelligence Index is a 4-point bump, but the real news is the cost curve. Artificial Analysis puts Grok 4.3 on the intelligence/price Pareto frontier, with a full benchmark sweep running ~$395 versus ~$4,000 for GPT-5.5 7. Input prices fell 40%, output 60%, and GDPval-AA Elo jumped 321 points to 1500.

The dissent comes from agent-land. Andon Labs’ independent long-horizon eval surfaced what they called “narcolepsy” — the model sitting idle mid-trajectory instead of taking the next action — and flagged unusual verbosity that erodes the per-token discount when measured per completed task 8. Non-hallucination accuracy also dropped 8 points. The price story is real; the “Opus killer” framing is not.

DeepSeek V4 Pro: 90% less KV cache, hard to copy

The Latent Space numbers undersell DeepSeek’s efficiency play. Independent writeups put the alternating CSA/HCA attention design at a 90% KV-cache reduction and 27% of V3.2’s inference FLOPs at 1M-token context 9 — not the headline you’d expect from a 1.6T-parameter MoE.

The catch is reproducibility. Reviewers note that the FP4 Lightning Indexer plus interleaved HCA layers carry enough hyperparameter complexity that other labs may struggle to replicate the recipe without DeepSeek’s internal tooling, even with MIT-licensed weights 9. The companion “Thinking with Visual Primitives” work — embedding coordinate tokens directly into reasoning traces — is murkier still: DeepSeek briefly released, then deleted, the repo and tech report, and practitioners are arguing about “trigger-word dependency” and whether the maze-solving demos generalize 10.
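A back-of-envelope calculation shows why a 90% KV-cache cut is the headline at 1M-token context. The model dimensions below are made-up illustration values, not DeepSeek V4 Pro's actual architecture; only the scaling behavior is the point.

```python
# Back-of-envelope KV-cache sizing at 1M-token context.
# Layer count, head count, and head dimension are hypothetical illustration
# values, not DeepSeek V4 Pro's real architecture.

def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes to hold keys + values for one sequence (fp16 by default)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return context_len * per_token

baseline = kv_cache_bytes(1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
reduced = baseline * 0.10  # a 90% reduction, per the reported CSA/HCA numbers

print(f"baseline: {baseline / 2**30:.1f} GiB, after 90% cut: {reduced / 2**30:.1f} GiB")
# -> baseline: 228.9 GiB, after 90% cut: 22.9 GiB
```

At these toy dimensions the cache alone would spill far past a single accelerator's HBM before the cut, which is why the KV number, not the parameter count, decides whether 1M-token serving is economical.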

Retrieval and silicon tell the same story

ReaLM-Retrieve is being sold on a 10.1% F1 lift over standard RAG, but the more interesting number is 47% fewer retrieval calls — 1.8 per MuSiQue question — establishing a new accuracy-per-call Pareto frontier 11. The caveat is operational: combining long reasoning traces with dense retrieved context strains consumer GPUs like the RTX 4090, so the technique deploys cleanly on datacenter hardware and awkwardly elsewhere 11.
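The accuracy-per-call framing is easy to picture as a gating loop: retrieve only when the model is unsure, instead of at every reasoning step. The toy sketch below is not ReaLM-Retrieve's actual mechanism; the confidence scores and threshold are invented for illustration.

```python
# Toy contrast between retrieve-every-step RAG and adaptive retrieval that
# calls the retriever only when self-reported confidence is low.
# Confidence values and the 0.5 threshold are illustrative assumptions.

def count_retrievals(step_confidences, threshold=None):
    """threshold=None -> retrieve at every reasoning step (standard RAG);
    otherwise retrieve only when confidence falls below the threshold."""
    if threshold is None:
        return len(step_confidences)
    return sum(1 for c in step_confidences if c < threshold)

# Four reasoning steps for a multi-hop question; the model is confident on two.
confidences = [0.9, 0.3, 0.85, 0.4]
standard = count_retrievals(confidences)        # retrieves 4 times
adaptive = count_retrievals(confidences, 0.5)   # retrieves 2 times
print(f"standard RAG: {standard} calls, adaptive: {adaptive} calls")
```

Halving the calls only pays off if accuracy holds, which is exactly what the 1.8-calls-per-MuSiQue-question figure is claiming.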

Google’s TPU split follows the same pattern. The “180% training cost-performance” line is Google’s own; Tom’s Hardware pegs TPU 8t at roughly 1:3 against Nvidia’s Vera Rubin on per-socket compute, and notes Google chose HBM3e over the HBM4 in competing Nvidia/AMD parts — a deliberate yield-and-cost trade, not a performance win 12. The strategic move is the bifurcation into training (8t) and inference (8i) SKUs, plus a MediaTek second-source, both aimed at the volume needed to honor the Anthropic million-chip commitment.

What to take away

Independent voices line up the same way across all four announcements: extend the efficiency claim, puncture the reliability or reproducibility claim. If you’re picking models or chips this quarter, the right axis is cost-per-completed-task, not Intelligence Index — and the right discount rate on vendor benchmarks is steep.


Inference scaling won; o1-class runs now cost $2,800

Source: latent-space · published 2026-04-30

TL;DR

  • Nvidia says AI compute demand grew 1,000,000× in two years, driven by agents looping through reasoning instead of answering once 13.
  • The “thinking tax” is real: benchmarking o1 cost ~$2,800 versus $109 for GPT-4o on the same workload 14.
  • Apple researchers documented a “total collapse” of thinking models as problem complexity scales, calling the gains pattern-matching rather than reasoning 15.
  • DeepSeek-R1 already matches o1-preview on AIME (79.8% vs 79.2%) and Codeforces (96.3 vs 96.6 percentile) 16.

The thesis aged into consensus

Latent Space’s “Inference Inflection” essay argues that test-time compute has become the new scaling axis — models that pause, search, and verify are eating the gains pre-training used to deliver. By April 2026, that’s not a take; it’s the substrate. Jensen Huang told GTC 2026 that compute demand grew a million-fold in two years specifically because agents now loop through reasoning rather than emitting one answer 13. Hardware analyses peg inference at 80–90% of total AI system lifecycle cost, with the “memory wall” — moving KV-cache data between HBM and cores — now the dominant bottleneck 17. The o1 and Llama 3.1 405B references in the original essay read as period pieces precisely because the field absorbed them.

So the interesting question isn’t whether the inflection happened. It’s what the inflection broke.

Where reasoning still breaks

The essay’s headline number — AIME accuracy jumping from ~12% to >80% with chain-of-thought — is real but narrow. Apple’s mid-2025 study took thinking models past their comfort zone and watched them disintegrate:

thinking models excel at simple logic [but] experience a ‘total collapse’ as problem complexity increases, suggesting they rely on sophisticated pattern matching rather than generalizable formal reasoning 15

That finding maps onto the “overthinking” literature practitioners have been quietly cataloging: longer reasoning chains can rationalize wrong answers as easily as correct them. The gains hold where verifiers are cheap and ground truth is mechanical (math, competitive programming). They degrade fast on open-ended tasks where the model has to grade its own homework.

The thinking tax

The essay also glosses over the bill. Silicon Sands Studio’s “Thinking Tax” benchmark put o1 at ~$2,800 versus $109 for GPT-4o on the same workload — a ~25× premium — because a “concise” final answer is routinely backed by a 15,000-token hidden reasoning trace billed at output rates 14. That’s the operational consequence of the inflection: providers are quietly shifting toward compute-based pricing because token-based pricing can’t accommodate models that think for a minute before speaking. For anyone running reasoning models in production, the unit economics question — is this query worth the chain-of-thought it triggered? — is now the design constraint.
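The per-query arithmetic is worth making explicit: hidden reasoning tokens are billed at output rates, so a short visible answer can carry a bill dominated by its trace. The token counts and per-million prices below are illustrative placeholders, not any provider's exact rates.

```python
# Per-query "thinking tax": hidden reasoning tokens are billed as output.
# Token counts and per-million-token prices are illustrative placeholders.

def query_cost(prompt_toks, visible_toks, hidden_toks, in_per_m, out_per_m):
    """Dollar cost of one query; reasoning tokens bill at the output rate."""
    billed_output = visible_toks + hidden_toks
    return (prompt_toks * in_per_m + billed_output * out_per_m) / 1_000_000

# A "concise" 200-token answer backed by a 15,000-token hidden trace,
# versus the same visible answer from a non-reasoning model with no trace:
reasoning = query_cost(1_000, 200, 15_000, in_per_m=15, out_per_m=60)
plain = query_cost(1_000, 200, 0, in_per_m=2.5, out_per_m=10)
print(f"reasoning: ${reasoning:.3f}  plain: ${plain:.4f}")
```

On this toy mix the hidden trace accounts for virtually the entire bill, which is why cost-per-completed-task, not list price per token, is the number to watch in production.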

Open weights are erasing the moat faster than expected

The essay frames inference scaling as a frontier-lab capability. Six months later, that framing looks fragile. DeepSeek-R1 ships open weights that match o1-preview on AIME (79.8% vs 79.2%) and sit one-third of a percentile behind on Codeforces 16. Hugging Face’s Open-R1 released Mixture-of-Thoughts, a 350k-trace dataset of verified reasoning chains explicitly designed to let the community train models that emulate the “hidden” CoT process 18. The proprietary advantage in test-time reasoning is being distilled into open weights on a quarterly cycle.

Net

The “Inference Inflection” was the right call in 2024 and is the boring premise of 2026. The live debates have moved on: whether reasoning is reasoning or pattern-matching 15, whether per-query economics survive contact with production 14, and how long any lab can charge a premium for capability that ships open-weights two quarters later 18 16.

Round-ups

[AINews] not much happened today

Source: latent-space

Latent Space’s daily AI News digest marks April 29 as a quiet day, with no major model releases, research drops, or industry announcements meriting a headline summary — a rare lull in the otherwise relentless 2026 AI news cycle.

Footnotes

  1. Screen Daily (SAG-AFTRA statement) · https://www.screendaily.com/news/sag-aftra-speaks-out-over-ai-actress-tilly-norwood-creativity-should-remain-human-centred/5209482.article

    creativity is, and should remain, human-centered

  2. The Guardian — Cannes AI policy 2026 · https://www.theguardian.com/technology/2026/apr/26/cannes-ai-film-festival-raises-eyebrows-questions-future

    Festival President Iris Knobloch emphasized that a film must be a ‘personal vision’ rather than an ‘assembly of data,’ barring generative AI from Official Competition

  3. Consequence — ‘The Brutalist’ & ‘Emilia Pérez’ AI controversy · https://consequence.net/2025/01/ai-oscars-2025-brutalist-emilia-perez/

    Respeecher was used to refine Adrien Brody and Felicity Jones’s Hungarian pronunciation… and to extend Karla Sofía Gascón’s vocal range — uses the Academy ruled ‘neither help nor harm’ a film’s chances

  4. World of Reel — Val Kilmer / ‘As Deep as the Grave’ · https://www.worldofreel.com/blog/2026/5/1/oscars-ai-banned-from-acting-and-writing-categories

    Coerte Voorhees, director of As Deep as the Grave — which features a posthumous AI performance of Val Kilmer — defended his work as an artistic choice supported by the actor’s estate, regardless of its Oscar eligibility

  5. Screen Daily — Berlinale AI disclosure · https://www.screendaily.com/news/berlinale-asking-have-you-used-ai-of-submitted-films/5211259.article

    the Berlinale now requires filmmakers to disclose AI usage on submission forms, but has not implemented a ban — using the data to monitor how tools are used for budgeting, scheduling and post-production

  6. The Guardian — Tilly Norwood backlash · https://www.theguardian.com/technology/2026/mar/26/tilly-norwood-ai-actor-creator-backlash-death-threats

    Van der Velden reported receiving death threats… defending the project as a ‘new paintbrush’ and a ‘deliberate provocation’ to force the industry to confront AI

  7. Artificial Analysis · https://artificialanalysis.ai/articles/xai-launches-grok-4-3-with-improved-agentic-performance-and-lower-pricing

    Grok 4.3 now sits on the ‘Pareto frontier,’ offering the best intelligence-to-cost ratio… running a full benchmark suite on Grok 4.3 costs roughly $395, compared to nearly $4,000 for GPT-5.5

  8. Towards AI (Medium) — independent agent eval · https://medium.com/towards-artificial-intelligence/i-tested-grok-4-3-on-18-long-horizon-agent-tasks-the-10-cheaper-xai-model-embarrassed-opus-4-7-6dd9de45ecbc

    Andon Labs reported ‘narcolepsy’ issues in autonomous environments, where the model may sit idle instead of executing multi-step actions… noted for being extremely verbose, generating significantly more tokens than the average model during evaluation

  9. Medium / Data Science in Your Pocket — DeepSeek V4 deep dive · https://medium.com/data-science-in-your-pocket/deepseek-v4-released-253cb2bb81f6

    By alternating CSA and HCA layers, DeepSeek-V4-Pro achieves a 90% reduction in KV cache memory and requires only 27% of the inference FLOPs compared to V3.2 at 1M-token scale; some technical observers suggest the architecture’s hyperparameter complexity may be difficult for other labs to replicate without DeepSeek’s proprietary tooling

  10. r/LocalLLaMA discussion of DeepSeek ‘Thinking with Visual Primitives’ · https://www.reddit.com/r/LocalLLaMA/comments/1szwi1d/deepseek_released_thinkingwithvisualprimitives/

    DeepSeek briefly released and then deleted the official repository and technical report shortly after publication… debate among experts regarding ‘trigger word dependency’ and the challenges of generalizing these ‘cyber finger’ techniques across diverse, fine-grained scenarios

  11. themoonlight.io — ReaLM-Retrieve paper review · https://www.themoonlight.io/en/review/when-to-retrieve-during-reasoning-adaptive-retrieval-for-large-reasoning-models

    ReaLM-Retrieve requires 47% fewer retrieval calls — averaging just 1.8 calls per question on MuSiQue — but some independent researchers have raised questions regarding the practicality of certain integration methods on consumer-grade hardware (like the RTX 4090), where memory limits may hinder use of large reasoning traces alongside dense retrieved context

  12. Tom’s Hardware · https://www.tomshardware.com/tech-industry/semiconductors/google-splits-its-tpu-into-two-chips-for-the-first-time-with-training-and-inference-variants

    TPU 8t lags behind Nvidia’s Vera Rubin in raw per-socket compute (roughly 1:3)… uses HBM3e instead of the next-generation HBM4 found in Nvidia and AMD competitors, likely to prioritize yield and cost-efficiency

  13. Forbes — Nvidia’s Trillion-Dollar Prediction (Apr 2026) · https://www.forbes.com/sites/sanjitsinghdang/2026/04/20/nvidias-trillion-dollar-prediction-marks-ais-inflection-point/

    computing demand had increased by one million times over the preceding two years, driven by the shift from discrete queries to persistent agent loops

  14. Silicon Sands Studio — ‘The Thinking Tax’ · https://siliconsandstudio.substack.com/p/the-thinking-tax-ais-real-cost-is

    benchmarking o1 cost nearly $2,800 compared to just $109 for GPT-4o, reflecting the sheer volume of hidden compute required for complex tasks … a seemingly concise answer can be backed by a 15,000-token reasoning chain

  15. The Decoder (on Apple research) · https://the-decoder.com/apple-study-finds-a-fundamental-scaling-limitation-in-reasoning-models-thinking-abilities/

    thinking models excel at simple logic [but] experience a ‘total collapse’ as problem complexity increases, suggesting they rely on sophisticated pattern matching rather than generalizable formal reasoning

  16. Galileo AI — DeepSeek-R1 vs OpenAI o1 · https://galileo.ai/blog/deepseek-r1-vs-openai-o1-comparison

    DeepSeek-R1 matches or slightly exceeds o1-preview on mathematics (AIME 2024: 79.8% vs 79.2%) and coding (Codeforces: 96.3rd vs 96.6th percentile)

  17. Medium — ‘How LLMs Break the Memory Wall in 2026’ · https://medium.com/@athuavjoshi11/the-speculative-escape-how-llms-break-the-memory-wall-in-2026-764133117c1e

    the ‘Memory Wall’ — the bottleneck where moving data from High Bandwidth Memory (HBM) to processor cores consumes more time and energy than the actual calculations … inference now accounts for 80–90% of total system lifecycle costs

  18. Hugging Face Open-R1 project page · https://huggingface.co/open-r1

    Open-R1 … released massive datasets, such as Mixture-of-Thoughts (350k verified traces), to enable community-driven training of models that emulate the ‘hidden’ chain-of-thought process
Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare