Wei (Jack) Sun

Oscars price AI out of acting, Grok cuts tokens 60%, o1 runs hit $2,800

Three corners of AI got repriced today: the Academy's eligibility rules, Grok and DeepSeek's token cuts, and Nvidia's inference compute math.

TL;DR

  • The Academy barred AI from acting and writing categories, leaving voice-cloning tools like Respeecher eligible.
  • Grok 4.3 cut output pricing 60%, running full evals at ~$395 versus GPT-5.5’s ~$4,000.
  • DeepSeek V4 Pro’s new attention cuts KV cache 90% and inference FLOPs to 27% at 1M context.
  • Nvidia pegs AI compute demand growth at 1,000,000× over 2 years, driven by reasoning loops.
  • Apple researchers document a total collapse of thinking models as problem complexity scales.

Three different markets put a new price on AI today, and each price came with a tradeoff the headline number didn’t carry. The Academy decided AI performances and screenplays don’t qualify for acting and writing Oscars, but voice-cloning tools like Respeecher still do — a line that The Brutalist and Emilia Pérez already crossed without consequence. Grok 4.3 and DeepSeek V4 Pro both pivoted to efficiency over raw IQ, with output pricing and KV cache slashed by 60% and 90% respectively, while independent testers flagged narcolepsy on long-horizon runs and reproducibility doubts on the stack.

Underneath all of it, Nvidia put a number on the demand curve — AI compute up 1,000,000× in 2 years — and Apple put a ceiling on what that compute is buying, documenting a total collapse of thinking models as problems get harder. The day’s news isn’t capability; it’s the bill, and who’s paying it.

Oscars bar AI actors and scripts, but Respeecher still qualifies

Source: techcrunch-ai · published 2026-05-02

TL;DR

  • The Academy barred AI performances and screenplays from acting and writing categories only.
  • Voice-cloning tools like Respeecher — used in The Brutalist and Emilia Pérez — remain “neither help nor harm.”
  • Cannes already barred generative AI from Official Competition on “personal vision” grounds.
  • Berlin went the other way, requiring disclosure on submission forms but no ban.
  • Enforcement rests on self-disclosure and branch judgment, leaving Brutalist-style hybrids uncaught.

A consensus, ratified late

AMPAS is the last big gatekeeper to arrive at the party. SAG-AFTRA set the rhetorical frame last autumn when it answered the Tilly Norwood unveiling with the line “creativity is, and should remain, human-centered” 1 — language the Academy has now effectively adopted. The WGA’s 2023 strike contract already excluded LLM output from “literary material,” and Cannes went further in early 2026, barring generative AI from Official Competition on the grounds that a film must be “a personal vision” rather than “an assembly of data” 2.

Against that backdrop, the Oscars rule is narrower than the headline suggests. It targets two categories — acting and writing — where a wholly synthetic submission would have been politically untenable anyway. Everywhere else, the Academy’s older “neither help nor harm” posture survives.

The Respeecher problem

The hard cases are not avatars like Norwood. They are the hybrid productions that already won.

| Film | AI use | Outcome under old rule | Outcome under new rule |
| --- | --- | --- | --- |
| The Brutalist | Respeecher refined Adrien Brody’s Hungarian vowels 3 | Eligible, won | Likely still eligible |
| Emilia Pérez | Respeecher extended Karla Sofía Gascón’s vocal range 3 | Eligible, won | Ambiguous — performance “demonstrably” human? |
| As Deep as the Grave | AI-rendered posthumous Val Kilmer, estate-approved 4 | Eligible | Acting category likely barred |

The new standard — that a performance be “demonstrably performed by humans” — does not obviously disqualify Brody’s Oscar-winning turn, because the underlying performance was his. It does likely catch Coerte Voorhees’s posthumous Kilmer work, which the director has defended as a legitimate artistic choice “regardless of its Oscar eligibility” 4. Where the line falls between vowel-correction and full resurrection is now a branch-by-branch judgment call, with the Academy reserving the right to demand documentation rather than running detection.

Fault lines elsewhere

The festival circuit is not converging on Cannes. The Berlinale has explicitly rejected a ban, instead requiring filmmakers to disclose AI use on submission forms and using the data to monitor budgeting, scheduling, and post-production tooling 5. Independent AI-native work is migrating to parallel circuits like the Runway and World AI film festivals.

And the creator at the center of this whole episode is unmoved. Eline Van der Velden, who has received death threats over Tilly Norwood 6, frames the project as a “deliberate provocation” and a “new paintbrush,” arguing synthetic characters should be judged “as part of their own genre” rather than against human actors. Her position will stay marginal in Hollywood. It is not marginal globally.

What’s actually at stake

The symbolic line — Oscars don’t go to robots — is now drawn. The operational line is fuzzier: voice cloning of consenting living actors is fine, posthumous reanimation probably isn’t, and everything in between depends on what producers choose to disclose. The interesting fights of the next cycle won’t be about Tilly Norwood. They’ll be about the next Brutalist.


Grok, DeepSeek, Google all pivot to efficiency over IQ

Source: latent-space · published 2026-05-02

TL;DR

  • Grok 4.3 drops output pricing 60% and runs full eval suites for ~$395 vs GPT-5.5’s ~$4,000.
  • DeepSeek V4 Pro’s CSA/HCA attention cuts KV cache 90% and inference FLOPs to 27% at 1M context.
  • Independent testers flag “narcolepsy” in Grok’s long-horizon agent runs and reproducibility doubts on DeepSeek’s stack.
  • Google’s TPU 8t still trails Nvidia Vera Rubin ~1:3 per socket despite the 180% cost-perf marketing line.

The headline isn’t the conference, it’s the price collapse

Latent Space’s roundup is dressed up as a Wave 2 call for speakers for the AI Engineer World’s Fair, but the actual signal across this week’s drops is uniform: the frontier is competing on dollars-per-token, not Intelligence Index points. Every release in the digest — Grok 4.3, DeepSeek V4 Pro, ReaLM-Retrieve, TPU 8t/8i — earns its headline through an efficiency multiplier rather than a capability ceiling. None of them beats Claude Opus 4.7 (57) or GPT-5.5 (60) on raw IQ.

Grok 4.3: Pareto-optimal, narcoleptic

xAI’s 53 on the Intelligence Index is a 4-point bump, but the real news is the cost curve. Artificial Analysis puts Grok 4.3 on the intelligence/price Pareto frontier, with a full benchmark sweep running ~$395 versus ~$4,000 for GPT-5.5 7. Input prices fell 40%, output 60%, and GDPval-AA Elo jumped 321 points to 1500.

The dissent comes from agent-land. Andon Labs’ independent long-horizon eval surfaced what they called “narcolepsy” — the model sitting idle mid-trajectory instead of taking the next action — and flagged unusual verbosity that erodes the per-token discount when measured per completed task 8. Non-hallucination accuracy also dropped 8 points. The price story is real; the “Opus killer” framing is not.

DeepSeek V4 Pro: 90% less KV cache, hard to copy

The Latent Space numbers undersell DeepSeek’s efficiency play. Independent writeups put the alternating CSA/HCA attention design at a 90% KV-cache reduction and 27% of V3.2’s inference FLOPs at 1M-token context 9 — not the headline you’d expect from a 1.6T-parameter MoE.

The catch is reproducibility. Reviewers note that the FP4 Lightning Indexer plus interleaved HCA layers carry enough hyperparameter complexity that other labs may struggle to replicate the recipe without DeepSeek’s internal tooling, even with MIT-licensed weights 9. The companion “Thinking with Visual Primitives” work — embedding coordinate tokens directly into reasoning traces — is murkier still: DeepSeek briefly released, then deleted, the repo and tech report, and practitioners are arguing about “trigger-word dependency” and whether the maze-solving demos generalize 10.
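A back-of-envelope calculation shows why a 90% KV-cache cut is the headline at 1M-token context. The model dimensions below are made-up illustration values, not DeepSeek V4 Pro's actual architecture; only the scaling behavior is the point.

```python
# Back-of-envelope KV-cache sizing at 1M-token context.
# Layer count, head count, and head dimension are hypothetical illustration
# values, not DeepSeek V4 Pro's real architecture.

def kv_cache_bytes(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes to hold keys + values for one sequence (fp16 by default)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return context_len * per_token

baseline = kv_cache_bytes(1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
reduced = baseline * 0.10  # a 90% reduction, per the reported CSA/HCA numbers

print(f"baseline: {baseline / 2**30:.1f} GiB, after 90% cut: {reduced / 2**30:.1f} GiB")
# -> baseline: 228.9 GiB, after 90% cut: 22.9 GiB
```

At these toy dimensions the cache alone would spill far past a single accelerator's HBM before the cut, which is why the KV number, not the parameter count, decides whether 1M-token serving is economical.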

Retrieval and silicon tell the same story

ReaLM-Retrieve is being sold on a 10.1% F1 lift over standard RAG, but the more interesting number is 47% fewer retrieval calls — 1.8 per MuSiQue question — establishing a new accuracy-per-call Pareto frontier 11. The caveat is operational: combining long reasoning traces with dense retrieved context strains consumer GPUs like the RTX 4090, so the technique deploys cleanly on datacenter hardware and awkwardly elsewhere 11.
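The accuracy-per-call framing is easy to picture as a gating loop: retrieve only when the model is unsure, instead of at every reasoning step. The toy sketch below is not ReaLM-Retrieve's actual mechanism; the confidence scores and threshold are invented for illustration.

```python
# Toy contrast between retrieve-every-step RAG and adaptive retrieval that
# calls the retriever only when self-reported confidence is low.
# Confidence values and the 0.5 threshold are illustrative assumptions.

def count_retrievals(step_confidences, threshold=None):
    """threshold=None -> retrieve at every reasoning step (standard RAG);
    otherwise retrieve only when confidence falls below the threshold."""
    if threshold is None:
        return len(step_confidences)
    return sum(1 for c in step_confidences if c < threshold)

# Four reasoning steps for a multi-hop question; the model is confident on two.
confidences = [0.9, 0.3, 0.85, 0.4]
standard = count_retrievals(confidences)        # retrieves 4 times
adaptive = count_retrievals(confidences, 0.5)   # retrieves 2 times
print(f"standard RAG: {standard} calls, adaptive: {adaptive} calls")
```

Halving the calls only pays off if accuracy holds, which is exactly what the 1.8-calls-per-MuSiQue-question figure is claiming.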

Google’s TPU split follows the same pattern. The “180% training cost-performance” line is Google’s own; Tom’s Hardware pegs TPU 8t at roughly 1:3 against Nvidia’s Vera Rubin on per-socket compute, and notes Google chose HBM3e over the HBM4 in competing Nvidia/AMD parts — a deliberate yield-and-cost trade, not a performance win 12. The strategic move is the bifurcation into training (8t) and inference (8i) SKUs, plus a MediaTek second-source, both aimed at the volume needed to honor the Anthropic million-chip commitment.

What to take away

Independent voices line up the same way across all four announcements: extend the efficiency claim, puncture the reliability or reproducibility claim. If you’re picking models or chips this quarter, the right axis is cost-per-completed-task, not Intelligence Index — and the right discount rate on vendor benchmarks is steep.


Inference scaling won; o1-class runs now cost $2,800

Source: latent-space · published 2026-04-30

TL;DR

  • Nvidia says AI compute demand grew 1,000,000× in two years, driven by agents looping through reasoning instead of answering once 13.
  • The “thinking tax” is real: benchmarking o1 cost ~$2,800 versus $109 for GPT-4o on the same workload 14.
  • Apple researchers documented a “total collapse” of thinking models as problem complexity scales, calling the gains pattern-matching rather than reasoning 15.
  • DeepSeek-R1 already matches o1-preview on AIME (79.8% vs 79.2%) and Codeforces (96.3 vs 96.6 percentile) 16.

The thesis aged into consensus

Latent Space’s “Inference Inflection” essay argues that test-time compute has become the new scaling axis — models that pause, search, and verify are eating the gains pre-training used to deliver. By April 2026, that’s not a take; it’s the substrate. Jensen Huang told GTC 2026 that compute demand grew a million-fold in two years specifically because agents now loop through reasoning rather than emitting one answer 13. Hardware analyses peg inference at 80–90% of total AI system lifecycle cost, with the “memory wall” — moving KV-cache data between HBM and cores — now the dominant bottleneck 17. The o1 and Llama 3.1 405B references in the original essay read as period pieces precisely because the field absorbed them.

So the interesting question isn’t whether the inflection happened. It’s what the inflection broke.

Where reasoning still breaks

The essay’s headline number — AIME accuracy jumping from ~12% to >80% with chain-of-thought — is real but narrow. Apple’s mid-2025 study took thinking models past their comfort zone and watched them disintegrate:

thinking models excel at simple logic [but] experience a ‘total collapse’ as problem complexity increases, suggesting they rely on sophisticated pattern matching rather than generalizable formal reasoning 15

That finding maps onto the “overthinking” literature practitioners have been quietly cataloging: longer reasoning chains can rationalize wrong answers as easily as correct them. The gains hold where verifiers are cheap and ground truth is mechanical (math, competitive programming). They degrade fast on open-ended tasks where the model has to grade its own homework.

The thinking tax

The essay also glosses over the bill. Silicon Sands Studio’s “Thinking Tax” benchmark put o1 at ~$2,800 versus $109 for GPT-4o on the same workload — a ~25× premium — because a “concise” final answer is routinely backed by a 15,000-token hidden reasoning trace billed at output rates 14. That’s the operational consequence of the inflection: providers are quietly shifting toward compute-based pricing because token-based pricing can’t accommodate models that think for a minute before speaking. For anyone running reasoning models in production, the unit economics question — is this query worth the chain-of-thought it triggered? — is now the design constraint.
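The per-query arithmetic is worth making explicit: hidden reasoning tokens are billed at output rates, so a short visible answer can carry a bill dominated by its trace. The token counts and per-million prices below are illustrative placeholders, not any provider's exact rates.

```python
# Per-query "thinking tax": hidden reasoning tokens are billed as output.
# Token counts and per-million-token prices are illustrative placeholders.

def query_cost(prompt_toks, visible_toks, hidden_toks, in_per_m, out_per_m):
    """Dollar cost of one query; reasoning tokens bill at the output rate."""
    billed_output = visible_toks + hidden_toks
    return (prompt_toks * in_per_m + billed_output * out_per_m) / 1_000_000

# A "concise" 200-token answer backed by a 15,000-token hidden trace,
# versus the same visible answer from a non-reasoning model with no trace:
reasoning = query_cost(1_000, 200, 15_000, in_per_m=15, out_per_m=60)
plain = query_cost(1_000, 200, 0, in_per_m=2.5, out_per_m=10)
print(f"reasoning: ${reasoning:.3f}  plain: ${plain:.4f}")
```

On this toy mix the hidden trace accounts for virtually the entire bill, which is why cost-per-completed-task, not list price per token, is the number to watch in production.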

Open weights are erasing the moat faster than expected

The essay frames inference scaling as a frontier-lab capability. Six months later, that framing looks fragile. DeepSeek-R1 ships open weights that match o1-preview on AIME (79.8% vs 79.2%) and sit one-third of a percentile behind on Codeforces 16. Hugging Face’s Open-R1 released Mixture-of-Thoughts, a 350k-trace dataset of verified reasoning chains explicitly designed to let the community train models that emulate the “hidden” CoT process 18. The proprietary advantage in test-time reasoning is being distilled into open weights on a quarterly cycle.

Net

The “Inference Inflection” was the right call in 2024 and is the boring premise of 2026. The live debates have moved on: whether reasoning is reasoning or pattern-matching 15, whether per-query economics survive contact with production 14, and how long any lab can charge a premium for capability that ships open-weights two quarters later 18 16.

Round-ups

[AINews] not much happened today

Source: latent-space

Latent Space’s daily AI News digest marks April 29 as a quiet day, with no major model releases, research drops, or industry announcements meriting a headline summary — a rare lull in the otherwise relentless 2026 AI news cycle.

Footnotes

  1. Screen Daily (SAG-AFTRA statement) · https://www.screendaily.com/news/sag-aftra-speaks-out-over-ai-actress-tilly-norwood-creativity-should-remain-human-centred/5209482.article

    creativity is, and should remain, human-centered

  2. The Guardian — Cannes AI policy 2026 · https://www.theguardian.com/technology/2026/apr/26/cannes-ai-film-festival-raises-eyebrows-questions-future

    Festival President Iris Knobloch emphasized that a film must be a ‘personal vision’ rather than an ‘assembly of data,’ barring generative AI from Official Competition

  3. Consequence — ‘The Brutalist’ & ‘Emilia Pérez’ AI controversy · https://consequence.net/2025/01/ai-oscars-2025-brutalist-emilia-perez/

    Respeecher was used to refine Adrien Brody and Felicity Jones’s Hungarian pronunciation… and to extend Karla Sofía Gascón’s vocal range — uses the Academy ruled ‘neither help nor harm’ a film’s chances

  4. World of Reel — Val Kilmer / ‘As Deep as the Grave’ · https://www.worldofreel.com/blog/2026/5/1/oscars-ai-banned-from-acting-and-writing-categories

    Coerte Voorhees, director of As Deep as the Grave — which features a posthumous AI performance of Val Kilmer — defended his work as an artistic choice supported by the actor’s estate, regardless of its Oscar eligibility

  5. Screen Daily — Berlinale AI disclosure · https://www.screendaily.com/news/berlinale-asking-have-you-used-ai-of-submitted-films/5211259.article

    the Berlinale now requires filmmakers to disclose AI usage on submission forms, but has not implemented a ban — using the data to monitor how tools are used for budgeting, scheduling and post-production

  6. The Guardian — Tilly Norwood backlash · https://www.theguardian.com/technology/2026/mar/26/tilly-norwood-ai-actor-creator-backlash-death-threats

    Van der Velden reported receiving death threats… defending the project as a ‘new paintbrush’ and a ‘deliberate provocation’ to force the industry to confront AI

  7. Artificial Analysis · https://artificialanalysis.ai/articles/xai-launches-grok-4-3-with-improved-agentic-performance-and-lower-pricing

    Grok 4.3 now sits on the ‘Pareto frontier,’ offering the best intelligence-to-cost ratio… running a full benchmark suite on Grok 4.3 costs roughly $395, compared to nearly $4,000 for GPT-5.5

  8. Towards AI (Medium) — independent agent eval · https://medium.com/towards-artificial-intelligence/i-tested-grok-4-3-on-18-long-horizon-agent-tasks-the-10-cheaper-xai-model-embarrassed-opus-4-7-6dd9de45ecbc

    Andon Labs reported ‘narcolepsy’ issues in autonomous environments, where the model may sit idle instead of executing multi-step actions… noted for being extremely verbose, generating significantly more tokens than the average model during evaluation

  9. Medium / Data Science in Your Pocket — DeepSeek V4 deep dive · https://medium.com/data-science-in-your-pocket/deepseek-v4-released-253cb2bb81f6

    By alternating CSA and HCA layers, DeepSeek-V4-Pro achieves a 90% reduction in KV cache memory and requires only 27% of the inference FLOPs compared to V3.2 at 1M-token scale; some technical observers suggest the architecture’s hyperparameter complexity may be difficult for other labs to replicate without DeepSeek’s proprietary tooling

  10. r/LocalLLaMA discussion of DeepSeek ‘Thinking with Visual Primitives’ · https://www.reddit.com/r/LocalLLaMA/comments/1szwi1d/deepseek_released_thinkingwithvisualprimitives/

    DeepSeek briefly released and then deleted the official repository and technical report shortly after publication… debate among experts regarding ‘trigger word dependency’ and the challenges of generalizing these ‘cyber finger’ techniques across diverse, fine-grained scenarios

  11. themoonlight.io — ReaLM-Retrieve paper review · https://www.themoonlight.io/en/review/when-to-retrieve-during-reasoning-adaptive-retrieval-for-large-reasoning-models

    ReaLM-Retrieve requires 47% fewer retrieval calls — averaging just 1.8 calls per question on MuSiQue — but some independent researchers have raised questions regarding the practicality of certain integration methods on consumer-grade hardware (like the RTX 4090), where memory limits may hinder use of large reasoning traces alongside dense retrieved context

  12. Tom’s Hardware · https://www.tomshardware.com/tech-industry/semiconductors/google-splits-its-tpu-into-two-chips-for-the-first-time-with-training-and-inference-variants

    TPU 8t lags behind Nvidia’s Vera Rubin in raw per-socket compute (roughly 1:3)… uses HBM3e instead of the next-generation HBM4 found in Nvidia and AMD competitors, likely to prioritize yield and cost-efficiency

  13. Forbes — Nvidia’s Trillion-Dollar Prediction (Apr 2026) · https://www.forbes.com/sites/sanjitsinghdang/2026/04/20/nvidias-trillion-dollar-prediction-marks-ais-inflection-point/

    computing demand had increased by one million times over the preceding two years, driven by the shift from discrete queries to persistent agent loops

  14. Silicon Sands Studio — ‘The Thinking Tax’ · https://siliconsandstudio.substack.com/p/the-thinking-tax-ais-real-cost-is

    benchmarking o1 cost nearly $2,800 compared to just $109 for GPT-4o, reflecting the sheer volume of hidden compute required for complex tasks … a seemingly concise answer can be backed by a 15,000-token reasoning chain

  15. The Decoder (on Apple research) · https://the-decoder.com/apple-study-finds-a-fundamental-scaling-limitation-in-reasoning-models-thinking-abilities/

    thinking models excel at simple logic [but] experience a ‘total collapse’ as problem complexity increases, suggesting they rely on sophisticated pattern matching rather than generalizable formal reasoning

  16. Galileo AI — DeepSeek-R1 vs OpenAI o1 · https://galileo.ai/blog/deepseek-r1-vs-openai-o1-comparison

    DeepSeek-R1 matches or slightly exceeds o1-preview on mathematics (AIME 2024: 79.8% vs 79.2%) and coding (Codeforces: 96.3rd vs 96.6th percentile)

  17. Medium — ‘How LLMs Break the Memory Wall in 2026’ · https://medium.com/@athuavjoshi11/the-speculative-escape-how-llms-break-the-memory-wall-in-2026-764133117c1e

    the ‘Memory Wall’ — the bottleneck where moving data from High Bandwidth Memory (HBM) to processor cores consumes more time and energy than the actual calculations … inference now accounts for 80–90% of total system lifecycle costs

  18. Hugging Face Open-R1 project page · https://huggingface.co/open-r1

    Open-R1 … released massive datasets, such as Mixture-of-Thoughts (350k verified traces), to enable community-driven training of models that emulate the ‘hidden’ chain-of-thought process
Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare