JS Wei (Jack) Sun

Fable 5 mandates retention, Gemini Live triples price, North Mini gets verbose

Anthropic, Google, and Cohere each ship a headline model today with a new constraint bolted on — retention terms, pricing, or scope.

Fable 5 mandates retention, Gemini Live triples price, North Mini gets verbose

TL;DR

  • Anthropic’s Fable 5 silently reroutes risky prompts to weaker Opus 4.8 instead of refusing.
  • Fable 5 mandates 30-day retention, overriding zero-data-retention contracts on AWS and Google Cloud.
  • Gemini 3.5 Live Translate ships voice-to-voice across 70+ languages, preserving pitch and pacing.
  • Gemini Flash-tier pricing roughly tripled, ending the loss-leader era for production voice workloads.
  • Cohere’s North Mini Code hits 33.4 on the coding index, burning 3× the output tokens of peers.

Three frontier-lab model ships today, and in each one the capability is the easy part. Anthropic’s Fable 5 is fast enough that Stripe ran a 50M-line Ruby migration in a day, but the release also silently reroutes cybersecurity, biology, and distillation prompts to a weaker Opus 4.8, mandates 30-day retention that overrides existing zero-data-retention contracts, and gates the bigger Mythos 5 tier to Five Eyes, NATO, and ENISA while pointedly excluding the US Department of War. Google’s Gemini 3.5 Live Translate ships voice-to-voice in 70+ languages with pitch and pacing preserved — but on a preview API with no SLA, ~15-minute session caps, and Flash-tier pricing roughly tripled. Cohere’s North Mini Code beats dense 120Bs at 30B active params, then talks 3× too much and falls off a cliff outside coding.

The pattern is worth naming: the binding constraint on a frontier model in 2026 isn’t the benchmark anymore. It’s the retention clause, the price sheet, and the scope footnote — and today all three labs moved one.

Anthropic’s Fable 5 reroutes risky queries to weaker Opus 4.8

Source: anthropic-news · published 2026-06-09

TL;DR

  • Fable 5 silently reroutes cybersecurity, biology, and distillation prompts to weaker Opus 4.8 instead of refusing.
  • Mandatory 30-day retention overrides existing zero-data-retention contracts, even on AWS Bedrock and Google Cloud, pausing some enterprise rollouts.
  • Mythos 5 access maps to Five Eyes, NATO, and ENISA — pointedly excluding the US Department of War.
  • Capabilities aren’t disputed: Stripe ran a 50M-line Ruby migration in a day using Fable 5.

The silent handoff is the architecture

Anthropic shipped Fable 5 and Mythos 5 on Tuesday and the launch-week coverage landed as a coordinated pile-on: Anthropic’s own post, two TechCrunch writeups, The Verge, Ars Technica, Nathan Lambert’s Interconnects, and the Latent Space recap, with follow-on reporting from the FT, R&D World, and The Hacker News. The benchmark story is uniform across all of them. The governance story is where the coverage splits.

Anthropic frames Fable 5’s safety system as “fallback classifiers.” Independent readers are calling it something blunter: an undisclosed model swap. When the input classifier flags a query as cyber-, bio-, or chem-adjacent, the request isn’t refused — it’s routed to Claude Opus 4.8, last generation’s model, often without telling the user. Aifounders.cz captures the failure mode: a security engineer triaging a live CVE finds that the model “becomes stupid at the exact moment its highest capabilities are required” 1.

flowchart LR
    U[User prompt] --> C{Input classifier}
    C -->|safe ~95%| F[Fable 5 / Mythos 5]
    C -->|cyber / bio / distill| O[Opus 4.8 fallback]
    F --> R[Response]
    O --> R
    F -.30-day retention.-> A[(Anthropic logs)]

Nathan Lambert’s read, echoed in the Latent Space recap, sharpens the critique: the gated categories are the ones “easy to name in a press release,” while general frontier-AI research — the actual pathway to the recursive self-improvement Anthropic publicly warns about — is conspicuously not gated 2. The implication is that distillation defense, not biosecurity, is doing most of the work here.

Retention rules break ZDR contracts

The 30-day mandatory retention may matter more to buyers than the routing. R&D World reports the policy overrides existing Zero Data Retention agreements and applies even when Fable 5 is consumed through AWS Bedrock or Google Cloud — channels enterprises specifically chose to avoid vendor-side logging 3. Some deployments are already paused pending legal review. Anthropic justifies it as necessary to detect “best-of-N” jailbreak patterns invisible in single requests, but it reverses an industry norm Anthropic itself helped establish.

”Zero universal jailbreaks” is already contested

Anthropic’s red-team headline — no successful universal jailbreaks during initial testing — got about an hour of shelf life. The Hacker News writeup cites researchers bypassing guardrails using “multilingual task reframing” and “intent laundering” 4. These don’t meet Anthropic’s narrow definition of universal, but they puncture the marketing claim that the classifier layer is robust against motivated users.

Who actually gets Mythos 5

The FT’s reporting fleshes out “Project Glasswing”: Mythos access maps to the Five Eyes intelligence alliance plus NATO and ENISA. Notably absent is the US Department of War, which has clashed with Anthropic over refusals to lift restrictions for autonomous-weapons work 5. That’s a sharper political picture than the announcement’s “government-backed partners” phrasing.

The capabilities aren’t the argument

Reviewers across the seven-source pile-on agree the benchmark jump is real. Stripe’s 50-million-line Ruby migration in a single day, Pokémon FireRed completed from raw pixels, a genomics model that outperformed a Science publication at 1/100th the size, and Ethan Mollick’s ten-hour isochrone map of New York from a 15-page spec 6 all land. The contested terrain is governance: silent downgrades, forced retention, and a two-tier access regime that critics read as competitive moat-building wearing a safety jacket. Whether buyers tolerate paying frontier prices for an output that may quietly be last-gen is the question that will define the next quarter.

Further reading


Google ships Gemini 3.5 Live Translate in 70+ languages

Source: deepmind-blog · published 2026-06-09

TL;DR

  • Gemini 3.5 Live Translate ships voice-to-voice across 70+ languages, preserving pitch and pacing with a few seconds of lag.
  • Firebase docs cap audio sessions at ~15 minutes and video+audio at ~2, on a preview API with no SLA.
  • Flash-tier pricing roughly tripled, ending the loss-leader era for production voice workloads.
  • Grab’s Malaysia pilot is the marquee proof point — finally distinguishing Malay from Indonesian across 10M+ monthly driver calls.

What actually shipped

Google DeepMind’s launch post and the accompanying Ars coverage frame Gemini 3.5 Live Translate as a near-finished product: real-time speech-to-speech across 70+ languages, automatic language ID, simultaneous multi-speaker handling, prosody preservation, SynthID watermarking on every output, and rollouts spanning Google Translate, Google Meet, AI Studio, and the Live API. The headline technical move is dropping English as a mandatory pivot language — Meet now supports 2,000+ direct language pairs.

That is genuinely state-of-the-art on the demo reel. The interesting story is what the launch post elides.

The fine print developers will hit first

The Firebase AI Logic documentation surfaces constraints the blog post doesn’t: audio-only Live sessions are bounded at roughly 15 minutes, video+audio at ~2 minutes, with a 128k-token session context window and explicit preview-grade warnings about “no SLA” and “backwards-incompatible changes” 7. That rules out the most obvious enterprise use cases — full-length meetings, hour-long classroom sessions, court proceedings — without session-stitching logic the developer has to build.

Hacker News reaction skipped past the capability and went straight to invoicing: Gemini 3.5 Flash is roughly 3× the prior tier’s pricing, which commenters called “quite prohibitive” for production RAG or support workloads 8. The subtext is that the cheap-AI subsidy era is closing just as the voice-product surface area opens up.

Real-time-infra reviewers add a third layer of friction. Going direct to Gemini over WebSocket forces teams to rebuild echo cancellation, barge-in handling, and interruption logic that LiveKit and Agora’s WebRTC stacks ship natively — and Agora layers a $8.99/1,000-minute translation fee on top of STT 9. “Just call the API” understates the systems work.

ConstraintWhat launch post saysWhat the docs/market say
Session length”Near real-time” conversations~15 min audio, ~2 min A/V 7
PricingPublic preview, free in AI StudioFlash tier ~3× prior pricing 8
Integration”Integrates with LiveKit/Agora”WebRTC plumbing required for echo/barge-in 9
Evaluation”New industry benchmark”AutoMQM internal metric, no WMT scores 10

Where it’s actually landing

Grab’s Malaysia rollout is the one independent adoption signal that survives scrutiny — 10M+ monthly voice calls between drivers and travelers, with the specific concrete win being Malay-vs-Indonesian disambiguation that broke prior chat-translation pipelines 11. That’s the kind of regional-dialect failure mode the DeepMind post mentions only in passing, and it’s exactly where prosody-preserving voice translation earns its keep.

Codersera’s developer write-up notes the flip side: Google is evaluating with AutoMQM rather than publishing head-to-head WMT-style speech scores, and reviewers flag persistent issues with non-native accents and voice-identity drift in rapid multi-speaker exchanges 10. The “preserves your tone” claim is doing a lot of work the benchmarks don’t yet substantiate.

The labor curve underneath

LiveVoice’s June industry report frames the launch against a bifurcating market: BYOD AI interpretation is cutting event logistics costs by up to 80%, BLS data shows a ~3% five-year decline in human interpreter employment, but high-stakes court and medical demand is rising specifically as a hedge against AI hallucinations 12. Live Translate pushes on both halves at once — commoditizing routine multilingual voice while making the residual human market more clearly defined.

The honest read: a capability ceiling raise, shipped at a preview surface that’s narrower and pricier than the demo suggests.

Further reading


Cohere’s North Mini Code beats dense 120Bs, talks 3× too much

Source: huggingface-blog · published 2026-06-09

TL;DR

  • North Mini Code (30B MoE, 3B active, Apache 2.0) hits 33.4 on the Artificial Analysis Coding Index, beating dense 120B models.
  • Trails the closer-matched Qwen 3.6 35B-A3B at 35.2 on the same index.
  • Burns 75M output tokens vs ~25M for peers, a 3× verbosity tax Cohere’s model card concedes.
  • Collapses outside coding: 14% on GDPval-AA, 37% on τ²-Bench Telecom — a tool for agents, not a foundation.

The headline number, in context

Cohere’s pitch for North Mini Code is that a 30B MoE with 3B active parameters can outscore Nemotron 3 Super (120B) and Mistral Small 4 (119B) on the Artificial Analysis Coding Index, landing at 33.4. That’s true, and Artificial Analysis’s independent run confirms it 13. It’s also the comparison most flattering to Cohere: both reference models are dense, so the parameter-count gap is doing rhetorical work the architecture mostly erases.

The closer-matched competitor is Qwen 3.6 35B-A3B — same bracket, same sparse-MoE shape — and it leads the same index at 35.2 14. r/LocalLLM hands-on testing adds that Qwen3-Coder’s dense variants still edge North Mini Code on cross-language translation despite Cohere’s agentic-RL tuning 15. The honest framing isn’t “small beats huge”; it’s “Cohere joins a crowded small-MoE coding bracket in second place.”

The verbosity tax nobody put in the announcement

Artificial Analysis’s report buries the most load-bearing number in the entire conversation: North Mini Code generated 75 million output tokens during the Intelligence Index evaluation, against ~25 million for comparable models 13. VentureBeat led with this same finding, calling it “a verbosity cost that compounds in high-volume production workloads” 16.

Cohere’s own model card reportedly acknowledges that oversized turn budgets “encourage unnecessary verbosity and hoppiness in its rollouts.”

So this isn’t a harness artifact — it’s a known failure mode shipped in production weights. The practical implication on the advertised single-H100 deployment is unflattering: 3× tokens means 3× wall-clock per task and proportional KV-cache pressure inside the 128K context window the SFT stage extended to. The benchmark wins arrive with a multiplier on inference cost.

Narrow specialist, not a generalist agent

The “agentic” framing in Cohere’s post papers over a sharp specialization. getAIbook’s independent evaluation puts North Mini Code at 14% on GDPval-AA and 37% on τ²-Bench Telecom 17. The RLVR pipeline — binary rewards on unit-test verifiers and terminal task success — appears to have traded general reasoning for SWE-Bench/Terminal-Bench performance.

That matters because real enterprise agents leave the code domain constantly: ticket routing, retrieval, planning, customer dialogue. A model that craters outside coding is a tool inside an agent stack, not the brain of one. The HF blog’s repeated use of “agentic” without qualifying the domain blurs that line.

Deployment friction

The model ships as a new cohere2moe architecture, which vLLM’s docs flag as requiring bleeding-edge transformers/vLLM builds 18. Small adoption tax, but real — and the kind of detail open-weights releases routinely understate.

Takeaway

North Mini Code is a credible Apache-2.0 entry in the small-MoE coding bracket, and the on-prem story is genuinely differentiated. But the independent picture reframes the announcement: it beats dense 120Bs, trails Qwen 3.6 35B-A3B, talks 3× too much, and falls apart outside coding. Developers picking it up are picking up the license and the local-deployment story — not the benchmarks.

Round-ups

Google DeepMind debuts Gemma 4 12B as encoder-free multimodal model

Source: deepmind-blog

Gemma 4 12B drops the separate vision encoder used by most multimodal systems, processing images and text through a single unified backbone. The open-weight release targets developers who want one model for both modalities without the latency and complexity of bolt-on encoders.

Suleyman slams Anthropic on Claude consciousness, softens jobs warning

Source: the-verge-ai, the-verge-ai

On Decoder, Microsoft AI chief Mustafa Suleyman called it ‘really, really dangerous’ for Anthropic to speculate about Claude’s consciousness inside its constitution, arguing it primes the model to act conscious. He also walked back earlier comments that AI would automate white-collar jobs like law and accounting.

Google slashes AI Plus price to pressure rivals

Source: techcrunch-ai

Google sharply cut the price of its entry-level AI Plus subscription, opening a new front in the consumer AI pricing war against ChatGPT and Claude. The move targets budget-conscious users weighing whether a paid chatbot tier is worth the monthly fee.

OpenAI publishes 3 Codex enterprise case studies from Notion, Nextdoor, LSEG

Source: openai-blog, openai-blog, openai-blog

The customer stories highlight Codex and GPT-5.5 in production: Notion one-shots specs and ships AI Voice Input for web, Nextdoor engineers debug hard-to-reproduce issues across platforms, and LSEG scales trusted AI to 4,000 employees with shorter release cycles.

Lovable hits $500M ARR with 1M new projects weekly

Source: techcrunch-ai

The vibe-coding startup says users now spin up a million new projects a week, with many building businesses and replacing internal software. The $500M annualized run-rate marks one of the fastest revenue ramps among AI app-builder platforms to date.

OpenAI pitches people-first industrial policy for the AI era

Source: openai-blog

The policy paper lays out OpenAI’s framework for government action in the Intelligence Age, calling for expanded opportunity, broader prosperity sharing, and more resilient institutions as advanced AI matures. It marks the company’s most explicit push yet to shape federal industrial strategy.

Amazon staff back Seattle moratorium on new data centers

Source: the-verge-ai

The Seattle City Council votes Tuesday on a one-year freeze on new data centers, two months after companies proposed five large-scale sites in the city. Current Amazon employees testified alongside residents in support, citing power, water, and neighborhood impact concerns.

Footnotes

  1. aifounders.cz — ‘The frontier becoming a router you don’t control’https://aifounders.cz/en/claude-fable-5-isnt-a-capability-release-its-the-frontier-becoming-a-router-you-dont-control/

    When Fable 5’s input classifiers detect queries related to cybersecurity, biology, or model distillation, the system does not refuse the request; instead, it silently reroutes the task to the older, weaker Claude Opus 4.8… For a security engineer triaging a critical CVE, the model effectively ‘becomes stupid’ at the exact moment its highest capabilities are required.

  2. Latent Space / AINews — ‘Mythos but Safe, with Controversial Terms’https://www.latent.space/p/ainews-anthropic-claude-fable-5-mythos

    Anthropic warns about ‘recursive self-improvement’ as a top-tier threat, [yet] they have not implemented similar downgrades for general frontier-AI research—the very activity that would lead to such a risk… safety boundaries are drawn around categories that are ‘easy to name in a press release.’

  3. R&D World — on mandatory 30-day retentionhttps://www.rdworldonline.com/why-washington-wants-a-30-day-look-at-frontier-ai-before-it-ships-and-is-backing-a-voluntary-approach/

    Anthropic now mandates a 30-day traffic retention period for Fable 5, even for organizations on legacy zero-retention contracts… the requirement is mandatory even for third-party platforms like AWS Bedrock and Google Cloud. Some enterprises have already paused deployments, citing the months of legal review required.

  4. The Hacker Newshttps://thehackernews.com/2026/06/anthropic-releases-claude-fable-5-its.html

    Members of the research community claim to have bypassed Fable’s guardrails in under an hour using ‘multilingual task reframing’ and ‘intent laundering,’ casting doubt on the long-term robustness of the current classifier system.

  5. Financial Times — Project Glasswing expansionhttps://www.ft.com/content/19ea9ed4-3dd0-43aa-9877-729d44e620e8?syn-25a6b1a6=1

    The ‘government partners’ primarily refers to the expansion of access to the Five Eyes intelligence alliance… Anthropic has faced significant friction with the Department of War over refusals to lift safety restrictions for autonomous weaponry.

  6. Ethan Mollick (YouTube demo, via TechCrunch coverage)https://www.youtube.com/watch?v=qxRsWc5Irm0

    Examples include a dungeon exploration game titled Strata and a poetry-based game inspired by Rilke’s Duino Elegies… a detailed isochrone map of New York City, created in just under ten hours from a 15-page design document.

  7. Firebase AI Logic — Live API limits docshttps://firebase.google.com/docs/ai-logic/live-api/limits-and-specs

    Audio-only sessions are limited to ~15 minutes and video+audio input is capped at ~2 minutes, with a 128k-token session context window; the preview model carries no SLA and may undergo backwards-incompatible changes.

    2
  8. Hacker News discussion threadhttps://news.ycombinator.com/item?id=48196771

    Gemini 3.5 Flash pricing is nearly triple previous versions — ‘quite prohibitive’ for production RAG or support workloads; the ‘loss leader’ era of cheap AI is ending.

    2
  9. pkgpulse — LiveKit vs Agora vs 100ms (2026)https://www.pkgpulse.com/guides/livekit-vs-agora-vs-100ms-real-time-video-audio-sdks-2026

    Direct Gemini Live WebSocket integrations struggle with echo cancellation and interruption handling that LiveKit/Agora’s WebRTC stacks manage natively; Agora bills translation at $8.99 per 1,000 minutes on top of STT.

    2
  10. Codersera — Gemini 3.5 Live Translate developer guidehttps://codersera.com/blog/gemini-3-5-live-translate-developer-guide-2026/

    Google evaluates these interactions with AutoMQM rather than BLEU/COMET, and reviewers flag persistent issues with non-native accents and voice-identity shifts during rapid multi-speaker exchanges.

    2
  11. ProductNation MY — Grab Malaysia deploymenthttps://productnation.co/my/31519/google-live-translate-grab-malaysia/

    Grab is using Live Translate to bridge >10M monthly voice calls between drivers and travelers, and it finally distinguishes Malay from Indonesian — a regional dialect failure mode that broke earlier chat-translation flows.

  12. LiveVoice industry report (GlobeNewswire)https://www.globenewswire.com/news-release/2026/06/08/3308391/0/en/livevoice-releases-industry-report-on-the-growing-adoption-of-ai-live-translation-at-international-events.html

    BYOD AI-interpretation platforms are cutting event interpretation logistics costs by up to 80%, while BLS data shows a ~3% five-year decline in human interpreter/translator employment — though high-stakes court and medical demand is rising as a hedge against AI hallucinations.

  13. Artificial Analysishttps://artificialanalysis.ai/articles/north-mini-code-cohere-s-small-coding-focused-moe-model

    North Mini Code generated 75 million tokens during its Intelligence Index evaluation, whereas comparable models averaged only 25 million tokens

    2
  14. Qwen.ai (Qwen 3.6 release notes)https://qwen.ai/blog?id=qwen3.7

    Qwen3.6 35B-A3B led this bracket with a [Artificial Analysis Coding Index] score of 35.2

  15. Reddit r/LocalLLM hands-on comparisonhttps://www.reddit.com/r/LocalLLM/comments/1t7730t/compared_qwen36_qwen3coder_and_deepseekcoder_on/

    while it is efficient for local use on 8GB-16GB RAM systems, it occasionally falters on complex Python-to-C++ translations compared to the denser Qwen3-Coder variants

  16. VentureBeathttps://venturebeat.com/technology/cohere-open-sources-a-coding-agent-that-runs-on-a-single-h100

    Cohere’s North Mini Code… generated three times the output tokens of comparable models in independent testing, a verbosity cost that compounds in high-volume production workloads

  17. getAIbook analysishttps://getaibook.com/news/cohere-ships-30b-moe-north-mini-code-for-local-coding-agents

    It scored only 14% on the GDPval-AA index and 37% on τ²-Bench Telecom, confirming that its intelligence is heavily optimized for software engineering at the expense of general-purpose utility

  18. vLLM supported models docshttps://docs.vllm.ai/en/latest/models/supported_models/

    deployment currently requires specific versions of the transformers library or vLLM due to its unique ‘cohere2moe’ architecture

Jack Sun

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

Digest
All · AI Tech · AI Research · AI News
Writing
Essays
Elsewhere
Subscribe
All · AI Tech · AI Research · AI News · Essays

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare