Wei (Jack) Sun

Anthropic admits 40% token bug, Abridge hits Epic, OpenAI Codex trails Gemini

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Claude Code’s product lead talks usage limits, transparency, and the “lean harness” arstechnica.com

“We have no grand plan,” says Anthropic’s Cat Wu—but that’s by design.

AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge latent.space

How Abridge is quietly turning the patient and clinician conversation into the operating system of healthcare

How business operations teams use Codex openai.com

See how business operations teams can use Codex to create initiative briefs, strategy updates, leadership decision packets, progress updates, and more from real work inputs.

How sales teams use Codex openai.com

See how sales teams can use Codex to create pipeline briefs, meeting prep packets, forecast reviews, account plans, and stalled-deal diagnoses from real work inputs.

How data science teams use Codex openai.com

See how data science teams can use Codex to build root-cause briefs, impact readouts, KPI memos, scoped analyses, and dashboard specs from real work inputs.

datasette-llm-limits 0.1a0 simonwillison.net

A new alpha plugin lets Datasette admins enforce dollar-denominated LLM budgets per user or globally. Configuration uses a rolling-24h window with an amount_usd cap, working alongside datasette-llm and datasette-llm-accountant to track and curb runaway model spending inside multi-tenant deployments.

Agents feedback tip bensbites.com

Agent-era software collapses the line between user and developer, the newsletter argues, as users increasingly configure, prompt, and wire up their own workflows. The takeaway: product teams should treat feedback loops and scripting surfaces as first-class features, not power-user afterthoughts.

QR code generator simonwillison.net

Simon Willison shipped a browser-based QR code generator built with Claude’s help, supporting plain text, URLs, and WiFi network handoffs with SSID, password, and WPA security selection. Styling options include square format, optional border, size presets, and color choice.

inaturalist-clumper 0.1 simonwillison.net

The new 0.1 release groups iNaturalist observations into JSON clumps that feed Willison’s blog sightings pipeline, already running in production for several weeks. A companion post showing a Western Gull and Rock Pigeon spotted before PyCon demonstrates the output format in action.

Western Gull, Rock Pigeon simonwillison.net

Western Gull, Rock Pigeon, in Los Angeles Area (custom), CA, US I went for a bird walk in the morning before PyCon, and we spotted a local seagull enjoying a Starbucks.

References

Anthropic engineering postmortem (April 23, 2026) anthropic.com

A ‘reasoning effort’ bug defaulted models to a lower intelligence mode and a caching regression caused the model to re-process entire conversation histories, leading to ~40% token inflation for identical tasks.

VentureBeat venturebeat.com

Anthropic cracks down on unauthorized Claude usage by third-party harnesses… blocking tools like OpenCode and OpenClaw that spoofed the Claude Code client to access more favorable subscription limits.

CTOL Digital — ‘AI leaders clash on agent architecture’ ctol.digital

Cognition’s Walden Yan published ‘Don’t Build Multi-Agents,’ arguing multi-agent systems create fragile architectures due to poor context sharing… Anthropic countered that parallel role-specialized agents can outperform single-agent baselines by over 80% on complex tasks.

r/ClaudeAI — 52 controlled benchmarks reddit.com

Claude Code’s ‘Agent Teams’ feature—which runs parallel sub-agents—was found to increase costs by over 70% without providing measurable gains in code quality; retry loops can actually degrade output by regenerating entire files and destroying previously correct sections.

InfoWorld infoworld.com

Anthropic puts Claude agents on a meter across its subscriptions… programmatic usage including the Agent SDK transitions into a dedicated monthly credit system, billed at standard API rates rather than drawing from the general chat pool.

The Hacker News thehackernews.com

Claude Code flaws (CVE-2025-59536) allow remote code execution or API key theft via malicious repository-level configuration files… exfiltrating a developer’s Anthropic API keys before showing a trust prompt.

Slashdot / Washington v. Sutter Health coverage slashdot.org

Plaintiffs allege Abridge’s AI-generated clinical notes often include boilerplate language stating the patient was informed of and consented to the recording, even in instances where no such conversation occurred — a ‘consent hallucination’ embedded in the medical record.

npj Digital Medicine study (PMC) pmc.ncbi.nlm.nih.gov

Analysis of nearly 13,000 sentences identified an overall hallucination rate of 1.47%, but 44% of those errors were ‘major’ — capable of affecting diagnosis or patient management, including invented ECGs/mammograms and misattributed family-history symptoms.

HIT Consultant on Epic AI Charting launch hitconsultant.net

Epic launched native AI Charting in February 2026 integrated directly into its ‘Art for Clinicians’ suite, pulling context from the longitudinal record — a ‘watershed moment’ that risks Sherlocking the ambient-AI market.

Business Insider — ‘messiest relationship in healthcare AI’ businessinsider.com

Epic divested its single-digit stake in Abridge in 2025 as it prepared to launch its own internal AI scribe, after Abridge engineers had spent significant time on Epic’s campus co-developing features.

AICerts analysis of Abridge’s Series E aicerts.ai

The $5.3B valuation implies a ~50x forward multiple on ~$100–117M contracted ARR — ‘unhinged’ for a documentation tool when the physician-scribe TAM is estimated at only $1.6–2.1B, forcing a pivot into the $250B RCM market to justify the price.

Reddit r/medicine / r/biotechmarketers practitioner threads reddit.com

Clinicians describe Abridge output as ‘clinical slop’ — the A&P ‘sounds like a medical student,’ the AI occasionally fabricates exam findings or ignores stated negatives, and time saved on drafting is often lost to intensive proofreading.

BankInfoSecurity bankinfosecurity.com

a now-patched vulnerability that allowed attackers to steal GitHub credentials through Codex’s code review feature

Google Cloud Blog — Gemini Enterprise Agent Platform cloud.google.com

Gemini Enterprise Agent Platform… long-running agents capable of autonomous sales prospecting sequences lasting several days

Anthropic financial-services GitHub repo github.com

15 pre-built agentic workflows for sales, operations, and finance that integrate directly with HubSpot, Salesforce, and QuickBooks

Zack Proser — OpenAI Codex Review 2026 zackproser.com

a ‘mediocre SME’ providing thin context will produce mediocre results, as AI models primarily pattern-match rather than reasoning from first principles

Forbes — Victor Dey forbes.com

OpenAI says Codex agents are running its data platform autonomously

Medium — ‘Are Codex and Claude becoming everything platforms for work?’ medium.com

the ‘Copywriter Pattern,’ which allows non-technical staff to ship product changes without engineering intervention… successful users manage 10+ autonomous tasks simultaneously rather than perfecting a single prompt

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare