Wei (Jack) Sun

Beth Israel hedges o1-preview, KC Green fights Artisan, DualShot audits Claude

A medical reasoning study, an ad-campaign theft, and a #1 app: each AI win today gets qualified by the person closest to it.

TL;DR

  • o1-preview scored 89% on ER management reasoning versus 34% for physicians across 76 Beth Israel cases.
  • The study’s own authors oppose clinical deployment without RCTs, citing harmful test orders alongside correct diagnoses.
  • KC Green accuses Artisan of using his ‘This is Fine’ dog in a billboard without permission.
  • Forensic analysis points to manual Photoshop on the Artisan ad, not AI generation.
  • DualShot Recorder hit #1 paid app in 20 countries on heavily audited Claude-built code.

Three AI stories cross the desk today, and each one carries the same shape: the headline holds, and so does a built-in puncture from someone who’d know. A new Beth Israel study hands o1-preview a 55-point lead over ER physicians on management reasoning — and the study’s own authors oppose clinical deployment without RCTs, citing the harmful test orders bundled with the correct diagnoses. Artisan’s billboard campaign trades on AI-aesthetic provocation; a forensic readout suggests the actual image is hand-Photoshopped from KC Green’s webcomic, and Green is calling on fans to deface it wherever it appears. DualShot Recorder topped the App Store in 20 countries on Claude-built code — and its developer is at pains to say he triple-audited every line, while iOS 26’s native Dual Capture on iPhone 17 is already poised to collapse the casual market he just won.

o1-preview hits 89% on ER management vs 34% for doctors

Source: techcrunch-ai · published 2026-05-03

TL;DR

  • o1-preview scored 89% on management reasoning vs. 34% for ER physicians across 76 Beth Israel cases.
  • A companion JAMA Network Open paper found LLMs fail >80% of differential diagnoses when chart data is sparse.
  • Microsoft’s MAI-DxO hit 85.5% on NEJM cases vs. ~20% for unassisted physicians.
  • The study’s own authors oppose deployment without RCTs, citing harmful test orders alongside correct diagnoses.

The numbers

The Harvard/BIDMC team ran OpenAI’s o1-preview against two attending ER physicians on 76 real Beth Israel emergency cases. The model beat the doctors at every stage: 67.1% vs. 50–55% at triage, 81.6% vs. 70–79% at admission, and, most strikingly, 89% vs. 34% on management reasoning when physicians were limited to conventional search tools 1. Microsoft’s concurrently published MAI-DxO orchestrator pushed the ceiling higher still, scoring 85.5% on NEJM sequential-diagnosis vignettes where unassisted physicians cleared roughly 20% 2. Two labs, same direction.
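The gaps quoted above are simple percentage-point differences, and a short sketch makes the arithmetic explicit. The names SCORES, point_gap, and implied_cases are illustrative, not from the study. The case-count conversion assumes the triage and admission scores are plain fractions of the 76 cases, which the article does not state.

```python
# Scores as quoted in the article: model percent, then the physician range.
N_CASES = 76

SCORES = {
    "triage": (67.1, (50.0, 55.0)),
    "admission": (81.6, (70.0, 79.0)),
    "management": (89.0, (34.0, 34.0)),
}


def point_gap(model_pct, physician_range):
    """Return (smallest, largest) percentage-point lead over the physician range."""
    low, high = physician_range
    return (round(model_pct - high, 1), round(model_pct - low, 1))


def implied_cases(pct, n_cases=N_CASES):
    """Whole number of cases a percentage would imply (assumption: pct = cases / n_cases)."""
    return round(pct / 100 * n_cases)


if __name__ == "__main__":
    for stage, (model_pct, physician_range) in SCORES.items():
        print(stage, point_gap(model_pct, physician_range))
    print(implied_cases(67.1), implied_cases(81.6))
```

On these inputs the management lead is 55 points, while the admission lead could be as small as 2.6 points against the stronger physician. Under the stated assumption, 67.1% and 81.6% correspond to 51 and 62 of 76 cases.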

The companion study nobody quoted

The TechCrunch headline omits the strongest counterweight, which landed in JAMA Network Open the same week. Mass General Brigham’s PrIME-LLM benchmark shows LLMs clear 90% accuracy when handed complete charts but blow more than 80% of differential diagnoses when information is sparse 3. That’s not a footnote — it’s the exact failure mode of emergency triage, where a clinician has minutes, vital signs, and a story, not a finished EHR.

The Harvard cases were retrospective text extracts. Even the “triage stage” inputs almost certainly contained more structured information than a doctor sees walking into the bay. Gizmodo’s writeup pushes the same point harder: o1-preview never saw a patient’s pallor, work of breathing, or level of distress — cues that drive real triage decisions 4.

| Study | Setting | AI score | Physician score |
|---|---|---|---|
| Harvard/BIDMC (triage) | 76 ER cases, text-only | 67.1% | 50–55% 1 |
| Harvard/BIDMC (mgmt) | Same | 89% | 34% 1 |
| Microsoft MAI-DxO | NEJM cases | 85.5% | ~20% 2 |
| MGB PrIME-LLM | Sparse-data differentials | fails >80% of cases 3 | |

The caveats the authors put louder than the press release

Lead authors Adam Rodman and Peter Brodeur are explicit: o1-preview can land the correct top diagnosis while simultaneously ordering tests that expose patients to avoidable risk and cost 1. The Science Media Centre’s expert panel reads the result not as a replacement signal but as a mandate for a “triadic” doctor-patient-AI workflow with the model as a supervised teammate 5. No RCT has yet measured patient outcomes from autonomous use, and automation-bias and accountability gaps remain unaddressed.

What deployment actually looks like

The closest thing to outcome evidence comes from outside academia. OpenAI and Penda Health ran a GPT-4o-based “AI Consult” tool across nearly 40,000 primary-care visits in Kenya and reported a 16% reduction in diagnostic errors and 13% reduction in treatment errors versus control 6. Different setting, lower physician baseline — but it’s the only large-N evidence that benchmark gains translate into fewer errors in production.

The honest reading of the Harvard study: o1-preview is unambiguously good at structured medical reasoning on retrospective text, three independent results now point the same way, and the people closest to the data are the ones telling you not to skip the trial.


KC Green says Artisan stole his ‘This is Fine’ dog for ad

Source: techcrunch-ai · published 2026-05-03

TL;DR

  • KC Green says Artisan used his “This is Fine” dog in a billboard without permission.
  • Green is calling on fans to vandalize the billboards wherever they appear.
  • Forensic analysis suggests manual Photoshop, not AI generation — a “double mouth” tell on the dog.
  • Furie v. Infowars (2019) already gutted the “it’s a meme now” copyright defense.

What Green actually said

The TechCrunch write-up paraphrases Green politely. His Bluesky post does not: “it’s been stolen like AI steals. please vandalize it if and when you see it” 7. That’s the artist of one of the most-reproduced images on the internet telling his audience to deface corporate property — a sharper line than the headline conveys, and one Artisan’s PR team will have to answer for separately from the underlying infringement claim.

It probably isn’t even AI

The framing of this as an “AI training” story collapses on inspection. Independent write-ups of the billboard image identify it as a modified version of Green’s original 2013 comic, with anatomical tells — including a “double mouth” on the dog — that read as sloppy human Photoshop rather than diffusion-model output 8. If that holds, Artisan’s alleged offense is the most boring kind of IP violation: a designer pulled an image off Google and edited it for a paid ad. The AI branding is marketing, not method.

The obvious template is Matt Furie’s 2019 settlement with Infowars over Pepe the Frog: $15,000 — slightly more than Infowars’ $14,000 in poster profits — plus destruction of remaining inventory and a permanent ban on Pepe merchandise 9. More consequential than the dollar figure was Judge Fitzgerald’s pre-settlement ruling, which explicitly rejected the “meme-ification” defense and found “no legal authority” for the claim that virality erodes a creator’s copyright 10.

That ruling matters here because it forecloses Artisan’s most natural defense. A 2026 survey of AI copyright litigation notes that while training-data fair-use arguments have generally hardened in defendants’ favor, “direct use of a copyrighted character in a commercial advertisement is a clear-cut violation” sitting entirely outside the training-data debate 11. Artisan’s reported “we have respect for KC Green” posture reads as preemptive settlement language, because the law is not ambiguous and the comparable damages are small enough to write a check for.

The rage-bait is the product

This is Artisan’s second swing at orchestrated outrage. CEO Jaspar Carmichael-Jack told the SF Chronicle “You don’t draw eyes with boring messaging” and credited the original “Stop hiring humans” billboards with over $2 million in new annual recurring revenue, despite death threats and vandalism along the way 12. Read in that light, the Green dispute isn’t a mistake the company will quietly fix — it’s the campaign working as designed. Pay Green, destroy the billboards, keep the ARR.

The richer irony writes itself: a startup whose pitch is replacing human workers with AI has landed in legal trouble because a human apparently cut a licensing corner — and is now turning the resulting fight into more free distribution.


DualShot hits App Store #1 on heavily audited Claude code

Source: the-verge-ai · published 2026-05-03

TL;DR

  • DualShot Recorder hit #1 paid app in 20 countries within days of launch.
  • Built by a wildlife creator using Claude after ChatGPT kept fabricating iOS APIs.
  • Downey says he had to “double and triple audit every line” — not unsupervised vibe coding.
  • iOS 26’s native Dual Capture on iPhone 17 already collapses DualShot’s casual market.

The vibe-coding receipt

The Verge’s story is a feel-good creator profile. The dev-tools story is more interesting: a non-coder shipped the #1 paid iPhone app using Claude, and he is also the most candid critic of how he got there. Developer Derrick Downey Jr. told TechEchelon the model would routinely fabricate API behavior, forcing him to “double and triple audit every line” before anything compiled cleanly 13. That tracks with Andrej Karpathy’s own walk-back of “vibe coding”: he now prefers “agentic engineering” because unsupervised output is, in his words, “brittle” or “gross” 14.

So DualShot is not a vindication of one-shot prompting. It’s a data point on what AI-assisted shipping actually costs in 2026: a multi-month human-in-the-loop audit cycle for a single-feature app from someone who’d never written Swift. Still a remarkable productivity multiplier. Not the zero-engineer fantasy.

The technical move The Verge glosses

Filmic’s DoubleTake has done simultaneous multi-camera capture since 2019, but it’s hard-capped at 1080p because Apple’s AVCaptureMultiCamSession hardwareCost budget won’t accommodate 4K with two sensors active 15. DualShot’s actual trick is different: it reads one full-sensor stream and crops it to both 9:16 and 16:9 in software, sidestepping the multi-cam ceiling entirely. That’s the genuinely novel choice, and it’s the moat — not “two cameras at once.”
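The single-stream, dual-crop idea described above comes down to center-crop geometry, sketched here in Python. This is not DualShot's code, which the article does not show. The function name center_crop, the 4032x3024 frame size, and the rounding to even dimensions (a common video-encoder requirement) are all assumptions made for illustration.

```python
def center_crop(frame_w, frame_h, aspect_w, aspect_h):
    """Largest centered rectangle of aspect_w:aspect_h inside the frame.

    Returns (x, y, width, height) in pixels.
    """
    if frame_w * aspect_h >= frame_h * aspect_w:
        # Frame is at least as wide as the target shape: height is the limit.
        height = frame_h
        width = frame_h * aspect_w // aspect_h
    else:
        # Frame is narrower than the target shape: width is the limit.
        width = frame_w
        height = frame_w * aspect_h // aspect_w
    # Assumption: round down to even dimensions for the encoder.
    width -= width % 2
    height -= height % 2
    x = (frame_w - width) // 2
    y = (frame_h - height) // 2
    return (x, y, width, height)


if __name__ == "__main__":
    # Hypothetical 4:3 full-sensor frame; the real capture size is not given.
    frame = (4032, 3024)
    print("16:9", center_crop(*frame, 16, 9))
    print("9:16", center_crop(*frame, 9, 16))
```

Both rectangles come from the same frame, so only one capture stream is needed. On this hypothetical frame the landscape crop keeps the full width and the portrait crop keeps the full height, which leaves the portrait output far narrower than the landscape one.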

| App | Method | Max res |
|---|---|---|
| Filmic DoubleTake | Two sensors via multi-cam API | 1080p 15 |
| iOS 26 Dual Capture | Two sensors, native PiP/side-by-side 16 | not disclosed |
| DualShot Recorder | One sensor, dual crop | 4K |

Apple commoditized the niche on the same launch cycle

PetaPixel’s hands-on flagged real v1 gaps the launch coverage skipped: 4K dual-stream causes overheating within minutes, there’s no composition grid, no tap-to-focus, no exposure lock 17. Meanwhile iOS 26 ships PiP and side-by-side recording directly in the stock Camera app on iPhone 17 16. For anyone on Apple’s current flagship, the casual use case is solved without a download.

DualShot’s defensible market is therefore narrower than the chart position suggests: pre-iPhone-17 owners who specifically want 4K multi-aspect output from a single lens. That’s a real niche. It’s not “every iPhone user.”

The monetization pivot nobody mentioned

FoxData’s app-analytics record shows the $6.99 one-time purchase quietly converted to a freemium model with Basic and Advanced subscription tiers within days of hitting the charts 18. Downey’s public framing leans hard on no-account, no-data-collection indie virtue, which is fine and apparently true on the data side. But the pricing has converged on the industry norm faster than the narrative has updated. For a dev-tools audience, that’s the second receipt: shipping with AI is cheap enough that you can iterate the business model in public, in the same week you top the paid charts in 20 countries.

Round-ups

AI music is flooding streaming services — but who wants it?

Source: the-verge-ai

The Verge’s Stepback newsletter examines the surge of generative AI tracks across Spotify and rival streamers, questioning listener demand as platforms grapple with attribution, royalty dilution, and detection. Columnist Terrence O’Brien frames the flood as an industry problem rather than a user-driven trend.

Footnotes

  1. Harvard Medical School news (Adam Rodman / Peter Brodeur): https://hms.harvard.edu/news/study-suggests-ai-good-enough-diagnosing-complex-medical-cases-warrant-clinical-testing

    a model might arrive at the correct top diagnosis but simultaneously recommend a testing plan that exposes patients to avoidable risks

  2. Medical Economics on Microsoft MAI-DxO: https://www.medicaleconomics.com/view/microsoft-says-its-ai-tool-outperforms-physicians-on-complex-diagnostic-challenges

    MAI-DxO correctly identified 85.5% of complex NEJM cases, more than four times the 20% accuracy achieved by unassisted physicians

  3. Mass General Brigham press release (JAMA Network Open companion): https://www.massgeneralbrigham.org/en/about/newsroom/press-releases/ai-chatbot-lacks-clinical-reasoning

    LLMs achieve over 90% accuracy when provided with complete data, but fail to generate appropriate differential diagnoses more than 80% of the time when information is sparse

  4. Gizmodo — ‘Don’t get all excited’: https://gizmodo.com/ai-just-beat-doctors-at-diagnosing-er-patients-dont-get-all-excited-2000752676

    the AI was tested only on text-based records, ignoring visual and auditory cues such as a patient’s physical appearance or level of distress, which are critical in triage

  5. Science Media Centre expert reaction: https://www.sciencemediacentre.org/expert-reaction-to-study-evaluating-performance-of-a-large-language-model-on-the-reasoning-tasks-of-a-physician/

    results do not justify replacing physicians; rather, they point to a ‘triadic care model’ where AI serves as a supervised teammate

  6. OpenAI / Penda Health deployment paper (Kenya): https://cdn.openai.com/pdf/a794887b-5a77-4207-bb62-e52c900463f1/penda_paper.pdf

    AI Consult led to a 16% reduction in diagnostic errors and 13% reduction in treatment errors across nearly 40,000 patient visits

  7. KC Green on Bluesky: https://bsky.app/profile/kcg.bsky.social/post/3mkwcpx45q223

    i’ve been getting more folks telling me about this and it’s not anything i agreed to. it’s been stolen like AI steals. please vandalize it if and when you see it.

  8. Lazy Tech Talk: https://www.lazytechtalk.com/news/ai-art-theft-artisans-this-is-fine-ad-exposes-ip-crisis

    the image was a ‘modified version’ of the 2013 comic … the dog featured anatomical errors like a ‘double mouth,’

  9. Vice (Pepe/Infowars precedent): https://www.vice.com/en/article/infowars-agrees-to-pay-pepe-the-frog-creator-dollar15000-in-copyright-settlement/

    Infowars agreed to pay Matt Furie $15,000 … exceeding the roughly $14,000 in profits Infowars had generated from the poster’s sales, and was required to destroy all remaining inventory

  10. Bloomberg Law: https://news.bloomberglaw.com/ip-law/pepe-settlement-is-missed-opportunity-for-fair-use-guidance

    Judge Fitzgerald rejected the argument that ‘meme-ification’ diminishes an author’s copyright … no legal authority to support the claim that an image becoming a meme destroys the creator’s ownership interest

  11. JD Supra — AI Trends for 2026: https://www.jdsupra.com/legalnews/ai-trends-for-2026-copyright-litigation-6456388/

    plaintiffs are now required to demonstrate ‘substantial similarity’ between a specific AI-generated image and a protected work … direct use of a copyrighted character in a commercial advertisement is a clear-cut violation

  12. San Francisco Chronicle: https://www.sfchronicle.com/tech/article/artisan-stop-hiring-humans-billboards-19976095.php

    CEO Jaspar Carmichael-Jack defended the provocative messaging: ‘You don’t draw eyes with boring messaging’ … the campaign drove over $2 million in new annual recurring revenue

  13. TechEchelon profile of Downey: https://www.techechelon.com/post/wildlife-creator-derrick-downey-jr-built-a-hit-iphone-camera-app-using-ai-coding-tools

    You would think… it would give you accurate data. But I found that not to be the case… I had to double and triple audit every line.

  14. CodeRabbit — semantic history of ‘vibe coding’: https://www.coderabbit.ai/blog/a-semantic-history-how-the-term-vibe-coding-went-from-a-tweet-to-prod

    Karpathy has cautioned that AI-generated code can still be ‘brittle’ or ‘gross’ without human oversight, leading him to refine the concept into ‘agentic engineering’.

  15. DPReview on Filmic DoubleTake: https://www.dpreview.com/news/8725087946/filmic-s-double-take-app-records-with-two-cameras-at-the-same-time

    Doubletake is restricted to 1080p resolution because Apple’s multi-camera API does not support 4K while running multiple sensors concurrently.

  16. MacRumors — iPhone 17 Dual Capture how-to: https://www.macrumors.com/how-to/iphone-17-dual-capture-video/

    iPhone 17 models can record video from the front and rear cameras simultaneously directly within the stock Camera app, with picture-in-picture or side-by-side layouts.

  17. PetaPixel review: https://petapixel.com/2026/03/30/dualshot-recorder-app-captures-vertical-and-horizontal-video-at-the-same-time/

    Recording two simultaneous video streams—especially at 4K resolution—causes iPhones to overheat rapidly, sometimes within minutes, and the initial version lacked a composition grid and manual controls for tap-to-focus or exposure.

  18. FoxData app analytics: https://foxdata.com/en/app-marketing-analytics/6761159805/as/US/

    Within its first few days, the app reportedly reached the top of the paid charts in 20 countries… transitioned from a $6.99 one-time purchase to a freemium model with Basic and Advanced subscriptions.

Jack Sun

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare