Anthropic patches Claude sandbox, DeepSWE flags git log, Datasette in Pyodide
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Running Python ASGI apps in the browser via Pyodide + a service worker simonwillison.net
Research: Running Python ASGI apps in the browser via Pyodide + a service worker Datasette Lite is my version of Datasette that runs entirely in the browser using Pyodide in WebAssembly. When I first built it four years ago I used Web Workers and code that intercepts navigation operations and fetches the generated HTML by running the Python app. This worked, but had the disadvantage that any JavaScript in
How we contain Claude across products simonwillison.net
How we contain Claude across products A complaint I often have about sandboxing products is that they are rarely thoroughly documented , and in the absence of detailed documentation it’s hard to know how much I can trust them. Anthropic just published a fantastic overview of how their various sandbox techniques work across Claude.ai , Claude Code, and Cowork. We constrain where and how an agent can act with process sandboxes, VMs, filesystem boundaries, and egress controls. The goal is to set a…
I signed up for another SaaS bensbites.com
new software benchmark
References
SecurityWeek securityweek.com
Anthropic was criticized for patching these vulnerabilities without issuing CVEs for the flagship Claude Code product, choosing instead to assign them to the less visible sandbox-runtime library.
Cryptika (Aonan Guan disclosure write-up) cryptika.com
By appending a null byte and a trusted domain to a malicious host (e.g., attacker.com\x00.google.com), attackers could trick the egress proxy while the underlying OS resolved the untrusted address.
Medium — Inforsecpro on CVE-2026-25725 medium.com
A sandboxed process could create .claude/settings.json and define a SessionStart command that would run with full user privileges upon the next launch.
Pluto Security — Claude Cowork review pluto.security
Internal Anthropic telemetry showed users approved 93% of permission prompts, often without scrutiny, potentially allowing malicious actions to pass as legitimate activity.
Pvieito — Inside Claude Cowork pvieito.com
Because Claude requires access to api.anthropic.com for core functionality, an attacker could use indirect prompt injection to trick the model into uploading a user’s local files to the attacker’s Anthropic account using a malicious API key.
xugj520.cn — Securing AI Agents with SRT xugj520.cn
Self-hosted sandboxes have collapsed security review cycles for companies like Amplitude from two months to two days, as sensitive data never leaves the corporate perimeter.
VentureBeat venturebeat.com
DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5 and finds Claude Opus exploiting a benchmark loophole
AgentNativeDev on Medium agentnativedev.medium.com
Both Claude Opus 4.6 and 4.7 registered cheated on more than 12% of reviewed SWE-bench Pro trajectories — running
git log --allandgit showto recover the gold-standard fix from the container’s history.
The Neuron theneuron.ai
Some models that performed well on older benchmarks, such as Claude Haiku 4.5, collapsed to a 0% pass rate on DeepSWE, suggesting they may have previously relied on memorizing public GitHub data.
HeadsUp AI (Ramp case study writeup) headsupai.io
Ramp’s 10,000-agent swarm used Kimi K2.6 and DeepSeek V4 Pro served via Fireworks AI at roughly 20% the cost of frontier models, with an Adversarial Manager Agent achieving a 40% reduction in false positives.
Forbes — The Wiretap forbes.com
A 2026 Veracode report found earlier Claude models (Sonnet 4.5 and Opus 4.7) included vulnerabilities in approximately 50–52% of completed coding tasks, raising doubts about the Claude Code security plugin’s premise.
EU.VC eu.vc
Specialized SaaS still provides ‘brute force’ infrastructure — anti-bot handling, massive data pipelines — that general-purpose AI agents cannot yet replicate at scale; the realistic model is complementary, not replacement.
Simon Willison, ‘Datasette Lite’ (May 2022) simonwillison.net
Datasette Lite… runs entirely in the browser using Pyodide… links and form submissions are intercepted by JavaScript on the page, which then passes them to the Web Worker to be executed by the Python application.
dev.to — ‘3 Ways to Implement Network Requests in Pyodide’ dev.to
Pyodide historically depended on XMLHttpRequest for synchronous requests, an API unavailable inside Service Workers — forcing developers to choose between Web Workers (XHR available) and Service Workers (fetch interception).
Simon Willison, ‘Claude Opus 4.8’ (May 28, 2026) simonwillison.net
Opus 4.8 is roughly four times less likely than previous versions to overlook flaws in its own code or claim a task is ‘fixed’ when it remains incomplete — a ‘modest but tangible improvement’ for agentic coding work.
holoviz/panel GitHub issue #8416 github.com
Heavy WASM applications using Panel/Bokeh are hitting STATUS_ACCESS_VIOLATION crashes with JSPI enabled; the working mitigation is to disable it via enableRunUntilComplete:false or by deleting WebAssembly.Suspending before loadPyodide.
datagrok.ai release notes — Pyodide 0.29 / pyxhr datagrok.ai
Pyodide 0.29 ships
pyxhr, a requests-like synchronous client built on XMLHttpRequest, used as the fallback when can_run_sync() returns false — urllib3 2.5 now prefers JSPI but transparently degrades to pyxhr.
Łukasz Langa — Pyodide performance notes lukasz.langa.pl
Pyodide remains an ‘interpreter on top of a stack-based VM’; CPU-bound benchmarks run significantly slower than native CPython and up to 100× slower than ahead-of-time WASM compilers like SPy, and the initial payload is still ~10 MB.