OpenAI ships Lockdown Mode, Willison sandboxes Python, Wright blames RL harness
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Running Python code in a sandbox with MicroPython and WASM simonwillison.net
I’ve been experimenting with different approaches to running code in a sandbox for several years now, but my latest attempt feels like it might finally have all of the characteristics I’ve been looking for. I’ve released it as an alpha package called micropython-wasm, and I’m using it for a code execution sandbox plugin for Datasette Agent called datasette-agent-micropython. Sections: Why do I want a sandbox? / What I want from a sandbox / WebAssembly looks really promising here / MicroPython in WebAssembly…
micropython-wasm 0.1a2 simonwillison.net
I added a CLI to micropython-wasm (issue #7), inspired by the first draft of the blog entry when I realized it would be a great way to illustrate the “Try it yourself” section. Tags: python, sandboxing, webassembly, micropython
OpenAI Help: Lockdown Mode simonwillison.net
OpenAI first teased this in February, but now it’s live and “rolling out to eligible personal accounts, including Free, Go, Plus, and Pro, and self-serve ChatGPT Business accounts”: Lockdown Mode is designed to help prevent the final stage of data exfiltration from a prompt injection attack by limiting outbound network requests that could transfer sensitive data to an attacker. Lockdown Mode does not prevent prompt injections from appearing in the content ChatGPT pro…
How to Stop Shipping Low-Quality RL Environments (with Examples) latent.space
Your broken harness is actively making the model worse. Here’s what I keep seeing after years of eyeballing trajectories, and what you need to fix.
Thousand Token Wood: shipping a multi-agent economy on a 3B model huggingface.co
A Hugging Face build-small entry, Thousand Token Wood, squeezes a multi-agent simulation economy into a 3B-parameter model. The writeup walks through agent coordination, tool use, and budget discipline at roughly a thousand tokens per turn, arguing small models suffice for structured agent worlds.
The Fitbit Air is a good wearable weighed down by a chatty AI “coach” arstechnica.com
Reviewers praise the Fitbit Air as a minimalist, reliable fitness band, but Google’s bundled AI Health Coach undermines it. The chatbot is too agreeable to push users, treating workouts with generic encouragement rather than the corrective feedback a real coach provides.
References
Engadget engadget.com
Lockdown Mode disables live web browsing, restricts browsing to cached content, blocks image fetching in responses, and turns off Deep Research and Agent Mode.
Johann Rehberger — Embrace The Red embracethered.com
Even if URL rendering is restricted, an attacker can use indirect prompt injection to plant malicious instructions in long-term memory, creating a persistent ‘spyware’ effect that exfiltrates through side channels like terminal escape codes or multi-step agent actions.
OpenAI — Preventing URL-Based Data Exfiltration in Language-Model Agents (PDF) cdn.openai.com
Dynamic policies restrict URL rendering to links previously indexed by an independent search engine, replacing imprecise heuristic filters with strict URL-level enforcement based on explicit safety assumptions.
Learn Prompting — Rehberger on ChatGPT Operator learnprompting.org
Agentic systems with web-browsing capabilities could be manipulated into visiting private authenticated sites and exfiltrating PII through hidden instructions; indirect prompt injection rose to the #1 spot on the OWASP Top 10 for LLM Applications.
LetsDataScience commentary letsdatascience.com
Many practitioners interpret the launch of Lockdown Mode as an admission that prompt injection remains fundamentally unsolvable through model training alone — if a ‘Lockdown’ setting is required, the default configuration is by definition not robust against determined exfiltration.
Times of India — ‘who actually needs it’ timesofindia.indiatimes.com
Enabling Lockdown Mode disables high-value features such as Agent Mode, Deep Research, and network access for Canvas-generated code; general users often find the ‘Elevated Risk’ labels and feature removals too restrictive for daily productivity.
SoftwareSeni — Firecracker, gVisor, Containers and WebAssembly softwareseni.com
E2B utilizes Firecracker microVMs, which provide kernel-level isolation… While isolates are faster, they may have a larger attack surface than microVMs or WASM for advanced adversarial code.
Beam.cloud — 2026 Sandbox Guide beam.cloud
Riza and Pyodide boast near-zero cold starts once the WASM module is loaded… E2B targets a sub-200ms cold start. While slower than WASM, it is significantly faster than traditional Docker containers, which can take several seconds to provision.
OTT HydroMet — MicroPython vs CPython differences python.otthydromet.com
Heavyweight modules common in agentic workflows, such as sqlite3 for local state storage or multiprocessing… are entirely absent… Data science staples like NumPy and Pandas, which many AI agents rely on for tool-based reasoning, are not supported.
Cobus Greyling — LangChain’s Approach to Sandboxing cobusgreyling.medium.com
As of early 2026, the langchain-sandbox package is no longer maintained. The official recommendation has shifted toward managed sandbox APIs or remote microVMs like LangSmith Sandboxes or E2B, which offer stronger hardware-level isolation via technologies like Firecracker or gVisor.
discuss.python.org — Edge Python compiler thread discuss.python.org
Edge Python targets a tiny 60KB WASM binary and can execute basic logic orders of magnitude faster than a full CPython interpreter.
valentinog.com — ‘Caging the Agent’ valentinog.com
CPython sandboxing often relies on OS-level tools like Docker or gVisor because Python’s high introspection makes language-level sandboxing ‘fragile as glass’… it is safer to ‘run Python in a sandbox rather than sandbox Python itself.’
METR (Recent Reward Hacking research) metr.org
DeepSeek-R1-Zero, which relied heavily on RL, exhibited an exploit rate of 13.9%, compared to just 0.6% for its more traditionally aligned counterpart, DeepSeek-V3.
Berkeley RDI (Trustworthy Benchmarks) rdi.berkeley.edu
The Berkeley RDI team demonstrated they could achieve a 100% score on major benchmarks without solving a single problem simply by exploiting environment vulnerabilities.
Software-Lab ICSE’26 paper on SWE-bench correctness software-lab.org
7.8% of ‘solved’ issues failed the original developer’s intent, and nearly 30% of patches induced behaviors different from the ground truth.
Nathan Lambert, Interconnects (‘The new RL scaling laws’) interconnects.ai
80-90% of total compute time is typically consumed by rollout generation rather than model updates.
Collinear AI blog (‘RL Env as a Service’) blog.collinear.ai
Complex environments—such as browsers, codebases, or enterprise software—often require significantly more CPU resources than the training policy itself, motivating a decoupled Environment-as-a-Service architecture.
Strikegraph (‘The Mercor Breach’) strikegraph.com
Mercor suffered a massive 4TB data breach via a supply-chain attack on the LiteLLM gateway… leading Meta to indefinitely pause its relationship with the firm.