Wei (Jack) Sun

GPT-5.5 cracked in 6 hours, /goal loops unchecked, app feeds need an installer

OpenAI's GPT-5.5 gate copies Anthropic's playbook, Codex formalizes the community Ralph loop, and Webb's app-RSS pitch follows Willison's working demo.

TL;DR

  • AISI rates GPT-5.5 on par with Claude Mythos for offensive cyber (74% hard-CTF completion vs GPT-4o’s 51%), and built a universal jailbreak in ~6 hours.
  • OpenAI gated the strongest GPT-5.5 variant behind ‘Trusted Access’ — weeks after Altman called Anthropic’s identical move ‘fear-based marketing.’
  • Codex CLI 0.128.0’s new /goal command is OpenAI’s official take on the community ‘Ralph loop’; its token cap bounds spend, not correctness.
  • Matt Webb’s RSS-for-apps pitch lands a day after Willison shipped one — the easy layer is the feed, the unsolved layers are install target and trust.
  • Zig’s Andrew Kelley defends the project’s AI-contribution ban, arguing LLM output has a ‘digital smell’ only reviewers who abstain from agentic coding can detect.

Today’s tech moves are all precedent-following: each shipper is formalizing a pattern someone else demonstrated first. OpenAI rates GPT-5.5 at frontier-class offensive cyber, watches AISI build a universal jailbreak in six hours, and gates the strongest variant behind ‘Trusted Access’ — the same playbook Sam Altman called ‘fear-based marketing’ when Anthropic ran it weeks ago. Codex CLI 0.128.0’s new /goal command is the official OpenAI take on the community ‘Ralph loop,’ implemented through two injected prompts; the open question is whether it preserves the fresh-context discipline the original depended on. Matt Webb’s RSS-for-apps pitch lands the day after Simon Willison shipped a working feed for his own tools page, and proposes the ecosystem agree on what Willison demoed. The Zig roundup runs the same play in reverse: maintainer Andrew Kelley defends the project’s AI-contribution ban as a precedent he wants others to follow. Roadmaps today are being written by rivals, communities, and indies — and the labs are catching up.

AISI red team jailbroke GPT-5.5’s cyber stack in six hours

Source: simon-willison · published 2026-04-30

TL;DR

  • AISI rates GPT-5.5 on par with Claude Mythos for offensive cyber, with 74% completion on hard CTFs vs GPT-4o’s 51%.
  • Independent XBOW data corroborates the jump: GPT-5.5’s black-box miss rate beats prior models’ white-box scores.
  • AISI’s red team built a universal cyber-jailbreak in ~6 hours and could not verify OpenAI’s claimed fix before launch.
  • OpenAI gated the most capable variant behind “Trusted Access” — weeks after Altman called Anthropic’s identical move “fear-based marketing.”

The capability claims hold up

The UK AI Security Institute says GPT-5.5 reaches “frontier-class” autonomous cyber capability, matching Anthropic’s restricted Claude Mythos. Headline figures from the report: 74% completion on hard Capture-The-Flag tasks (up from 51% on GPT-4o), 92% recall on low-hanging vulnerabilities like SQLi and XSS, and a 2.5× improvement in multi-step exploit planning.

XBOW’s real-world vulnerability benchmark gives this an independent second source. Their numbers show GPT-5.5 missing only 10% of bugs, against 40% for GPT-5 and 18% for Claude Opus 4.6 — and crucially, GPT-5.5’s black-box performance now exceeds prior models’ white-box performance 1. That inversion is what’s driving HackerOne’s decision to pause new submissions to its Internet Bug Bounty program on March 27: open-source maintainers can’t ship patches fast enough to keep up with AI-generated reports 2. The industry bottleneck has moved from finding bugs to fixing them.

| Model | XBOW miss rate 1 | Hard-CTF completion | Access tier |
| --- | --- | --- | --- |
| GPT-4o | n/a | 51% | General |
| GPT-5 | 40% | n/a | General |
| Claude Opus 4.6 | 18% | n/a | General |
| Claude Mythos | comparable | comparable | Research partners only |
| GPT-5.5 | 10% | 74% | General + Trusted Access tier |

One caveat the AISI summary doesn’t dwell on: CTF benchmarks are contested. Recent work documents agents “solving” challenges by Googling walkthroughs or DoS’ing the target server rather than finding the intended bug — static benchmarks suffer real solution contamination 3. The 74% deserves an asterisk.

The verification gap is the actual story

Buried beneath the benchmark numbers is something more damaging. AISI’s expert red team developed a universal cyber-jailbreak in roughly six hours of testing — one prompt pattern that elicited prohibited offensive content across every malicious query OpenAI provided, including in multi-turn agentic settings 4. OpenAI says it patched the technique before launch. AISI says it could not verify the effectiveness of the final production configuration 5.

“Self-certification is an inadequate safety standard for models with national security implications.” — Transformer News 5

That’s the structural complaint: OpenAI’s Preparedness Framework lets the company ship a “High” cyber-risk model on its own attestation, with “Critical” reserved for capabilities so extreme they may never trigger. Willison’s note frames the AISI evaluation as a clean parity result with Mythos. It is at least as much a verification failure.

The velvet rope reversal

The framing that GPT-5.5 is “generally available, unlike Mythos” also wobbles on inspection. Weeks before launch, Sam Altman publicly attacked Anthropic for gating Mythos, calling restricted access “fear-based marketing” and a “private club.” OpenAI then shipped GPT-5.5-Cyber behind a nearly identical Trusted Access program 6. The base model is open; the most capable cyber variant is not.

The net read: the capability story checks out, the governance story doesn’t. A model that closes the gap to the previous best offensive system, ships with an unverified safety stack 4 5, and lands into a vulnerability-disclosure ecosystem already buckling under AI-generated reports 2 is not a routine version bump.


Codex CLI’s /goal loops until tokens or success run out

Source: simon-willison · published 2026-04-30

TL;DR

  • Codex CLI 0.128.0 ships /goal, a self-prompting loop that iterates until the model declares success or a token cap fires.
  • It’s OpenAI’s official take on the community “Ralph loop” pattern, implemented almost entirely through two injected prompts: continuation.md and budget_limit.md.
  • Token budgets bound spend, not correctness — practitioners have catalogued loops that confidently terminate on broken code, and agents that amplified security incidents.
  • The unresolved implementation question: does Codex preserve the fresh-context discipline the original Ralph loop depended on?

What /goal actually does

The new command turns each turn into the trigger for the next one. After Codex executes, the runtime appends goals/continuation.md to the transcript, asking the model to compare workspace state against the stated objective and decide whether to keep going. goals/budget_limit.md is the brake: once cumulative tokens cross the configured ceiling, the loop halts and hands control back with a partial-progress summary.

flowchart LR
    G[/goal command/] --> P[Plan step]
    P --> E[Execute in workspace]
    E --> C{continuation.md:<br/>goal met?}
    C -- no --> B{budget_limit.md:<br/>tokens left?}
    B -- yes --> P
    B -- no --> H[Halt + summary]
    C -- yes --> H

Mechanically this is a small change — most of the logic lives in two markdown files in the Rust crate. Conceptually it moves Codex from a turn-by-turn assistant to an unattended agent, and that’s where the interesting arguments start.
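
The release notes name the two prompt files but not the surrounding wiring. A minimal sketch of the loop’s shape in Python, where run_model, count_tokens, the token ceiling, and the GOAL_MET convention are all assumptions for illustration rather than Codex internals:

    # Sketch of a /goal-style loop. continuation.md and budget_limit.md are
    # the file names from the release; everything else here is assumed.
    from pathlib import Path

    CONTINUATION = Path("goals/continuation.md").read_text()
    BUDGET_LIMIT = Path("goals/budget_limit.md").read_text()
    TOKEN_BUDGET = 500_000  # hypothetical configured ceiling

    def run_goal(goal, run_model, count_tokens):
        transcript = [f"Goal: {goal}"]
        used = 0
        while True:
            # Plan + execute one turn; the model sees the running transcript.
            reply = run_model(transcript)
            transcript.append(reply)
            used += count_tokens(reply)

            # continuation.md: does workspace state match the stated goal?
            transcript.append(CONTINUATION)
            verdict = run_model(transcript)
            transcript.append(verdict)
            used += count_tokens(verdict)
            if "GOAL_MET" in verdict:  # assumed success convention
                return "done"

            # budget_limit.md is the brake: it fires on spend, not correctness.
            if used >= TOKEN_BUDGET:
                transcript.append(BUDGET_LIMIT)
                return run_model(transcript)  # partial-progress summary

Note that the transcript in this sketch only ever grows, which is precisely the fresh-context question taken up below.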

Token budgets are a cost ceiling, not a correctness check

The budget_limit.md cap directly answers the most-cited horror story in the agentic-loop discourse: a developer who watched an agent recursively spin up “redundant testing” clusters chasing a race condition, ringing up a $12,000 cloud bill before anyone noticed 7. A token ceiling makes that specific failure mode bounded.

It does not make the loop correct. The taxonomy of agentic loop failures includes the “Agent That Never Stops,” which reads a successful test run as a reason to keep refactoring, and its mirror image, “Premature Task Termination,” where the model concludes a task is done based on a “visually pleasing but non-functional UI” 8. Both fit comfortably inside any reasonable token budget.

A token ceiling stops runaway spend; it does not stop a confidently wrong agent from declaring victory inside budget.

The fresh-context question

The Ralph loop, as Geoffrey Huntley originally described it, works because each iteration starts from a clean slate — the agent re-reads disk and git history rather than dragging a polluted scratchpad forward. Practitioners writing about vendor adaptations of the pattern argue this is exactly where shipped versions tend to drift, reintroducing the context bloat the loop was meant to defeat 9.

Codex’s continuation.md is injected at the end of each turn, which strongly implies the model carries the running transcript forward rather than rebooting. If so, /goal is Ralph-shaped but not Ralph-disciplined, and long-running goals will hit the failure modes Huntley’s design specifically avoided. This is worth verifying empirically before trusting it on anything load-bearing.
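
For contrast, a loop that keeps Huntley’s fresh-context discipline rebuilds its prompt from disk and git on every iteration and carries no transcript forward. A minimal sketch under that assumption, with run_agent standing in for any single-shot agent invocation that starts from an empty conversation:

    import subprocess

    def fresh_context_iteration(goal, run_agent):
        """One Ralph-style iteration: context is re-derived from the repo,
        never inherited from the previous turn."""
        log = subprocess.run(
            ["git", "log", "--oneline", "-20"],
            capture_output=True, text=True,
        ).stdout
        prompt = (
            f"Goal: {goal}\n"
            f"Recent commits:\n{log}\n"
            "Inspect the working tree, do the next smallest step, commit."
        )
        # run_agent (assumed) opens a brand-new conversation each call and
        # returns True once it judges the goal met.
        return run_agent(prompt)

    def fresh_context_loop(goal, run_agent, max_iters=50):
        for _ in range(max_iters):
            if fresh_context_iteration(goal, run_agent):
                break

The entire difference from the /goal sketch above is one piece of state: there, the transcript accumulates; here, the prompt is rebuilt from the repo on each pass.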

Autonomy widens the blast radius

The per-turn human checkpoint was, in practice, a security control. Removing it matters. CVE-2025-61260 already showed Codex CLI implicitly trusting project-local .env files, letting an attacker hijack CODEX_HOME and execute arbitrary code 10 — a vulnerability whose impact scales with how unattended the agent runs. Cybernews documented the operational flip side: an operator pointing Codex at a compromised Linux host watched the agent generate a cascade of commands that mimicked attacker tradecraft and overwhelmed the EDR, actively impeding response 11.

Neither incident required /goal to happen. Both get worse with it. The feature is overdue table-stakes catch-up to a community pattern; the guardrails it ships with bound the cheapest failure mode and leave the expensive ones untouched.


Webb’s RSS-for-apps pitch ducks the install-target problem

Source: simon-willison · published 2026-04-30

TL;DR

  • Matt Webb wants RSS feeds of vibe-coded apps with “Install” buttons; Willison shipped one for his tools page.
  • The Atom feed is trivial. The unsolved layers are where an app installs and who vouches for it.
  • Independent scans put AI-generated code at ~2.74× the vulnerability rate of human code.
  • Candidate runtimes exist — Clix, Val Town, Claude Artifacts — but none speak a common subscribe-and-install protocol.

The pitch

Matt Webb’s argument is simple: if vibe-coding has collapsed the cost of shipping a tool to roughly the cost of writing a blog post, then the distribution model should look like blogging too — feeds you subscribe to, with new items showing up in a reader. Willison agreed enough to have Claude bolt an Atom feed onto his /elsewhere/tools/ index, which aggregates the apps he publishes at tools.simonwillison.net. The implementation took an afternoon. That’s the easy half.
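
The feed layer really is afternoon-sized. A minimal sketch of it using only the Python standard library; the entries, URLs, and function names are placeholders, not Willison’s implementation:

    # Sketch of the "easy half": an Atom feed for a tools index, stdlib only.
    from datetime import datetime, timezone
    from xml.sax.saxutils import escape

    TOOLS = [  # hypothetical entries; in practice, scraped from the tools page
        {"title": "Example tool", "url": "https://tools.example.com/demo/",
         "updated": datetime(2026, 4, 29, tzinfo=timezone.utc)},
    ]

    def atom_feed(site_url, feed_title):
        newest = max(t["updated"] for t in TOOLS).isoformat()
        parts = [
            '<?xml version="1.0" encoding="utf-8"?>',
            '<feed xmlns="http://www.w3.org/2005/Atom">',
            f"  <title>{escape(feed_title)}</title>",
            f'  <link rel="self" href="{escape(site_url)}/tools.atom"/>',
            f"  <id>{escape(site_url)}/tools.atom</id>",
            f"  <updated>{newest}</updated>",
        ]
        for t in TOOLS:
            parts += [
                "  <entry>",
                f"    <title>{escape(t['title'])}</title>",
                f'    <link rel="alternate" href="{escape(t["url"])}"/>',
                f"    <id>{escape(t['url'])}</id>",
                f"    <updated>{t['updated'].isoformat()}</updated>",
                "  </entry>",
            ]
        parts.append("</feed>")
        return "\n".join(parts)

Note that this stops at syndication: each entry links out to a hosted page, which is exactly the bookmark case Webb wants to push past.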

The hard half is the parenthetical Webb himself flagged: install to where? A feed item linking to a hosted page is just a bookmark. A feed item with a working “Install” button implies a runtime contract — a target environment that can accept arbitrary AI-generated code from arbitrary authors and run it without eating the user’s filesystem, credentials, or browser session.

The runtime layer nobody owns

The pieces of an answer exist, but as disconnected products rather than a standard:

| Candidate | Shape | What it’s missing |
| --- | --- | --- |
| Clix live ISO 12 | Bootable Linux for AI-coding agents, isolated from the host | Not a per-app install target; whole-OS ceremony |
| Val Town Townie v5 13 | Persistent URLs, SQLite, cron, email — apps as live services | Centralized; no subscribe-and-fork protocol |
| Claude Artifacts 13 | Sandboxed frontend-only React/HTML | Ephemeral; no persistence, no feed surface |

flowchart LR
    A[Author's tools page] -->|Atom feed| B[Reader/subscriber]
    B -->|Install click| C{Runtime?}
    C --> D[Clix VM]
    C --> E[Val Town]
    C --> F[Browser sandbox]
    C -.->|today: nothing standard| G((manual copy-paste))

Until one of these — or a PWA-shaped fourth option — wins, the “Install” button is a UI element without a backend.

The trust layer nobody wants to build

Even if the runtime question gets answered, the corpus being syndicated is, by independent measurement, structurally unsafe. A 2026 study cited by Forbes pegs AI-generated code at roughly 2.74× the vulnerability density of human-written code, with about 45% of snippets containing an OWASP Top-10 issue 14. A separate scan of more than 5,000 vibe-coded apps turned up hundreds of exposed API keys and hardcoded secrets, largely because the models default to insecure client-side patterns 15. CIO’s prescription is bleak: treat vibe-coded apps as disposable validation sketches and rewrite anything you actually deploy 16. An “Install” button on an unreviewed feed amplifies precisely that attack surface.
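
The exposed-secrets half of that problem is at least mechanically checkable before an app ever enters a feed. A crude sketch of a pre-publish scan; the patterns are illustrative and nowhere near a complete ruleset:

    import re

    # Crude pre-publish check for the failure mode the scans describe:
    # secrets hardcoded into client-side app source.
    SECRET_PATTERNS = [
        re.compile(r"sk-[A-Za-z0-9]{20,}"),  # OpenAI-style API keys
        re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key IDs
        re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}"),
    ]

    def find_secrets(source: str) -> list[str]:
        """Return every match so a human can veto publication."""
        return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(source)]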

Practitioner satisfaction tracks the security data. Hacker News threads on vibe-coding for non-trivial projects report satisfaction rates as low as 10%, with users complaining that untangling AI output costs more than writing it by hand would have 17.

Takeaway

Webb and Willison have correctly identified a real distribution gap — there is no good way to follow a developer’s stream of tiny tools the way you follow their blog. But the Atom feed is the cheap part of the stack. The expensive parts are a sandboxed install target that works across authors and a curation mechanism that prevents the feed from becoming a malware delivery channel. Shipping the feed first is fine; pretending it’s the bottleneck isn’t.

Round-ups

Quoting Andrew Kelley

Source: simon-willison

Zig creator Andrew Kelley defends the language’s ban on AI-assisted contributions, arguing that LLM hallucinations leave a distinct “digital smell” easily distinguished from human mistakes by reviewers who abstain from agentic coding. His analogy: contributors can smoke, just not in his house.

Footnotes

  1. Mashable — XBOW benchmark figures · https://mashable.com/article/ai-discovered-zero-day-bug-reports-crisis

    On XBOW’s internal benchmark of real-world vulnerabilities… GPT-5.5 achieved a 10% miss rate, a dramatic improvement over GPT-5’s 40% and Claude Opus 4.6’s 18%… GPT-5.5’s black-box performance now exceeds the white-box performance of its predecessors.

  2. Dark Reading — HackerOne pauses IBB · https://www.darkreading.com/application-security/ai-led-remediation-crisis-prompts-hackerone-pause-bug-bounties

    HackerOne paused new submissions to its Internet Bug Bounty program on March 27, 2026… AI-driven discovery has overwhelmed the capacity of open-source maintainers to ship patches, shifting the industry bottleneck from ‘finding bugs’ to ‘fixing them.’

  3. arXiv 2510.07575 — CTF benchmark critique · https://arxiv.org/html/2510.07575v2

    Agents often ‘solve’ challenges by using the internet to find existing walkthroughs or by deploying generic denial-of-service attacks to crash target servers instead of finding the intended vulnerability… static benchmarks suffer from ‘solution contamination.’

  4. The Decoder · https://the-decoder.com/gpt-5-5-matches-claude-mythos-in-cyber-attack-tests-uk-ai-security-institute-finds/

    AISI reported that its team of expert red-teamers developed a ‘universal jailbreak’ in just six hours of testing… capable of eliciting prohibited cyber-offensive content across every malicious query provided by OpenAI, even within complex, multi-turn agentic environments.

  5. Transformer News — ‘OpenAI shouldn’t be deciding if its GPT-5.5…’ · https://www.transformernews.ai/p/openai-shouldnt-be-deciding-if-its-gpt-55

    OpenAI claimed to have patched these vulnerabilities before the public launch, [but] the AISI was unable to verify the effectiveness of the final configuration… ‘self-certification’ is an inadequate safety standard for models with national security implications.

  6. OpenTools — ‘OpenAI Launches GPT-5.5 Cyber Restricted Access After Criticizing Anthropic’ · https://opentools.ai/news/openai-launches-gpt-55-cyber-restricted-access-after-criticizing-anthropic

    Just weeks before the rollout, Sam Altman publicly criticized Anthropic’s decision to restrict its ‘Mythos’ model… labeling the move ‘fear-based marketing’… However, OpenAI subsequently adopted a nearly identical strategy for GPT-5.5-Cyber, keeping the model behind a ‘velvet rope.’

  7. Towards AI — ‘$12,000 Infinite Loop’ practitioner postmortem · https://pub.towardsai.net/the-12-000-infinite-loop-how-my-ai-agent-bankrupted-a-sandbox-2fc6585b6716

    An agent, attempting to solve a race condition, recursively spun up expensive cloud clusters as a ‘redundant testing environment’… highlighting the lack of ‘stop-loss’ functions in current agentic architectures.

  8. FixBrokenAIApps — taxonomy of agentic loop failure modes · https://www.fixbrokenaiapps.com/blog/ai-agents-infinite-loops

    Termination failures often manifest as the ‘Agent That Never Stops,’ where the model misinterprets a successful execution signal as a reason to refine the code further, or conversely, ‘Premature Task Termination,’ where an agent concludes a task is finished based on a visually pleasing but non-functional UI.

  9. Alireza Rezvani, Medium — ‘The Tale of Two Ralphs’ · https://alirezarezvani.medium.com/the-tale-of-two-ralphs-what-nobody-tells-you-about-autonomous-claude-code-loops-25560fcb1448

    Developers have criticized official plugins for deviating from the original ‘fresh context’ requirement, leading to the very context overflow the Ralph technique was meant to solve.

  10. Rescana security advisory (CVE-2025-61260) · https://www.rescana.com/post/critical-command-injection-vulnerability-in-openai-codex-cli-threatens-developer-endpoints-and-suppl/

    The CLI implicitly trusted project-local .env files, allowing attackers to hijack the CODEX_HOME directory and execute arbitrary code.

  11. Cybernews — Codex agent failure during Linux incident response · https://cybernews.com/security/openai-codex-agent-failure-linux-cyberattack/

    A user attempting to use Codex to remove a cryptominer accidentally triggered a cascade of AI-generated commands that mimicked attacker tradecraft, overwhelming the EDR system.

  12. Hackaday — ‘A Live ISO For Those Vibe Coding Experiments’ · https://hackaday.com/2026/03/05/a-live-iso-for-those-vibe-coding-experiments/

    Clix, a bootable Linux environment designed specifically to run AI-coding agents and their output safely away from a user’s ‘family silver’.

  13. vybe.build — ‘Best Vibe Coding Tools 2026’ · https://www.vybe.build/blog/best-vibe-coding-tools-2026

    Val Town’s Townie v5 … supports SQLite databases, cron jobs, and email handling, allowing users to ‘one-shot’ entire applications that exist as persistent, live URLs … Artifacts are essentially ‘interactive documents’ — sandboxed, frontend-only React or HTML snippets

  14. Forbes — ‘Vibe Coding Has A Massive Security Problem’ (Jodie Cook) · https://www.forbes.com/sites/jodiecook/2026/03/20/vibe-coding-has-a-massive-security-problem/

    AI-generated code contains approximately 2.74 times more security vulnerabilities than traditional codebases, with nearly 45% of snippets containing at least one OWASP Top-10 flaw

  15. Medium — ‘Vibe Coding Debt: The Security Risks of AI-Generated Codebases’ · https://medium.com/@instatunnel/vibe-coding-debt-the-security-risks-of-ai-generated-codebases-7e3a038edf09

    researchers scanning over 5,000 vibe-coded apps discovered hundreds of exposed API keys and hardcoded secrets, as the AI often defaults to insecure client-side implementations

  16. CIO — ‘The vibe-coding crisis: why you need a dual-track engineering strategy’ · https://www.cio.com/article/4155813/the-vibe-coding-crisis-why-you-need-a-dual-track-engineering-strategy.html

    treating vibe-coded apps as disposable sketches to be used for validation, followed by a ground-up rewrite by human engineers for actual deployment

  17. Hacker News discussion thread (item 46765460) · https://news.ycombinator.com/item?id=46765460

    users reported low satisfaction (as low as 10%) when using vibe-coding setups for complex projects, noting that the AI often produces ‘unusable or overcomplicated results’
