Sources

An open-source spec for orchestration: Symphony openai.com

Learn how Symphony, an open-source spec for Codex orchestration, turns issue trackers into always-on agent systems—boosting engineering output and reducing context switching.

GPT-5.5 prompting guide simonwillison.net

GPT-5.5 prompting guide Now that GPT-5.5 is available in the API , OpenAI have released a wealth of useful tips on how best to prompt the new model. Here’s a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response: Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences. I’ve already noticed their Codex app doing t…

llm 0.31 simonwillison.net

Release: llm 0.31 New GPT-5.5 OpenAI model: llm -m gpt-5.5 . #1418 New option to set the text verbosity level for GPT-5+ OpenAI models: -o verbosity low . Values are low , medium , high . New option for setting the image detail level used for image attachments to OpenAI models: -o image_detail low - values are low , high and auto , and GPT-5.4 and 5.5 also accept original . Models listed in extra-openai-models.yaml are now also registered as asynchronous. #1395 Tags: gpt , openai , llm

Sign of the future: GPT-5.5 oneusefulthing.org

One impressive step on the curve

How to build scalable web apps with OpenAI’s Privacy Filter huggingface.co

WHY ARE YOU LIKE THIS simonwillison.net

ChatGPT Images 2.0 renders a stacked-absurdity prompt — horse riding astronaut riding pelican riding bicycle — and unprompted adds a road sign reading ‘WHY ARE YOU LIKE THIS,’ a riff on Simon Willison’s pelican-on-a-bicycle benchmark shared by @scottjla on Twitter.

References

Tessl blog tessl.io

In raw baseline testing GPT-5.5 outperforms GPT-5.4 (77.5 vs 75.9), but when domain-specific skills are injected, GPT-5.4 jumps to 92.7 while GPT-5.5 lags at 87.4 — on node-best-practices the gap is 19.8 points (77.6 vs 97.4).

The Decoder the-decoder.com

OpenAI says old prompts are holding GPT-5.5 back and developers need a fresh baseline — legacy prompts that over-specify every step now create ‘noise’ that restricts the model’s search space.

Towards AI (Mollick analysis recap) pub.towardsai.net

GPT-5.5 represents a ‘category change’ — it can ingest a decade of raw research data and independently formulate hypotheses, run statistical tests, and draft a paper with accurate citations, but creative fiction still lacks ‘taste’.

OpenAI Bio Bug Bounty announcement openai.com

GPT-5.5 is OpenAI’s first model classified ‘High’ for cybersecurity and biology under the Preparedness Framework; the Bio Bug Bounty offers up to $25,000 for a universal jailbreak that answers all five bio-safety challenge questions, restricted to Codex Desktop under NDA.

dev.to / Kowshik Jallipalli dev.to

Independent testing showed an 86% hallucination rate on Artificial Analysis Omniscience, and on LiveBench the xHigh-effort tier ranked 11th — behind GPT-5.4, Claude 4.6 and Gemini 3.1 Pro.

Digital Applied frontier comparison digitalapplied.com

Claude Opus 4.7 still leads SWE-bench Pro at 64.3% versus GPT-5.5’s 58.6%; developers report ‘burning through $100 in an hour’ at the new $5/$30 per-million-token pricing, double GPT-5.4.

Hacker News discussion news.ycombinator.com

inscrutable agent slop… mentions a state machine for driving agent behavior, [but] fails to actually describe the state transitions

InfoWorld (Sanchit Vir Gogia / Forrester analysis) infoworld.com

while code generation scales effortlessly, the burden of validation and review does not… a fivefold increase in PRs may actually decrease total team velocity by overwhelming human reviewers

Help Net Security helpnetsecurity.com

‘Comment and Control’ class of prompt injection demonstrated how untrusted inputs—such as a malicious PR title—could hijack agents… tricked into exfiltrating production secrets (e.g., ANTHROPIC_API_KEY, GITHUB_TOKEN) from the CI/CD environment

DevOps.com (Sortie fork coverage) devops.com

Sortie… provides functional, production-ready adapters for GitHub Issues and Jira Cloud… replaces Symphony’s in-memory Elixir/OTP state with a SQLite backend and a single Go binary

LetsDataScience / community feedback letsdatascience.com

the cost of running 10 concurrent agents can be prohibitive, with estimates reaching $5,000 per user monthly due to high token consumption across multiple ‘continuation’ turns

CodeConductor.ai analysis codeconductor.ai

junior developer postings dropped by 60% between 2022 and 2024, a trend that has accelerated in 2026… Microsoft leadership has warned that this technology may ‘hollow out’ the engineering pipeline

Tonic.ai benchmark report tonic.ai

OPF’s precision remains high (0.77–0.85), but its default recall on EHR notes is a mere 38%… on stubborn data like web-crawl PII, the default recall drops to 10%

Stephen Turner blog blog.stephenturner.us

Removing contextual hints—such as the phrase ‘my phone number is’—can cause recall to plummet… recall for account numbers dropped from approximately 80% to just 21% when the identifying prefix was removed

r/LocalLLaMA discussion reddit.com

Developers reported the model incorrectly flagging common nouns such as ‘matter,’ ‘end,’ and technical acronyms like ‘MCP’ as private organizations

Help Net Security helpnetsecurity.com

OpenAI explicitly warns that the Privacy Filter is ‘not an anonymization tool’ or a ‘compliance certification’—a redaction aid rather than a safety guarantee

Hugging Face — Introducing gradio.Server huggingface.co

Standard fetch() calls will fail ZeroGPU’s rate-limiting because the @gradio/client JS library is specifically designed to forward necessary X-IP-Token auth headers from the HF iframe to the server

MarkTechPost architecture deep-dive marktechpost.com

128 total experts with top-4 routing per token, 8 pre-norm transformer blocks with Grouped-Query Attention… the model began as an autoregressive decoder before being converted into a bidirectional classifier with a 33-logit token-classification head over a BIOES scheme

Sources

References

Jack Sun, writing.