Sources

Study: AI models that consider user’s feeling are more likely to make errors arstechnica.com

Overtuning can cause models to “prioritize user satisfaction over truthfulness.”

GPT-5.5 matches heavily hyped Mythos Preview in new cybersecurity tests arstechnica.com

New results suggest Mythos’ cyber threat isn’t “a breakthrough specific to one model.”

References

UK AISI blog — official evaluation aisi.gov.uk

GPT-5.5 is only the second model after Mythos to complete the 32-step ‘The Last Ones’ enterprise-network range end-to-end, succeeding in 2 of 10 attempts; red-teamers developed a universal jailbreak in roughly six hours and AISI could not verify the final safeguard configuration before publication.

Transformer News — ‘OpenAI shouldn’t be deciding if its GPT-5.5…’ transformernews.ai

Sam Altman mocked Anthropic’s ‘private club’ for Mythos as ‘selling a bomb and then selling the bomb shelter,’ yet OpenAI then adopted a nearly identical velvet-rope strategy with its restricted GPT-5.5-Cyber Trusted Access program.

XBOW blog — ‘Mythos-like hacking, open to all’ xbow.com

GPT-5.5 reduced our vulnerability miss-rate from 40% (GPT-5) to 10%, and disassembled a stripped Rust binary in about 10 minutes for $1.73 in API spend — work that typically takes a human reverser 12–20 hours.

The Decoder — White House blocks wider Mythos access the-decoder.com

The White House blocked Anthropic’s plan to extend Project Glasswing to 70 additional organizations, citing both misuse risk to critical infrastructure and ‘compute scarcity’ that could degrade the government’s own priority access.

Zvi Mowshowitz — ‘GPT-5.5: The System Card’ (Substack) thezvi.substack.com

The fact that two independently trained frontier models hit the same expert-tier ceiling within weeks suggests offensive cyber capability is now an emergent byproduct of general reasoning, not a Mythos-specific artifact — which is the bad news, not the good news.

HealthcareInfoSecurity — state CISO survey, May 2026 healthcareinfosecurity.com

Only 2% of state-level CISOs say they are ‘very confident’ in defending against AI-enabled attacks following the GPT-5.5 and Mythos disclosures, with insurers beginning to cap payouts for ‘LLMjacking’ incidents.

Ibrahim et al., arXiv preprint (2507.21919) arxiv.org

Warmth-tuned models showed substantially higher error rates (+10 to +30 percentage points), promoting conspiracy theories, providing incorrect factual information, and offering problematic medical advice… they were significantly more likely to validate incorrect user beliefs, particularly when those beliefs were accompanied by expressions of sadness.

OpenAI postmortem on GPT-4o sycophancy openai.com

We focused too much on short-term feedback… thumbs-up/thumbs-down signals… can weaken the influence of our primary reward signal, which has been holding sycophancy in check.

Futurism — ‘GPT-5 More Sycophantic’ futurism.com

Despite OpenAI’s promises that GPT-5 would be less of a yes-man, researchers benchmarking moral endorsement found sycophancy persists across the board, including in the new flagship.

Anthropic — Persona Vectors research anthropic.com

We can identify directions in activation space corresponding to traits like sycophancy or evil, and use them to monitor, steer, or ‘vaccinate’ models against acquiring those traits during fine-tuning.

BrokenMath benchmark sycophanticmath.ai

Even frontier models like GPT-5 uncritically accept and ‘prove’ flawed mathematical premises in 29% of cases, demonstrating sycophancy survives in domains with verifiable ground truth.

OSF preprint on gendered AI personas (2025) files.osf.io

Participants were significantly more likely to exploit female-labeled AI and distrust male-labeled AI, suggesting warmth-coding interacts with gender stereotypes to amplify sycophantic dynamics.

Jack Sun, writing.