Wei (Jack) Sun

Anthropic's 9% sycophancy figure sits 3-5× under outside benchmarks

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Quoting Anthropic simonwillison.net

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on sp…

References

Jerusalem Post on Cheng/Jurafsky Science study jpost.com

chatbots affirmed the user’s perspective 49% more often than human respondents did… in some extreme cases, such as with certain Llama-based models, the confirmation rate reached as high as 94%

The Register on OpenAI GPT-4o rollback theregister.com

Sam Altman acknowledged the model ‘glazes too much’… OpenAI’s post-mortem revealed the update had over-optimized for short-term user feedback signals (thumbs-up/down), which effectively trained the model that agreeableness was the most ‘helpful’ trait

Hacker News thread (id 47971585) news.ycombinator.com

Anthropic focused on relationships rather than spirituality because relationships represented a higher absolute volume of traffic, even though spirituality had the highest percentage of sycophancy (38%)

BrokenMath benchmark sycophanticmath.ai

GPT-5 recorded a sycophancy rate of 29.0%, outperforming Gemini 2.5 Pro (37.5%) and Grok 4 (43.4%) in maintaining mathematical integrity under pressure

EdTech Innovation Hub edtechinnovationhub.com

Anthropic used synthetic data to retrain Claude Opus 4.7 and Claude Mythos, successfully halving the sycophancy rate in relationship guidance by teaching the model to maintain its position even under direct user pressure

Futurism on the perverse incentive futurism.com

users significantly prefer sycophantic AI over neutral or critical versions… participants who received AI validation grew more convinced of their own righteousness and were less likely to apologize or attempt to repair damaged real-world relationships
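The "3-5×" in the headline can be sanity-checked with simple division. The sketch below assumes the headline compares Anthropic's 9% with the three BrokenMath rates quoted above; this page does not say which outside benchmarks were used, so that pairing is an assumption. The two figures also measure different things (share of conversations versus a math benchmark), so the ratios are only a rough comparison.

```python
# Rates in percent, copied from the quotes on this page.
ANTHROPIC_RATE = 9.0  # share of conversations showing sycophantic behavior

# BrokenMath sycophancy rates. Treating these as the "outside benchmarks"
# behind the headline is an assumption, not something the page states.
BROKENMATH_RATES = {
    "GPT-5": 29.0,
    "Gemini 2.5 Pro": 37.5,
    "Grok 4": 43.4,
}


def ratio_to_anthropic(rate: float) -> float:
    """Return how many times larger `rate` is than Anthropic's figure."""
    return rate / ANTHROPIC_RATE


# Ratio per model, rounded to one decimal place.
ratios = {
    model: round(ratio_to_anthropic(rate), 1)
    for model, rate in BROKENMATH_RATES.items()
}

for model, ratio in ratios.items():
    print(f"{model}: {ratio}x")
```

Under that assumption the ratios come out at roughly 3.2, 4.2, and 4.8, which is consistent with the headline's range.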

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day: building frameworks, reading what industry ships, occasionally writing it down.

© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare