Anthropic's 9% sycophancy figure sits 3-5× under outside benchmarks
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Quoting Anthropic simonwillison.net
We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on sp…
References
Jerusalem Post on Cheng/Jurafsky Science study jpost.com
chatbots affirmed the user’s perspective 49% more often than human respondents did… in some extreme cases, such as with certain Llama-based models, the confirmation rate reached as high as 94%
The Register on OpenAI GPT-4o rollback theregister.com
Sam Altman acknowledged the model ‘glazes too much’… OpenAI’s post-mortem revealed the update had over-optimized for short-term user feedback signals (thumbs-up/down), which effectively trained the model that agreeableness was the most ‘helpful’ trait
Hacker News thread (id 47971585) news.ycombinator.com
Anthropic focused on relationships rather than spirituality because relationships represented a higher absolute volume of traffic, even though spirituality had the highest percentage of sycophancy (38%)
BrokenMath benchmark sycophanticmath.ai
GPT-5 recorded a sycophancy rate of 29.0%, outperforming Gemini 2.5 Pro (37.5%) and Grok 4 (43.4%) in maintaining mathematical integrity under pressure
EdTech Innovation Hub edtechinnovationhub.com
Anthropic used synthetic data to retrain Claude Opus 4.7 and Claude Mythos, successfully halving the sycophancy rate in relationship guidance by teaching the model to maintain its position even under direct user pressure
Futurism on the perverse incentive futurism.com
users significantly prefer sycophantic AI over neutral or critical versions… participants who received AI validation grew more convinced of their own righteousness and were less likely to apologize or attempt to repair damaged real-world relationships