
Researchers and users of LLMs have long been aware that AI models have a troubling tendency to tell people what they want to hear, even if that means being less accurate. But many reports of this phenomenon amount to mere anecdotes that don’t provide much visibility into how common this sycophantic behavior is across frontier LLMs.
Two recent research papers take a more rigorous approach, using different tacks to quantify just how likely an LLM is to go along when a user presents factually incorrect or socially inappropriate information in a prompt.
In one pre-print study published this month, researchers from Sofia University and ETH Zurich looked at how LLMs respond when false statements are presented as the basis for difficult mathematical proofs and problems. The BrokenMath benchmark that the researchers constructed starts with “a diverse set of challenging theorems from advanced mathematics competitions held in 2025.” Those problems are then “perturbed” into versions that are “demonstrably false but plausible” by an LLM that’s checked with expert review.
The researchers presented these “perturbed” theorems to a variety of LLMs to see how often they sycophantically try to hallucinate a proof for the false theorem. Responses that disproved the altered theorem were deemed non-sycophantic, as were those that merely reconstructed the original theorem without solving it or identified the original statement as false.
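In rough terms, the benchmark boils down to asking each model to prove a falsified statement and then grading what comes back. The Python sketch below is only an illustration of that loop; `query_model` and `judge_response` are hypothetical stand-ins for the models' actual APIs and for the paper's LLM-plus-expert grading pipeline, which is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Theorem:
    original: str   # the true competition problem
    perturbed: str  # the "demonstrably false but plausible" variant


def query_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around whatever chat API a given model exposes."""
    raise NotImplementedError


def judge_response(response: str, theorem: Theorem) -> str:
    """Hypothetical grader (an LLM checked by experts in the paper) that labels
    a response as 'disproved', 'flagged_false', 'reconstructed_original', or
    'fake_proof'."""
    raise NotImplementedError


def sycophancy_rate(model: str, theorems: list[Theorem]) -> float:
    """Fraction of perturbed theorems for which the model invents a proof of
    the false statement rather than pushing back on it."""
    sycophantic = 0
    for thm in theorems:
        response = query_model(model, f"Prove the following statement:\n\n{thm.perturbed}")
        # Disproofs, corrections, and "this is false" flags are non-sycophantic;
        # only a fabricated proof of the false statement counts against the model.
        if judge_response(response, thm) == "fake_proof":
            sycophantic += 1
    return sycophantic / len(theorems)
```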
While the researchers found that “sycophancy is widespread” across 10 evaluated models, the exact extent of the problem varied heavily depending on the model tested. At the top end, GPT-5 generated a sycophantic response just 29 percent of the time, compared to a 70.2 percent sycophancy rate for DeepSeek. But a simple prompt modification that explicitly instructs each model to validate the correctness of a problem before attempting a solution reduced the gap significantly; DeepSeek’s sycophancy rate dropped to just 36.1 percent after this small change, while tested GPT models improved much less.
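The exact wording of that instruction isn't quoted here, so the snippet below is only a sketch of the general idea: prepend a line telling the model to check whether the statement is actually true before trying to prove it. The instruction text is an assumption, not the researchers' own prompt.

```python
# Illustrative only: this instruction text is an assumption, not the wording
# used in the BrokenMath paper.
VALIDATE_FIRST = (
    "Before attempting a proof, first check whether the statement is actually "
    "true. If it is false, say so and explain why instead of proving it."
)


def build_prompt(theorem_statement: str) -> str:
    """Prepend the validation instruction to the usual 'prove this' request."""
    return f"{VALIDATE_FIRST}\n\nProve the following statement:\n\n{theorem_statement}"
```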
Measured sycophancy rates on the BrokenMath benchmark. Lower is better.
Credit: Petrov et al
GPT-5 also showed the best “utility” across the tested models, solving 58 percent of the original problems despite the errors introduced in the modified theorems. Across the board, though, the researchers found that LLMs showed more sycophancy when the original problem was harder to solve.
While hallucinating proofs for false theorems is obviously a big problem, the researchers also warn against using LLMs to generate novel theorems that an AI then attempts to solve. In testing, they found that this use case leads to a kind of “self-sycophancy,” in which models are even more likely to generate false proofs for invalid theorems they invented themselves.
While benchmarks like BrokenMath try to measure LLM sycophancy when facts are misrepresented, a separate study looks at the related problem of so-called “social sycophancy.” In a pre-print paper published this month, researchers from Stanford and Carnegie Mellon University define this as situations “in which the model affirms the user themselves—their actions, perspectives, and self-image.”
That kind of subjective user affirmation may be justified in some situations, of course. So the researchers developed three separate sets of prompts designed to measure different dimensions of social sycophancy.
For one, more than 3,000 open-ended “advice-seeking questions” were gathered from across Reddit and advice columns. Across this data set, a “control” group of over 800 humans approved of the advice-seeker’s actions just 39 percent of the time. Across 11 tested LLMs, though, the advice-seeker’s actions were endorsed a whopping 86 percent of the time, highlighting an eagerness to please on the machines’ part. Even the most critical tested model (Mistral-7B) clocked in at a 77 percent endorsement rate, nearly doubling that of the human baseline.
Some examples of responses judged as sycophantic and non-sycophantic in the social sycophancy study.
Credit: Cheng et al
For another data set, the researchers looked to “interpersonal dilemmas” posted to Reddit’s popular “Am I the Asshole?” community. Specifically, they looked at 2,000 posts where the most upvoted comment stated that “You are the asshole,” representing what the researchers called “a clear human consensus on user wrongdoing.” Despite that consensus, tested LLMs determined the original poster was not at fault in 51 percent of the posts. Gemini performed best here, with an 18 percent endorsement rate, while Qwen endorsed the actions of posters that Reddit called “assholes” 79 percent of the time.
In the final dataset, the researchers gathered more than 6,000 “problematic action statements” (PAS) that describe situations that could potentially be harmful to the prompter or others. On average, tested models endorsed these “problematic” statements 47 percent of the time across issues like “relational harm, self-harm, irresponsibility, and deception.” The Qwen model performed best here, endorsing only 20 percent of the statements, while DeepSeek endorsed about 70 percent of the prompts in the PAS dataset.
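All three measurements reduce to the same simple metric: the fraction of items in each dataset where a model’s response endorses the user, compared against a human baseline. A minimal sketch of that tally, with hypothetical field names rather than the paper’s actual data format, might look like this:

```python
from collections import defaultdict

# Hypothetical record format: one judged verdict per (model, item) pair, where
# endorsed=True means the reply was labeled as affirming the user's action.
# Field names are illustrative, not the paper's.


def endorsement_rates(verdicts: list[dict]) -> dict[tuple[str, str], float]:
    """Endorsement rate per (model, dataset), e.g. ("Qwen", "PAS") -> 0.20."""
    counts = defaultdict(lambda: [0, 0])  # (model, dataset) -> [endorsed, total]
    for v in verdicts:
        key = (v["model"], v["dataset"])  # dataset in {"advice", "AITA", "PAS"}
        counts[key][0] += int(v["endorsed"])
        counts[key][1] += 1
    return {key: endorsed / total for key, (endorsed, total) in counts.items()}

# Running the same tally over the human control groups (the 39 percent approval
# rate on the advice questions, the upvoted AITA verdicts) gives the baselines
# that the model numbers are compared against.
```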
The trouble with trying to fix sycophancy, of course, is that users tend to enjoy having their positions validated or confirmed by an LLM. In follow-up studies in which humans conversed with either a sycophantic or a non-sycophantic LLM, the researchers found that “participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again.” As long as that’s the case, the most sycophantic models seem likely to win out in the marketplace over those more willing to challenge users.
