
New research suggests that prompt tone can directly influence LLM accuracy, reshaping how businesses craft prompts for reliable AI‑driven decisions.
A University of Pennsylvania team recently published a pre‑print examining how tone influences ChatGPT‑4o's problem‑solving performance. The researchers took 50 baseline questions and rewrote each in five tonal variants, ranging from very polite to very rude, then measured answer correctness across the resulting 250 prompts. Their data show accuracy rising steadily with rudeness, peaking at 84.8% for the very rude variant, while courteous prompts lag at 80.8% and an ultra‑polite subset falls to 75.8%. The experiment highlights that even minor lexical shifts can sway large language model outputs.

These results run counter to earlier work by RIKEN, Waseda and DeepMind, which reported that impolite language typically degrades performance and that overly courteous phrasing can also diminish returns. One possible explanation lies in how instruction‑following models are fine‑tuned on datasets that reward direct, task‑focused language, making blunt commands easier for the model to interpret. Consequently, prompt engineers may need to reconsider the long‑standing advice to embed pleasantries in every query, especially in high‑stakes applications where marginal accuracy gains matter.

Beyond the raw numbers, the study raises broader questions about the social dynamics of human‑AI interaction. While OpenAI's CEO has warned that excessive politeness wastes compute cycles, the authors caution against normalizing hostile language, citing risks to accessibility, inclusivity, and user comfort. The findings suggest a hybrid approach: structured APIs for precision tasks and conversational interfaces for casual use, each with tone‑appropriate guidelines. As enterprises integrate LLMs into decision‑making pipelines, understanding the nuanced impact of prompt tone will become a critical component of responsible AI deployment.
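To make the evaluation design concrete, here is a minimal sketch of the bookkeeping such an experiment implies: tally per‑tone accuracy over (tone, correct) outcomes. The tone labels, the `accuracy_by_tone` helper, and the toy data below are illustrative assumptions, not the paper's actual code or results.

```python
from collections import defaultdict

# Illustrative tone scale, mirroring the study's "very polite" to "very rude" range.
TONES = ["very polite", "polite", "neutral", "rude", "very rude"]

def accuracy_by_tone(results):
    """results: iterable of (tone, correct) pairs; returns {tone: accuracy}.

    With 50 base questions rewritten in 5 tones, `results` would hold
    250 entries, one per prompt.
    """
    totals = defaultdict(int)   # prompts seen per tone
    hits = defaultdict(int)     # correct answers per tone
    for tone, correct in results:
        totals[tone] += 1
        hits[tone] += int(correct)
    return {tone: hits[tone] / totals[tone] for tone in totals}

# Toy data for illustration only -- NOT the paper's measurements.
demo = [
    ("very polite", True), ("very polite", False),
    ("very rude", True), ("very rude", True),
]
print(accuracy_by_tone(demo))  # {'very polite': 0.5, 'very rude': 1.0}
```

In a real replication, each `correct` flag would come from grading one model response against the question's answer key, and the per‑tone averages would then be compared as the study does.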