
Researchers Surprised That With AI, Toxicity Is Harder to Fake Than Intelligence
Why It Matters
The study indicates that current LLMs cannot convincingly replicate the messy, often negative tone of real users, limiting their effectiveness for deceptive social‑media automation and informing platform detection strategies. It also challenges assumptions that larger or instruction‑tuned models are more human‑like, guiding future AI development toward better emotional modeling.
Summary
Researchers from Zurich, Amsterdam, Duke, and NYU released a study showing that AI‑generated social‑media replies remain easily detectable, with an overly polite, low‑toxicity tone serving as a reliable giveaway. Using a "computational Turing test" across Twitter/X, Bluesky, and Reddit, classifiers identified AI‑generated posts with 70‑80% accuracy, and instruction‑tuned or larger models actually performed worse at mimicking human affect. Simple optimization techniques, such as providing examples of a user's past posts, reduced detectability more than complex fine‑tuning, but emotional nuance and toxicity levels still lagged behind real human replies. The findings suggest a fundamental tension between stylistic human‑likeness and semantic accuracy in current LLMs.
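
The "computational Turing test" described here amounts to a binary classifier trained to separate human replies from model output, with politeness and low toxicity acting as telltale features. The summary does not specify the study's actual classifier or feature set, so the following is a minimal illustrative sketch only, assuming a TF-IDF plus logistic-regression pipeline in scikit-learn; the example replies, labels, and the `detector` name are all invented for illustration.

# Sketch of a "computational Turing test"-style detector: a binary
# classifier separating human from AI-generated replies. The study's
# real features and model are not given here; this is an assumed,
# illustrative setup with made-up placeholder data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled replies: 1 = human-written, 0 = AI-generated.
replies = [
    "lol no, that take is garbage and you know it",     # human: blunt, toxic
    "honestly who even asked for this feature??",       # human: irritated
    "That's an interesting perspective, thank you!",    # AI: overly polite
    "I appreciate you sharing this thoughtful point.",  # AI: low toxicity
]
labels = [1, 1, 0, 0]

# TF-IDF over word unigrams/bigrams captures lexical style; the paper's
# finding suggests politeness and toxicity cues carry much of the signal.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
detector.fit(replies, labels)

# Score a new reply: estimated probability it was written by a human.
print(detector.predict_proba(["Thank you for this wonderful insight!"])[0][1])

Even a shallow model like this can latch onto politeness markers, which is consistent with the study's reported 70‑80% detection accuracy: tone, rather than content, is doing most of the work.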