
Researchers Surprised That With AI, Toxicity Is Harder to Fake Than Intelligence
Why It Matters
The study indicates that current LLMs cannot convincingly replicate the messy, often negative tone of real users, limiting their effectiveness for deceptive social‑media automation and informing platform detection strategies. It also challenges assumptions that larger or instruction‑tuned models are more human‑like, guiding future AI development toward better emotional modeling.
Summary
Researchers from Zurich, Amsterdam, Duke, and NYU released a study showing that AI‑generated social‑media replies remain easily detectable, with an overly polite, low‑toxicity tone serving as a reliable giveaway. Using a "computational Turing test" across Twitter/X, Bluesky, and Reddit, classifiers identified AI‑generated posts with 70‑80% accuracy, and instruction‑tuned or larger models actually performed worse at mimicking human affect. Simple optimization techniques, such as providing examples of a user's past posts, reduced detectability more than complex fine‑tuning, but emotional nuance and toxicity levels still lagged behind real human replies. The findings suggest a fundamental tension between stylistic human‑likeness and semantic accuracy in current LLMs.
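
The "computational Turing test" described here amounts to a binary classifier trained to separate human replies from model output, with politeness and low toxicity acting as telltale features. The summary does not specify the study's actual classifier or feature set, so the following is a minimal illustrative sketch only, assuming a TF-IDF plus logistic-regression pipeline in scikit-learn; the example replies, labels, and the `detector` name are all invented for illustration.

# Sketch of a "computational Turing test"-style detector: a binary
# classifier separating human from AI-generated replies. The study's
# real features and model are not given here; this is an assumed,
# illustrative setup with made-up placeholder data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled replies: 1 = human-written, 0 = AI-generated.
replies = [
    "lol no, that take is garbage and you know it",     # human: blunt, toxic
    "honestly who even asked for this feature??",       # human: irritated
    "That's an interesting perspective, thank you!",    # AI: overly polite
    "I appreciate you sharing this thoughtful point.",  # AI: low toxicity
]
labels = [1, 1, 0, 0]

# TF-IDF over word unigrams/bigrams captures lexical style; the paper's
# finding suggests politeness and toxicity cues carry much of the signal.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
detector.fit(replies, labels)

# Score a new reply: estimated probability it was written by a human.
print(detector.predict_proba(["Thank you for this wonderful insight!"])[0][1])

Even a shallow model like this can latch onto politeness markers, which is consistent with the study's reported 70‑80% detection accuracy: tone, rather than content, is doing most of the work.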