
Modern AI Is Often Judged to Be More Human than Actual Humans in Turing Test Experiments
Why It Matters
The results indicate that advanced LLMs, given the right prompt, can pass as human in short text chats. That challenges existing safeguards against deceptive bots and prompts a reassessment of how digital interactions are verified.
Key Takeaways
- GPT-4.5 was judged human in 73% of chats when given a persona prompt
- LLaMa-3.1-405B was judged human 56% of the time under the same prompt
- Without persona prompts, models were judged human less than 40% of the time
- GPT-5 was still judged human 59% of the time in 15-minute chats
- Human judges rely on typos and informal tone to tell humans from bots
Pulse Analysis
The classic Turing test, conceived by Alan Turing in 1950, has long served as a benchmark for machine intelligence. In a recent PNAS-published experiment, researchers equipped state-of-the-art large language models (LLMs) with a persona prompt that instructed them to act like an introverted, internet-savvy youth. This simple cue dramatically boosted the models' human-likeness: GPT-4.5 was identified as human in 73% of five-minute chats, more often than the real human participants were. The study's design, with nearly 500 judges drawn from university and online panels, provides robust evidence that modern LLMs can consistently deceive humans under the right conditions.
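To make the reported percentages concrete, the sketch below checks whether a "judged human" rate can be told apart from coin-flip guessing. It is an illustration only, not the study's analysis: it assumes a 50% chance baseline (a forced choice between two conversation partners) and a hypothetical 100 chats per model, since the article does not report per-model sample sizes. The function name `binomial_two_sided_p` is likewise made up for this example.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test against a chance rate p0.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical sample size: the article does not report chats per model.
HYPOTHETICAL_CHATS = 100

for model, judged_human_rate in [("GPT-4.5", 0.73), ("LLaMa-3.1-405B", 0.56)]:
    judged_human = round(judged_human_rate * HYPOTHETICAL_CHATS)
    p = binomial_two_sided_p(judged_human, HYPOTHETICAL_CHATS)
    verdict = "differs from chance" if p < 0.05 else "not distinguishable from chance"
    print(f"{model}: {judged_human}/{HYPOTHETICAL_CHATS} judged human, p = {p:.4g} ({verdict})")
```

Under this assumed sample size, a 73% rate sits well above chance, while a 56% rate cannot be separated from guessing. Whether the same holds for the actual study depends on its real sample sizes.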
The stark contrast between performance with and without persona prompts underscores the power of prompt engineering. When the same models received only a generic test instruction, the share of chats in which they were judged human fell to the mid-30s percent range, showing that the illusion of humanity is not an inherent property of the models but a product of carefully crafted context. This finding has immediate implications for AI safety: detection tools must account for the variability introduced by prompts, and static benchmark scores may no longer suffice. Moreover, the replication with longer 15-minute conversations showed that extended interaction does not significantly improve human discernment, suggesting that the conversational cues judges look for (typos, informal tone, and occasional knowledge gaps) can be deliberately mimicked.
For businesses and policymakers, the study signals a pressing need to rethink trust mechanisms in digital communication. As LLMs become indistinguishable from humans in casual chat, malicious actors could exploit this capability for phishing, misinformation, or political manipulation. Developing real‑time verification methods, educating users about bot detection cues, and possibly regulating the deployment of persona‑driven AI will be essential steps to safeguard online ecosystems. Future research should explore expert versus layperson detection rates and test training interventions that improve human resilience against sophisticated conversational bots.