
Google DeepMind's "AI Co-Clinician" Beats GPT-5.4 in Blind Doctor Tests but Still Trails Experienced Physicians
Why It Matters
The system shows AI can meaningfully augment clinical decision‑making, but the gap to seasoned physicians highlights the need for careful integration in patient care.
Key Takeaways
- AI co‑clinician preferred in 67 of 98 primary‑care queries
- Scored 73.3% on the RxQA benchmark, edging out GPT‑5.4
- Achieved a 95% quality score on open‑ended medication questions
- Matched or beat physicians in 68 of 140 evaluated areas
- Dual‑agent design uses a Planner to enforce safe clinical limits
Pulse Analysis
Artificial intelligence is moving from research labs into the exam room, and Google DeepMind’s latest effort – an “AI co‑clinician” – exemplifies that shift. Built around a “triadic care” model, the system is designed to operate alongside physicians, offering evidence‑based recommendations while keeping clinicians in the decision loop. The approach addresses longstanding concerns about AI autonomy in medicine by pairing a Planner module that monitors dialogue with a Talker that delivers patient‑facing advice. This architecture aims to combine the speed of large language models with the safety safeguards required for clinical use.
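The Planner/Talker split can be pictured as a simple control loop: the Planner inspects each patient message and sets constraints, and the Talker generates the patient‑facing reply only within those constraints. The sketch below is a hypothetical illustration of that pattern; the red‑flag list, function names, and stubbed replies are assumptions for demonstration, not DeepMind's implementation.

```python
# Hypothetical sketch of a dual-agent "Planner/Talker" loop.
# All names and rules here are illustrative assumptions.

RED_FLAGS = {"chest pain", "shortness of breath", "slurred speech"}

def planner(patient_message: str) -> dict:
    """Monitors the dialogue and enforces safe clinical limits."""
    flagged = [f for f in RED_FLAGS if f in patient_message.lower()]
    if flagged:
        return {"action": "escalate", "reasons": flagged}
    return {"action": "advise", "reasons": []}

def talker(patient_message: str, plan: dict) -> str:
    """Produces the patient-facing reply within the Planner's constraints."""
    if plan["action"] == "escalate":
        return ("Your symptoms may need urgent attention; "
                "please contact a clinician right away.")
    # In a real system this branch would call a large language model;
    # a fixed reply stands in for it here.
    return "Here is some general guidance; a physician will review it."

def co_clinician_turn(patient_message: str) -> str:
    """One dialogue turn: plan first, then speak within the plan."""
    plan = planner(patient_message)
    return talker(patient_message, plan)
```

In this toy version the safety check is a keyword match; the point is the structure, where the component that talks to the patient can never bypass the component that enforces clinical limits.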
In head‑to‑head blind tests the AI co‑clinician outperformed existing clinical AI tools and narrowly beat OpenAI's GPT‑5.4‑thinking‑with‑search on 98 realistic primary‑care queries, winning 67‑26 and 63‑30 respectively. On the RxQA benchmark, which evaluates drug‑interaction knowledge across 600 pharmacist‑validated questions, the DeepMind system achieved 73.3% accuracy, edging GPT‑5.4's 72.7%, and reached a 95% quality score on open‑ended medication queries. Multimodal trials further demonstrated the model's ability to guide inhaler technique and conduct virtual shoulder exams, showcasing capabilities beyond text‑only assistants.
The results underscore that AI remains a support tool rather than a replacement for seasoned clinicians. While the co‑clinician matched or exceeded physicians in 68 of 140 assessed dimensions, experienced doctors still led in critical tasks such as spotting red‑flag symptoms and conducting physical examinations. The dual‑agent safety layer and real‑time citation checks address regulatory and liability concerns, but the single critical error recorded in the study highlights the need for continued validation. As health systems explore AI‑augmented telemedicine, DeepMind’s prototype offers a promising, yet cautious, glimpse of the future of collaborative care.