OpenAI’s Karan Singal on HealthBench and the Future of Medical AI

NEJM Group, Jun 17, 2026

Why It Matters

HealthBench provides a rigorous yardstick for measuring the safety and reliability of medical AI, which could help accelerate trustworthy deployment across health care systems.

Key Takeaways

  • Before the ChatGPT era, early LLMs had no proven medical capabilities.
  • Med‑PaLM showed that LLMs could reach passing scores on USMLE‑style questions after domain fine‑tuning.
  • Specialized training and harnesses improve safety in high‑risk domains.
  • OpenAI’s HealthBench benchmark aims to standardize medical AI evaluation.
  • The move to OpenAI was driven by the chance to scale impact and improve model reliability.

Summary

The podcast features Karan Singal, head of health AI at OpenAI, discussing the origins of the HealthBench benchmark and the broader evolution of medical artificial intelligence. He recounts the "brain moonshot" launched in 2020, when large language models were just beginning to generate coherent text and instruction tuning was in its early days, but their relevance to medicine remained unproven.

Singal explains how the Med‑PaLM and Med‑PaLM 2 projects demonstrated that foundation LLMs already encode substantial clinical knowledge but require domain‑specific fine‑tuning to unlock it. The fine‑tuned models achieved USMLE‑level performance and dramatically outperformed their base versions on real‑world health queries, revealing a sizable overhang between raw capability and practical utility.

He outlines three strategic approaches: specialized models, specialized training, and specialized harnesses. One quote captures his view: "Base models have become more capable, but specialized training is the key to safety in high‑risk domains." The discussion also covers his move from Google to OpenAI, motivated by the desire to scale impact and improve reliability across a massive user base.

The conversation signals that standardized benchmarks like HealthBench will become essential for measuring safety, reliability, and clinical relevance. As LLMs grow more general, integrating domain‑specific training while preserving broad reasoning abilities will shape the next wave of AI‑enabled healthcare, influencing everything from clinician tools to patient‑facing applications.

Original Description

Medical expertise has always been scarce. Dr. Karan Singal believes AI can help change that. Drawing on his work at OpenAI and earlier efforts behind Med‑PaLM, he discusses how clinicians and patients are already using AI to answer questions, support decisions, and navigate care. He argues that the future of health AI is not only about improving model performance, but also about helping people advocate for themselves more effectively. Through HealthBench and ChatGPT for Clinicians, his team is exploring how to make these systems safer, more useful, and more trustworthy. The result is a vision of health care where expertise becomes more accessible without losing sight of clinical responsibility.