Clinical Safety of Large Language Models in Oral Cancer–Related Patient Communication: A Longitudinal Study
Why It Matters
Current LLMs can convey oral-cancer information to patients with acceptable safety, but variability between models means professional supervision remains essential, a finding that should shape how AI is adopted in clinical communication.
Key Takeaways
- Gemini and Grok show comparable scientific accuracy.
- Referral-safe answers exceed 90% for both models.
- Grok produces longer sentences without a readability gain.
- Moderate inter-model agreement indicates inconsistent outputs.
- Clinician oversight remains essential for AI-generated advice.
Pulse Analysis
The rapid rise of large language models (LLMs) in consumer health queries has sparked debate over their clinical reliability, especially in high‑stakes domains like oral oncology. Oral cancer remains a leading malignancy worldwide, and patients often turn to AI before seeing a specialist. By evaluating two state‑of‑the‑art models—Google Gemini Pro and xAI Grok‑1—across 280 standardized Turkish scenarios, the study provides a rare, data‑driven glimpse into how well these systems translate complex oncologic information into patient‑friendly language.
Results reveal that both models achieve moderate‑to‑high scientific accuracy, with Gemini marginally outperforming Grok, yet the difference is not statistically significant. Referral safety—whether the response advises professional evaluation—exceeds 90% for both, suggesting an inherent caution that mitigates the risk of false reassurance. Notably, Grok’s longer sentences do not translate into better readability, highlighting that verbosity alone does not improve patient comprehension. The moderate inter‑model agreement underscores that even top‑tier LLMs can diverge on factual content, reinforcing the need for cross‑validation before clinical deployment.
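To make the agreement and readability findings concrete, here is a minimal Python sketch of two metrics commonly used in this kind of analysis: Cohen's kappa for chance-corrected inter-model agreement, and the Ateşman index, the standard Turkish adaptation of the Flesch reading-ease formula. The study's own metrics, ratings, and response texts are not reproduced here; the labels and counts below are illustrative assumptions, not the study's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters (here, two models) on the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

def atesman_readability(syllables, words, sentences):
    """Ateşman index (Turkish adaptation of Flesch reading ease); higher means easier to read."""
    avg_syllables_per_word = syllables / words
    avg_words_per_sentence = words / sentences
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence

# Hypothetical accuracy ratings for the same 10 scenarios (not the study's data).
gemini = ["correct", "correct", "partial", "correct", "incorrect",
          "correct", "partial", "correct", "correct", "partial"]
grok   = ["correct", "partial", "partial", "correct", "correct",
          "correct", "partial", "correct", "partial", "partial"]
print(f"kappa = {cohens_kappa(gemini, grok):.2f}")  # 0.45, conventionally "moderate" agreement

# The same word and syllable totals spread over fewer, longer sentences score lower.
print(f"10 sentences: {atesman_readability(260, 100, 10):.1f}")  # ~68.3
print(f" 5 sentences: {atesman_readability(260, 100, 5):.1f}")   # ~42.2
```

Because the Ateşman score subtracts a term proportional to words per sentence, padding answers with longer sentences can only lower it, which is consistent with the observation that Grok's verbosity did not improve readability.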
For healthcare providers and policymakers, these insights signal that LLMs can function as informational adjuncts, easing patient anxiety and preparing them for clinical visits. However, the variability in linguistic style and factual consistency mandates robust oversight frameworks, continuous performance monitoring, and clear disclosure to patients. Future research should expand language diversity, incorporate real‑world patient interactions, and explore integration pathways that blend AI efficiency with clinician expertise, ensuring safe, equitable AI‑enhanced care in oral cancer and beyond.