

AI‑generated citation errors threaten the integrity of scholarly metrics and could erode confidence in peer‑reviewed research, prompting conferences to tighten oversight of LLM usage.
The rise of large language models has transformed how researchers draft papers, offering speed and convenience for tasks like literature reviews and reference formatting. However, these models can fabricate sources that appear plausible, a phenomenon known as citation hallucination. When scholars rely on AI without thorough verification, the scholarly record can become polluted with non‑existent works, undermining the trust that underpins academic discourse.
GPTZero's audit of the 2025 NeurIPS conference provides a concrete data point in this debate. By scanning 4,841 accepted papers, the startup uncovered 100 fabricated citations across 51 manuscripts, meaning roughly 1.1% of papers contained at least one false reference. Although the percentage seems modest, each citation functions as a currency of influence; even a single fabricated reference can distort a researcher's impact metrics and mislead future work. The findings also expose the limits of current peer‑review pipelines, which are strained by record submission volumes and cannot realistically catch every AI‑generated error by hand.
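The headline rate follows directly from the audit's figures; a quick back‑of‑the‑envelope check (sketched below, using only the numbers quoted above) shows where the 1.1% comes from and what it implies per affected paper.

```python
# Quick arithmetic check of the rates reported in GPTZero's audit.
papers_scanned = 4841            # accepted NeurIPS 2025 papers scanned
papers_with_fake_citation = 51   # manuscripts containing at least one fabricated citation
fabricated_citations = 100       # total fabricated citations found

share_affected = papers_with_fake_citation / papers_scanned
per_affected_paper = fabricated_citations / papers_with_fake_citation

print(f"Papers with at least one fabricated citation: {share_affected:.1%}")   # ~1.1%
print(f"Fabricated citations per affected paper:       {per_affected_paper:.1f}")  # ~2.0
```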
The broader implication is a call to action for conferences, institutions, and authors alike. Implementing AI‑detection tools during manuscript submission, mandating manual verification of generated references, and establishing clear guidelines for LLM usage can mitigate the risk of hallucinated citations. As AI becomes entrenched in the research workflow, proactive policies will be essential to preserve the credibility of scholarly communication and ensure that citation metrics remain a reliable indicator of genuine scientific contribution.
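To make the reference‑verification step concrete, the sketch below shows one way a submission pipeline could pre‑screen citations before a human looks at them. This is an illustrative assumption, not a description of anything GPTZero or NeurIPS actually runs: it takes already‑extracted reference titles, queries the public Crossref index, and merely flags poor matches for manual review.

```python
"""Minimal sketch of automated reference pre-screening (illustrative assumptions:
titles are already parsed from the bibliography, and the public Crossref REST API
is used as the lookup index; a flag only requests human review, it does not prove
that a citation is fabricated)."""
from difflib import SequenceMatcher

import requests


def best_crossref_match(title: str) -> str | None:
    """Return the closest indexed title Crossref finds for a bibliographic query."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items or not items[0].get("title"):
        return None
    return items[0]["title"][0]


def flag_for_review(cited_title: str, threshold: float = 0.75) -> bool:
    """Flag a reference whose cited title has no sufficiently close match in the index."""
    match = best_crossref_match(cited_title)
    if match is None:
        return True
    similarity = SequenceMatcher(None, cited_title.lower(), match.lower()).ratio()
    return similarity < threshold


if __name__ == "__main__":
    for ref in ["Attention Is All You Need",
                "A Completely Invented Paper That Does Not Exist"]:
        print(ref, "->", "needs manual check" if flag_for_review(ref) else "found")
```

A flag here is only a prompt for human verification: legitimate preprints, workshop papers, or very recent work may be missing from any single index, which is why automated screening complements rather than replaces the manual checks the guidelines would mandate.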