A Reporting Checklist for Large Language Models in Behavioural Science
Why It Matters
Transparent reporting will bolster rigor and reproducibility of LLM‑driven behavioural research, enabling journals and policymakers to assess methodological and ethical quality.
Key Takeaways
- •GUIDE‑LLM checklist contains 14 core reporting items
- •Delphi study achieved >66% consensus from 80 experts
- •Requires precise model version, access mode, and parameter disclosure
- •Mandates full prompt text, system prompts, and persona details
- •Supports reproducibility by sharing code, scripts, and interaction logs
Pulse Analysis
The rapid adoption of large language models (LLMs) such as GPT‑4 and Llama in behavioural and social science research has opened new avenues for simulating human judgment, scaling data annotation, and delivering AI‑driven interventions. Yet the same flexibility that fuels innovation also introduces variability—different model versions, temperature settings, and prompt phrasing can produce divergent outcomes, undermining replication and internal validity. As funding agencies and journals demand higher standards of transparency, the field faces a pressing need for a unified reporting framework that captures these technical nuances while addressing ethical concerns around bias and data privacy.
In response, a multinational team of scholars introduced GUIDE‑LLM, a consensus‑based checklist derived from a two‑round Delphi study involving 80 experts spanning psychology, economics, political science, and machine‑learning. The 14‑item checklist mandates disclosure of model identifiers (including timestamped versions), access mode (API versus web interface), and all generation parameters such as temperature and token limits. It also requires researchers to publish the exact prompts—including system messages, personas, and chain‑of‑thought instructions—alongside validation procedures like human verification of output quality. By obligating the sharing of code, scripts, and interaction logs, GUIDE‑LLM creates a reproducibility pipeline that can survive commercial model updates and proprietary constraints.
Adoption of GUIDE‑LLM promises to reshape the research ecosystem. Journals that embed the checklist into submission guidelines will filter out studies lacking essential methodological detail, while funding bodies can assess the robustness of AI‑enabled proposals more reliably. Moreover, the checklist’s living‑document design ensures it will incorporate emerging model capabilities, multimodal inputs, and evolving ethical standards. As the behavioural sciences increasingly intersect with AI, GUIDE‑LLM offers a pragmatic tool to safeguard scientific integrity and foster responsible innovation.
A reporting checklist for large language models in behavioural science
Comments
Want to join the conversation?
Loading comments...