What does this research mean for the field?

Generative artificial intelligence (GAI) can improve medical writing efficiency but poses significant methodological, ethical, and legal risks that require careful management. Novelty: synthesis. Consensus alignment: neutral.

What question did this study set out to answer?

This review aims to identify the limitations and risks of using large language models in medical writing and propose responsible integration methods.

March 7, 2026 | Open Access

Limitations and mitigation strategies for using generative artificial intelligence in medical writing: a narrative review

Key Points

This review aims to identify the limitations and risks of using large language models in medical writing and propose responsible integration methods.
Conducted a narrative review of current literature on LLM-assisted medical writing
Examined the effectiveness and risks associated with LLM use at various manuscript stages
Proposed principles for mitigating risks related to ethics, accuracy, and data privacy
LLMs improve readability and reduce writing time, especially for non-native English-speaking authors
Identified risks include factual inaccuracies, citation errors, and data privacy concerns
Recommended a dual framework combining human-in-the-loop and human-on-the-loop oversight to ensure integrity in medical writing

Abstract

Purpose: Large language models (LLMs) improve medical writing efficiency but introduce methodological, ethical, and legal risks. This review examines current evidence on the limitations of LLM-assisted medical writing and proposes principles for its responsible integration into biomedical research.

Current concepts: LLMs are commonly used for draft generation, language editing, literature summarization, reference handling, statistical code generation, and manuscript structuring. Studies consistently report improved readability and reduced writing time, particularly among non-native English-speaking authors. However, recurrent challenges include factual hallucinations; fabricated or inaccurate citations; incomplete retrieval of recent literature due to training cutoffs; prompt-sensitive statistical errors; ambiguity regarding authorship and accountability; risks of unintended plagiarism; and concerns related to patient data privacy. These limitations arise from the probabilistic nature of LLMs and their lack of intrinsic fact-verification mechanisms or ethical reasoning.

Discussion and conclusion: Risks associated with LLM use vary by manuscript stage and therefore require differentiated oversight. LLMs should be confined primarily to language refinement rather than fact generation, and literature-related outputs must be verified against primary sources, preferably using retrieval-augmented tools. Statistical analyses should remain under human control, with independent validation of all outputs. Ethical governance requires transparent disclosure of LLM use, clear assignment of human responsibility, and strict safeguards for sensitive data. A dual framework combining human-in-the-loop and human-on-the-loop oversight offers a pragmatic model for balancing efficiency with scientific rigor. When positioned as augmentative tools rather than autonomous agents, LLMs can be responsibly integrated into medical research without compromising integrity or reproducibility.
