What question did this study set out to answer?

The study aims to develop ChatCM-RAG, a pipeline for analyzing ChatGPT applications in medical literature.

April 1, 2026 | Open Access

ChatCM‐RAG: A deep learning‐based natural language processing pipeline for analysing ChatGPT applications in medicine using BERTopic and transformer‐based retrieval‐augmented generation

Key Points

The study aims to develop ChatCM-RAG, a pipeline for analyzing ChatGPT applications in medical literature.
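Developed a multi-stage pipeline combining BERTopic topic modelling, FAISS semantic retrieval, and transformer models (T5/GPT) for answer generation.
Processed 904 peer-reviewed articles from 2022 to 2025.
Evaluated retrieval accuracy, generation quality, and system efficiency on representative medical queries.
Identified four topic clusters in the literature, with 21.2% of documents left unclustered.
Observed zero fabricated PMIDs and zero non-existent citations in a pilot evaluation on eight queries.
Achieved a 0.90 relevance score, a 0.81 answer quality score, and an average response time of 1.73 s.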
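
The citation check behind the zero-fabricated-PMID result can be illustrated with a minimal sketch. It is not the authors' implementation: the `PMID: <digits>` citation format, the function name `find_unsupported_pmids`, and the placeholder identifiers are all assumptions made for illustration. The idea is only that a generated answer may cite a PMID if, and only if, that PMID is in the set of retrieved documents.

```python
import re

# Assumed citation format (hypothetical): the source does not specify
# how PMIDs are written inside generated answers.
PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")


def find_unsupported_pmids(answer, retrieved_pmids):
    """Return cited PMIDs that are absent from the retrieval set.

    answer          -- generated answer text
    retrieved_pmids -- set of PMID strings returned by the retriever
    The result preserves the order in which the citations appear.
    """
    cited = PMID_PATTERN.findall(answer)
    return [pmid for pmid in cited if pmid not in retrieved_pmids]


if __name__ == "__main__":
    # Placeholder identifiers, not real PubMed records.
    retrieved = {"11111111", "22222222"}
    answer = (
        "ChatGPT passed the exam (PMID: 11111111) "
        "and supported triage (PMID: 99999999)."
    )
    print(find_unsupported_pmids(answer, retrieved))
```

An empty result means every citation in the answer is backed by a retrieved document; a non-empty result flags citations that a pipeline of this kind would reject or regenerate.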

Abstract

Background: The rapid adoption of ChatGPT in healthcare has generated an extensive literature. However, systematic analysis of this emerging field with artificial intelligence (AI)-powered tools remains challenging because of the volume and diversity of publications and the risk of AI-generated hallucinations that compromise factual accuracy.

Objective: We developed ChatCM-RAG, a deep learning pipeline integrating BERTopic with transformer-based retrieval-augmented generation, to analyse ChatGPT applications in the medical literature.

Methods: We processed 904 peer-reviewed articles (2022–2025) using a multi-stage pipeline: BERTopic for topic modelling with UMAP dimensionality reduction and HDBSCAN clustering, Facebook AI Similarity Search (FAISS) for semantic retrieval, and transformer models (T5/GPT) for answer generation. The system was evaluated on representative medical queries for retrieval accuracy, generation quality, and system efficiency.

Results: ChatCM-RAG identified four distinct topic clusters: general medical AI applications (46.2%), performance evaluation (13.9%), clinical applications and patient care (12.6%), and chatbot implementations (6.0%), with 21.2% of documents unclustered. To reduce hallucination and ensure citation authenticity, the generation module is constrained to a curated corpus of 904 PubMed-indexed documents and only permits PMID citations that exist in the retrieval set. In our pilot evaluation on eight representative medical queries, we observed 0 fabricated PMIDs and 0 non-existent citations in generated answers. The system achieved an average response time of 1.73 s, an answer quality score of 0.81, and demonstrated topic-aware retrieval with a 0.90 relevance score, where 73% of retrieved documents originated from topically appropriate clusters. The ChatCM-RAG model has been open-sourced at https://huggingface.co/fc28/ChatCM-RAG.

Additional reproducible analyses were performed using the released dataset (n = 904) and code to quantify topic interpretability and retrieval robustness without requiring external LLM calls. For interpretability, the top TF-IDF terms were identified for each cluster. For robustness, an 80/20 held-out title-query evaluation showed that topic-filtered lexical retrieval (TF-IDF within the predicted cluster) improved Cluster-agreement@5 from 0.599 ± 0.309 (global TF-IDF) to 0.862 ± 0.345.

Conclusions: ChatCM-RAG effectively synthesises large-scale medical literature, revealing that ChatGPT applications in medicine are dominated by exploratory studies with an emerging focus on clinical decision support. The open-source pipeline provides researchers with powerful tools for understanding AI integration in traditional medicine.
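
The topic-filtered lexical retrieval reported in the Results can be sketched in plain Python. This is a rough illustration under stated assumptions, not the released code: the helper names (`retrieve`, `cluster_agreement_at_k`), the smoothed IDF formula, the toy titles, and the cluster labels are hypothetical. The query's predicted cluster is taken as an input, because the abstract does not say how it is predicted, and Cluster-agreement@k is assumed to mean the fraction of the top-k results whose cluster label matches the query's cluster.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(token_lists):
    """Smoothed inverse document frequency over all documents."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=5, cluster=None):
    """Return indices of the top-k documents for the query.

    corpus  -- list of (text, cluster_label) pairs
    cluster -- if given, only documents with this label are candidates
               (topic-filtered retrieval); if None, search is global.
    IDF is always computed over the full corpus (a design assumption).
    """
    token_lists = [tokenize(text) for text, _ in corpus]
    idf = build_idf(token_lists)
    q = tfidf_vector(tokenize(query), idf)
    candidates = [
        i for i, (_, label) in enumerate(corpus)
        if cluster is None or label == cluster
    ]
    ranked = sorted(
        candidates,
        key=lambda i: cosine(q, tfidf_vector(token_lists[i], idf)),
        reverse=True,
    )
    return ranked[:k]


def cluster_agreement_at_k(retrieved, corpus, query_cluster):
    """Fraction of retrieved documents whose label equals query_cluster."""
    if not retrieved:
        return 0.0
    hits = sum(corpus[i][1] == query_cluster for i in retrieved)
    return hits / len(retrieved)
```

In use, calling `retrieve(query, corpus, k=5)` gives the global TF-IDF baseline, while `retrieve(query, corpus, k=5, cluster=predicted)` restricts candidates to the predicted cluster. With filtering, agreement for a single query is 1.0 when the predicted cluster is correct and 0.0 when it is wrong.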
