Abstract
Background and objective: Large Language Models (LLMs) show significant potential in healthcare, but their application in Traditional Chinese Medicine (TCM) lacks systematic evaluation. This study comprehensively reviews LLM tuning techniques, data construction strategies, evaluation methods, and application scenarios in TCM clinical practice.
Methods: A scoping review was conducted following the PRISMA-ScR guidelines. Seven databases were systematically searched for relevant studies published from database inception to May 2025. The analysis identified model characteristics, tuning techniques, data sources, evaluation methods, application domains, and performance limitations in order to assess the current state and future directions of TCM-oriented LLMs.
Results: We included 27 studies (21 in English, 6 in Chinese). Application domains comprised TCM knowledge consultation (10 studies) and diagnostic assistance (13 studies); the remaining 4 studies established evaluation benchmarks for TCM LLMs. LoRA fine-tuning was the most widely used technique (65.2%), often combined with prompt engineering (47.8%), continued pre-training (43.5%), and retrieval-augmented generation (39.1%). Most studies (87.0%) combined multiple techniques. Training data balanced theoretical knowledge (classics) with clinical experience (case records), though multimodal data remained severely insufficient. Evaluation was multidimensional, with human assessment (77.8%) and accuracy (63.0%) used most frequently, and specialized TCM evaluation benchmarks are gradually being established. Current models excel at integrating heterogeneous knowledge, basic syndrome differentiation reasoning, and cross-language knowledge conversion, but show limitations in simulating complex TCM reasoning processes and in individualized diagnosis.
Conclusion: Although TCM-oriented LLMs demonstrate effectiveness in knowledge consultation and diagnostic tasks, they face significant challenges in capturing TCM's holistic paradigm, in data quality, and in clinical evaluation. Future research should develop TCM-compatible model architectures, build standardized multimodal data ecosystems, strengthen clinical translation, and create evaluation frameworks that reflect TCM's diagnostic process.
Han et al. studied this question.