The healthcare industry's digital transformation has produced an unprecedented volume of multimodal data. Machine learning (ML)-based extraction tools offer promising solutions for managing this data explosion, particularly when integrated with federated database systems. A large language model (LLM) trained to extract data from this multimodal information with high accuracy and at an affordable cost could substantially improve data extraction in the medical field, reducing costs and manpower across the board. A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, searching major databases for studies published between 2018 and 2024, supplemented by grey literature sources. The analysis focused on the performance and implementation costs of ML-based extraction tools in healthcare settings. From 1,247 initial records, 21 studies met the inclusion criteria. ML-based extraction demonstrated superior accuracy compared to traditional methods, ranging from 61% to 98%. Implementation costs averaged between 500,000 and 2.5 million. Two primary categories of tools emerged: image-based and text-oriented. ML-based extraction tools show significant promise in healthcare data management, though successful implementation requires careful consideration of costs, security protocols, and regulatory compliance. A dedicated LLM capable of efficiently extracting data from various medical sources could streamline healthcare data management and free resources for patient care and research.
Khalpey et al. studied this question.