
April 24, 2026 | Open Access

Machine Learning-Based Data Extraction Tools in Healthcare: A Systematic Review

Key Points

The aim is to assess machine learning-based data extraction tools and their implementation in healthcare settings.
A systematic review was conducted following PRISMA guidelines.
Major databases and grey literature were searched for studies published from 2018 to 2024.
The performance and costs of ML-based extraction tools were analyzed.
Twenty-one studies satisfied the inclusion criteria from an initial 1,247 records.
ML-based extraction accuracy ranged from 61% to 98%, exceeding that of traditional methods.
Implementation costs for these tools ranged from $500,000 to $2.5 million.

Abstract

The healthcare industry's digital transformation has generated an unprecedented volume of multimodal data. Machine learning (ML)-based extraction tools offer promising solutions for managing this data explosion, particularly when integrated with federated database systems. If a large language model (LLM) could be trained to extract data from these multimodal sources with high accuracy at an affordable cost, it could substantially improve data extraction in the medical field, reducing costs and staffing demands across the board. A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, searching major databases for studies published between 2018 and 2024, supplemented by grey literature sources. The analysis focused on the performance and implementation costs of ML-based extraction tools in healthcare settings. Of 1,247 initial records, 21 studies met the inclusion criteria. ML-based extraction achieved accuracy ranging from 61% to 98% and outperformed traditional methods. Implementation costs ranged from $500,000 to $2.5 million. Two primary categories of tools emerged: image-based and text-oriented. ML-based extraction tools show significant promise in healthcare data management, though successful implementation requires careful consideration of costs, security protocols, and regulatory compliance. The development of a dedicated LLM capable of efficiently extracting data from various medical sources could revolutionize healthcare by streamlining data management and reallocating resources toward patient care and research advancements.
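The abstract reports extraction accuracy without stating how it was measured, so a minimal sketch of one common definition may help when reading the 61% to 98% range. It assumes exact-match, field-level scoring against a manually abstracted gold standard. The helper name `field_level_accuracy` and the sample record are hypothetical illustrations, not taken from the review, and individual studies may instead have reported precision, recall, or F1.

```python
from typing import Mapping


def field_level_accuracy(extracted: Mapping[str, str], gold: Mapping[str, str]) -> float:
    """Fraction of gold-standard fields whose extracted value matches exactly.

    Hypothetical helper: assumes exact-match, field-level scoring, which may
    differ from the metric used in any individual study in the review.
    """
    if not gold:
        raise ValueError("gold standard must contain at least one field")
    # A field counts as correct only if it was extracted AND equals the gold value.
    correct = sum(1 for field, value in gold.items() if extracted.get(field) == value)
    return correct / len(gold)


# Illustrative record invented for this example (not data from the review).
gold = {
    "patient_age": "67",
    "diagnosis": "type 2 diabetes",
    "hba1c": "8.1%",
    "medication": "metformin",
}
extracted = {
    "patient_age": "67",
    "diagnosis": "type 2 diabetes",
    "hba1c": "8.1",  # unit dropped, so exact match fails
    "medication": "metformin",
}

print(field_level_accuracy(extracted, gold))  # 0.75
```

Exact matching is deliberately strict: a value that is clinically equivalent but formatted differently (such as a missing unit) counts as an error, which is one reason reported accuracies can vary widely between studies.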
