What question did this study set out to answer?

This work aims to evaluate the efficacy of large language models in automating the analysis of medical device regulatory documents.

February 8, 2026 · Open Access

Scaling medical device regulatory science using large language models

Key Points

Conducts a wide-ranging validation study of large language models (LLMs) for data analysis in medical device regulatory science.
Evaluates LLM outputs against expert manual annotations and with an "LLM-as-a-judge" approach.
Applies LLMs in three areas: monitoring device validation practices, coding medical device reports, and identifying potential risk factors for post-market adverse events.
LLMs accurately extracted regulatory attributes spanning pre- and post-market settings, with accuracy often reaching 80% or higher.
Automating annotation lets regulatory data analyses scale beyond what manual annotation efforts have historically supported.
LLMs were used to identify potential risk factors for post-market adverse events.

Abstract

Advances in artificial intelligence (AI) and machine learning (ML) have led to a surge in AI/ML-enabled medical devices, posing new challenges for regulators because best practices for developing, testing, and monitoring these devices are still emerging. Consequently, there is a critical need for up-to-date data analyses of the regulatory landscape to inform policy-making. However, such analyses have historically relied upon manual annotation efforts because regulatory documents are unstructured, complex, multi-modal, and filled with jargon. Efforts to automate annotation using simple natural language processing methods have achieved limited success, as they lack the reasoning needed to interpret regulatory materials. Recent progress in large language models (LLMs) presents an unprecedented opportunity to unlock information embedded in regulatory documents. This work conducts the first wide-ranging validation study of LLMs for scaling data analyses in the field of medical device regulatory science. Evaluating LLM outputs using expert manual annotations and “LLM-as-a-judge,” we find that LLMs can accurately extract attributes spanning pre- and post-market settings, with accuracy rates often reaching 80% or higher. We then demonstrate how LLMs can scale up analyses in three applications: (1) monitoring device validation practices, (2) coding medical device reports, and (3) identifying potential risk factors for post-market adverse events.
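The validation described in the abstract compares LLM-extracted attributes with expert manual annotations. The sketch below shows one way to compute per-attribute accuracy for that kind of comparison. It assumes each document's attributes are stored as string-valued fields and that agreement means an exact match after normalization. The attribute names and example values are hypothetical and do not come from the study. The paper's actual scoring, including its LLM-as-a-judge step, may differ.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count as errors."""
    return " ".join(value.lower().split())


def per_attribute_accuracy(llm_outputs, expert_labels):
    """Return {attribute: fraction of documents where the LLM matches the expert label}.

    Both arguments map a document ID to a dict of attribute -> string value.
    Only documents and attributes present in the expert labels are scored;
    an attribute the LLM failed to extract counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, gold in expert_labels.items():
        predicted = llm_outputs.get(doc_id, {})
        for attribute, gold_value in gold.items():
            total[attribute] += 1
            pred_value = predicted.get(attribute)
            if pred_value is not None and normalize(pred_value) == normalize(gold_value):
                correct[attribute] += 1
    return {attr: correct[attr] / total[attr] for attr in total}


# Hypothetical example: two device summaries, two hypothetical attributes each.
expert = {
    "doc1": {"clinical_validation": "yes", "device_class": "II"},
    "doc2": {"clinical_validation": "no", "device_class": "II"},
}
llm = {
    "doc1": {"clinical_validation": "Yes", "device_class": "II"},
    "doc2": {"clinical_validation": "yes", "device_class": " ii "},
}

scores = per_attribute_accuracy(llm, expert)
print(scores)  # {'clinical_validation': 0.5, 'device_class': 1.0}
```

Exact matching is the simplest choice and suits categorical attributes. Free-text attributes would likely need a more lenient comparison, which is where an LLM-as-a-judge evaluation could stand in for the `normalize` check.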


Cite This Study

Li et al. (2026) studied this question.

synapsesocial.com/papers/698827a20fc35cd7a88467e6
https://doi.org/10.1038/s41746-026-02353-7
