Abstract

Advances in artificial intelligence (AI) and machine learning (ML) have led to a surge in AI/ML-enabled medical devices, posing new challenges for regulators because best practices for developing, testing, and monitoring these devices are still emerging. Consequently, there is a critical need for up-to-date data analyses of the regulatory landscape to inform policy-making. However, such analyses have historically relied upon manual annotation efforts because regulatory documents are unstructured, complex, multi-modal, and filled with jargon. Efforts to automate annotation using simple natural language processing methods have achieved limited success, as they lack the reasoning needed to interpret regulatory materials. Recent progress in large language models (LLMs) presents an unprecedented opportunity to unlock information embedded in regulatory documents. This work conducts the first wide-ranging validation study of LLMs for scaling data analyses in the field of medical device regulatory science. Evaluating LLM outputs using expert manual annotations and “LLM-as-a-judge,” we find that LLMs can accurately extract attributes spanning pre- and post-market settings, with accuracy rates often reaching 80% or higher. We then demonstrate how LLMs can scale up analyses in three applications: (1) monitoring device validation practices, (2) coding medical device reports, and (3) identifying potential risk factors for post-market adverse events.
Li et al. studied this question.