What question did this study set out to answer?

The study aimed to develop a system that improves how context-based clinical features are mapped to standardized OMOP vocabulary concepts.

February 28, 2026 · Open Access

RAG-Enhanced LLM Pipeline for Semantic Mapping of Context-based Features to OMOP Vocabulary

Key Points

Develops a retrieval-augmented generation (RAG) large language model (LLM) pipeline.
Stores OMOP concepts in a vector database.
Retrieves the most semantically relevant concepts for each user-supplied feature.
Uses an LLM to generate context-aware concept suggestions with explanations.
Achieves higher mapping accuracy than standard mapping tools.
Makes the mapping process more transparent and usable.
Supports efficient feature extraction in healthcare applications.
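The storage and retrieval steps listed above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. The character-trigram `embed` function stands in for a real embedding model, the brute-force `VectorStore` class stands in for a vector database, and the concept IDs are placeholders rather than real OMOP `concept_id` values. Only the field names `concept_id`, `concept_name`, `domain_id`, and `vocabulary_id` follow the OMOP CONCEPT table.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    # Subset of OMOP CONCEPT table columns.
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str


# Placeholder concepts; the IDs are illustrative, not real OMOP concept_ids.
SAMPLE_CONCEPTS = [
    Concept(1001, "Type 2 diabetes mellitus", "Condition", "SNOMED"),
    Concept(1002, "Type 1 diabetes mellitus", "Condition", "SNOMED"),
    Concept(1003, "Essential hypertension", "Condition", "SNOMED"),
    Concept(1004, "Hemoglobin A1c measurement", "Measurement", "LOINC"),
]


def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding model (an assumption): counts of
    # character trigrams over the lowercased, space-padded text.
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory index: one vector per concept, linear scan."""

    def __init__(self, concepts):
        # Embed each concept name once, at indexing time.
        self._entries = [(c, embed(c.concept_name)) for c in concepts]

    def search(self, query: str, k: int = 3):
        # Score every stored concept against the query; return the top k
        # as (similarity, concept) pairs, highest similarity first.
        q = embed(query)
        scored = [(cosine(q, vec), c) for c, vec in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = VectorStore(SAMPLE_CONCEPTS)
    for score, concept in store.search("type 2 diabetes", k=2):
        print(concept.concept_id, concept.concept_name, round(score, 3))
```

A production system would replace the toy embedding with a learned text-embedding model and the linear scan with an approximate nearest-neighbor index. The retrieval contract (a query goes in, the top-k scored concepts come out) stays the same.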

Abstract

This work presents a retrieval-augmented generation (RAG) large language model (LLM) pipeline that automates the mapping of context-based clinical features to OMOP vocabulary concepts. The system stores OMOP concepts in a vector database, retrieves the most semantically relevant matches for each user input, and uses an LLM to generate context-aware concept suggestions with explanations. Compared with standard mapping tools, the approach improves mapping accuracy and makes the mapping process more transparent and usable. It supports efficient feature extraction and contributes to safer, more effective evaluation of AI applications in healthcare.
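The generation step, in which the LLM chooses among the retrieved candidates and explains its choice, might look like the following sketch. The prompt wording, the JSON answer format, and the helper names `build_prompt` and `parse_suggestion` are assumptions for illustration. The LLM call is omitted, and a hard-coded response stands in for it. Rejecting any concept ID outside the retrieved set is one way to keep suggestions grounded in the OMOP vocabulary.

```python
import json
from typing import Optional


def build_prompt(feature: str, context: str, candidates: list) -> str:
    # Each candidate is assumed to be a dict carrying OMOP CONCEPT fields.
    lines = [
        "You map clinical features to OMOP standard concepts.",
        f"Feature: {feature}",
        f"Context: {context}",
        "Candidate concepts (retrieved from the vector database):",
    ]
    for c in candidates:
        lines.append(
            f"- concept_id={c['concept_id']} | {c['concept_name']} "
            f"| domain={c['domain_id']} | vocabulary={c['vocabulary_id']}"
        )
    lines.append(
        'Answer with JSON only: {"concept_id": <int or null>, "explanation": "<why>"}. '
        "Choose only from the candidates above; use null if none fit."
    )
    return "\n".join(lines)


def parse_suggestion(raw: str, candidates: list) -> Optional[dict]:
    # Return the chosen candidate plus the model's explanation, or None if the
    # output is malformed, null, or names an ID the retriever did not return.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    by_id = {c["concept_id"]: c for c in candidates}
    chosen = data.get("concept_id")
    if type(chosen) is not int or chosen not in by_id:
        return None
    return {**by_id[chosen], "explanation": str(data.get("explanation", ""))}


if __name__ == "__main__":
    # Placeholder candidates; the IDs are illustrative, not real OMOP concept_ids.
    candidates = [
        {"concept_id": 1001, "concept_name": "Type 2 diabetes mellitus",
         "domain_id": "Condition", "vocabulary_id": "SNOMED"},
        {"concept_id": 1002, "concept_name": "Type 1 diabetes mellitus",
         "domain_id": "Condition", "vocabulary_id": "SNOMED"},
    ]
    prompt = build_prompt("T2DM", "Problem list entry in a discharge summary", candidates)
    print(prompt)
    # Hard-coded stand-in for an LLM response to the prompt above.
    raw = '{"concept_id": 1001, "explanation": "T2DM abbreviates type 2 diabetes mellitus."}'
    print(parse_suggestion(raw, candidates))
```

Returning the explanation alongside the concept reflects the transparency goal: a reviewer sees both the suggested concept and the model's stated reason, and can accept or reject the mapping.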
