This work presents a retrieval-augmented generation (RAG) pipeline, built on a large language model (LLM), that automates the mapping of context-based clinical features to OMOP vocabulary concepts. The system stores OMOP concepts in a vector database, retrieves the most semantically relevant matches for the user's input, and uses the LLM to generate context-aware concept suggestions with explanations. The approach improves mapping accuracy compared to standard tools while enhancing transparency and usability. It supports efficient feature extraction and contributes to safer and more effective evaluation of AI applications in healthcare.
Kakkamani et al. studied this question.
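
The store, retrieve, and prompt steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept table uses placeholder IDs rather than real OMOP concept IDs, a bag-of-words count vector stands in for a learned embedding model, an in-memory list stands in for the vector database, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical. The final LLM call is omitted, since the abstract does not specify which model or API is used.

```python
import math
import re
from collections import Counter

# Toy concept table. IDs and names are illustrative placeholders,
# NOT real OMOP concept IDs.
CONCEPTS = [
    (1, "systolic blood pressure"),
    (2, "diastolic blood pressure"),
    (3, "body weight"),
    (4, "heart rate"),
]


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stand-in for the vector database: concept vectors computed once, up front.
INDEX = [(cid, name, embed(name)) for cid, name in CONCEPTS]


def retrieve(query, k=2):
    """Return the k concepts most similar to the query as (score, id, name)."""
    q = embed(query)
    scored = [(cosine(q, vec), cid, name) for cid, name, vec in INDEX]
    # Highest similarity first; ties broken by concept ID for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


def build_prompt(feature, context, candidates):
    """Assemble the text an LLM would receive to choose and justify a mapping."""
    lines = [
        f"- {cid}: {name} (similarity {score:.2f})"
        for score, cid, name in candidates
    ]
    return (
        f"Clinical feature: {feature}\n"
        f"Context: {context}\n"
        "Candidate concepts:\n"
        + "\n".join(lines)
        + "\nChoose the best matching concept and explain why."
    )


if __name__ == "__main__":
    feature = "systolic blood pressure at admission"
    candidates = retrieve(feature, k=2)
    print(build_prompt(feature, "ICU vital signs", candidates))
```

Passing only the top-ranked candidates to the model keeps the prompt short and lets the generated explanation refer to specific retrieved concepts, which is one plausible way to obtain the transparency the abstract describes.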