What question did this study set out to answer?

The aim is to develop a secure, open-source system for extracting structured clinical data from narrative medical records.

April 5, 2026

Abstract 2738: From chaos to columns: High-accuracy clinical data extraction with CIDER.

Key result

CIDER extracted structured clinical data from Hungarian pathology records with accuracies ranging from 77.1% (largest tumor size) to 99.4% (sex) across six key variables, which also included T stage, N stage, year of surgery, and primary tumor organ.

Key points

Developed CIDER, a locally deployed, open-source system that uses an LLM for data extraction.
Integrated vLLM-based inference, predefined data schemas, and prompt-engineered extraction rules.
Evaluated CIDER on 2046 Hungarian-language pathology records, comparing automated outputs with manually mined data.
Achieved high extraction accuracy where manual data were available: sex (99.4%), T stage (95.34%), N stage (92.19%), year of surgery (97.94%), and primary tumor organ (95.52%); accuracy for largest tumor size was lower (77.05%).
Retrieved clinically relevant values in records lacking manual annotations: sex (n=64), T stage (n=780), N stage (n=213), tumor size (n=291), year of surgery (n=6), and primary tumor organ (n=15).
Demonstrated near-expert level accuracy for structured data extraction from complex medical texts.
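The schema-driven extraction step listed above (predefined data schema plus prompt-engineered rules) can be sketched roughly as follows. This is a hypothetical illustration, not CIDER's code: the field names, allowed values, and prompt wording are assumptions, and the model call itself (vLLM serving Qwen3-VL-32B-FP8 in the study) is omitted, so a hard-coded JSON string stands in for the model's reply.

```python
import json

# Hypothetical schema: field names and allowed values are illustrative
# assumptions, not CIDER's actual schema.
SCHEMA = {
    "sex": {"type": "enum", "values": ["male", "female"]},
    "t_stage": {"type": "enum", "values": ["T0", "Tis", "T1", "T2", "T3", "T4", "TX"]},
    "n_stage": {"type": "enum", "values": ["N0", "N1", "N2", "N3", "NX"]},
    "primary_tumor_organ": {"type": "string"},
    "year_of_surgery": {"type": "int", "min": 1900, "max": 2100},
    "largest_tumor_size_mm": {"type": "number", "min": 0},
}


def build_prompt(report_text: str) -> str:
    """Assemble an extraction prompt from the schema (hypothetical wording)."""
    lines = [
        "Extract the following fields from the pathology report.",
        "Return a single JSON object. Use null when a field is not stated.",
    ]
    for name, spec in SCHEMA.items():
        allowed = f" (one of: {', '.join(spec['values'])})" if spec["type"] == "enum" else ""
        lines.append(f"- {name}: {spec['type']}{allowed}")
    lines.append("Report:")
    lines.append(report_text)
    return "\n".join(lines)


def validate(raw_output: str) -> dict:
    """Parse the model's JSON reply; values that break the schema become None."""
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    row = {}
    for name, spec in SCHEMA.items():
        value = data.get(name)
        if value is None:
            row[name] = None
        elif spec["type"] == "enum":
            row[name] = value if value in spec["values"] else None
        elif spec["type"] == "int":
            ok = (isinstance(value, int) and not isinstance(value, bool)
                  and spec["min"] <= value <= spec["max"])
            row[name] = value if ok else None
        elif spec["type"] == "number":
            ok = (isinstance(value, (int, float)) and not isinstance(value, bool)
                  and value >= spec["min"])
            row[name] = value if ok else None
        else:
            row[name] = str(value)
    return row


# Stand-in for a model reply; "N9" is not an allowed N stage, so it becomes None.
reply = ('{"sex": "female", "t_stage": "T2", "n_stage": "N9", '
         '"primary_tumor_organ": "colon", "year_of_surgery": 2019, '
         '"largest_tumor_size_mm": 35}')
print(validate(reply))
```

Validating against a fixed schema means every record yields the same columns, which is what makes the output table directly usable for analysis; out-of-vocabulary values are dropped rather than silently kept.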

Structured PICO

Does the CIDER LLM system accurately extract structured clinical data from unstructured Hungarian-language pathology records compared to manual extraction?

Population

2046 real-world Hungarian-language pathology and histology records

Intervention

CIDER (ClinIcal Data ExtractoR) system using the Qwen3-VL-32B-FP8 model

Comparator

Manually mined data

Outcome

Extraction accuracy across six key clinical variables (sex, T stage, N stage, primary tumor organ, year of surgery, and tumor size)
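Each accuracy figure is a simple proportion: the number of records whose extracted value was identical to the manual value, divided by the number of records that had a manual value for that variable. The snippet below recomputes the percentages from the counts reported in the abstract; it uses only those published counts, not study data.

```python
# Counts reported in the abstract: (identical, compared) per variable,
# where "compared" is the number of records that had a manual value.
REPORTED = {
    "sex": (1971, 1982),
    "T stage": (879, 922),
    "N stage": (437, 474),
    "year of surgery": (1998, 2040),
    "primary tumor organ": (1771, 1854),
    "largest tumor size": (1333, 1730),
}


def accuracy_pct(identical: int, compared: int) -> float:
    """Share of compared records where extraction matched the manual value, in percent."""
    return 100.0 * identical / compared


for variable, (identical, compared) in REPORTED.items():
    print(f"{variable:<20} {identical:>5}/{compared:<5} {accuracy_pct(identical, compared):6.2f}%")
```

All values match the reported percentages; sex prints as 99.45%, which the abstract gives to one decimal place as 99.4%.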

A locally deployed, open-source LLM system can achieve near-expert level accuracy in structured data extraction from complex, non-English medical texts.


Abstract

The analysis of unstructured medical records represents a crucial challenge in clinical research and healthcare. Large Language Models (LLMs) offer a transformative opportunity to extract structured information from narrative text; however, their use in medical environments is limited by security, ethical, and reproducibility issues. Here we present CIDER (ClinIcal Data ExtractoR), a locally deployed, open-source LLM-based system designed for the secure analysis of medical documentation. CIDER operates through an automated pipeline integrating vLLM-based inference, predefined data schemas, and prompt-engineered extraction rules to convert unstructured clinical text into structured variables. The system processes batch uploads, parses reports using a fine-tuned model, and generates standardized output tables for direct analytical use. We evaluated CIDER’s ability to extract structured clinical data from real-world Hungarian-language pathology and histology records. Using the Qwen3-VL-32B-FP8 model as the backbone, we analyzed 2046 pathological records and validated the model’s outputs across six key clinical variables: sex, T stage, N stage, primary tumor organ, year of surgery, and tumor size. The extracted data were compared with manually mined data. When manual data were available, extraction accuracy was very high for sex (99.4%, 1971/1982 identical), T stage (95.34%, 879/922), N stage (92.19%, 437/474), year of surgery (97.94%, 1998/2040), and primary tumor organ (95.52%, 1771/1854). The largest tumor size reached an accuracy of 77.05% (1333/1730 identical). Notably, CIDER was also capable of retrieving clinically relevant information in cases where manual annotations were missing, identifying additional instances for sex (n=64), T stage (n=780), N stage (n=213), tumor size (n=291), year of surgery (n=6), and primary tumor organ (n=15). In summary, CIDER demonstrated strong performance across the evaluated parameters.
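The comparison described in the abstract (agreement measured only where a manual value existed, with extra extractions in unannotated records counted separately) can be sketched per variable as below. This is an assumption-laden illustration: the abstract does not state how "identical" was judged or how records with a manual value but no extracted value were counted. Here a match is a case- and whitespace-insensitive string comparison, a missing extraction counts as a mismatch, and the helper names and toy records are invented for illustration.

```python
from typing import Iterable, Optional, Tuple

# (manual_value, extracted_value); None means the value is missing.
Pair = Tuple[Optional[str], Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Assumed matching rule: ignore case and surrounding whitespace; blank means missing."""
    if value is None:
        return None
    value = value.strip().casefold()
    return value or None


def score(pairs: Iterable[Pair]) -> Tuple[int, int, int]:
    """Return (identical, compared, additional) for one variable."""
    identical = compared = additional = 0
    for manual_raw, extracted_raw in pairs:
        manual, extracted = normalize(manual_raw), normalize(extracted_raw)
        if manual is None:
            # No manual annotation: count it as "additional" if the model found a value.
            if extracted is not None:
                additional += 1
            continue
        compared += 1
        if extracted == manual:
            identical += 1
    return identical, compared, additional


# Toy records, not study data.
toy = [("pT2", "pT2"), ("pT3", "pT2"), (None, "pT1"), (None, None), ("pT4", " PT4 ")]
print(score(toy))  # (2, 3, 1)
```

Accuracy is then identical / compared (2/3 for the toy records), while the additional count is reported separately, mirroring the abstract's split between validated accuracy and newly retrieved instances.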
These results show that a locally deployed, open-source LLM system can achieve near-expert level accuracy in structured data extraction from complex, non-English medical texts. By operating entirely within institutional infrastructure, CIDER ensures full data sovereignty and provides a scalable solution for automated medical record interpretation, supporting research, registry development, and clinical decision-making in multilingual healthcare environments. The CIDER platform is publicly accessible at https://llm.gyorffylab.com/cider.

Citation Format: Mate Posta, Aida Figler, Zsofia Dobolyi, Balazs Gyorffy. From chaos to columns: High-accuracy clinical data extraction with CIDER [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2738.

Cite This Study

Posta et al. (2026) reported that CIDER extracted structured clinical data from Hungarian pathology records with accuracies ranging from 77.1% (largest tumor size) to 99.4% (sex) across six key variables, which also included T stage, N stage, year of surgery, and primary tumor organ.

synapsesocial.com/papers/69d1fdbfa79560c99a0a3fb0 https://doi.org/10.1158/1538-7445.am2026-2738