June 21, 2026
Open Access

ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports

Key Points

The aim is to create a dataset linking free-text findings to precise 3D segmentations in chest CT scans.
Introduced ReXGroundingCT, a dataset with 3142 noncontrast CT scans linked to standardized reports.
Used GPT-4 to extract and standardize findings, and GPT-4o to categorize them into a hierarchical ontology of lung and pleural abnormalities.
Produced 3D annotations validated by board-certified radiologists, with an additional chain-of-thought dataset.
Offers 16,301 annotated entities across 8028 text-to-3D segmentation pairs, covering various findings.
About 79% of findings are focal abnormalities and 21% are nonfocal.
Includes a public validation set of 50 cases and a private test set of 100 cases, both annotated by board-certified radiologists.

Abstract

BACKGROUND Connecting free-text descriptions such as "3-mm nodule in the lower left lobe" to precise 3D segmentations remains an unsolved challenge in medical artificial intelligence. Existing chest computed tomography (CT) datasets rely on structured labels or predefined categories, limiting their ability to represent the richness of clinical language and support grounded radiology report generation. Bridging this gap requires datasets that capture the expressiveness of free-text findings and link them to accurate annotations in volumetric imaging.

METHODS We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to 3D segmentations in chest CT scans. It includes 3142 noncontrast CT scans paired with standardized radiology reports from CT-RATE, constructed through a three-stage pipeline. First, generative pretrained transformer 4 (GPT-4) extracted and standardized findings, descriptors, and metadata from Turkish reports machine-translated into English. Second, GPT-4 omni (GPT-4o) categorized each finding into a hierarchical ontology of lung and pleural abnormalities. Third, 3D annotations were produced for all CT volumes: the training set underwent quality assurance by board-certified radiologists, and the validation and test sets were fully labeled by them. A complementary chain-of-thought dataset was also created, providing step-by-step anatomical reasoning for localizing findings within the CT volume, guided by organ-segmentation models.

RESULTS ReXGroundingCT provides 16,301 annotated entities across 8028 text-to-3D segmentation pairs spanning diverse findings. About 79% of findings are focal abnormalities, while 21% are nonfocal. The dataset includes a public validation set of 50 cases and a private test set of 100 cases, annotated by board-certified radiologists. Model performance on the test set is hosted on a leaderboard at https://rexrank.ai/ReXGroundingCT.

CONCLUSIONS ReXGroundingCT is the first manually curated dataset linking free-text chest CT findings to 3D segmentation masks, providing a benchmark for developing and evaluating free-text medical segmentation models. It lays the foundation for the segmentation of free-text findings and generation of grounded radiology reports in CT imaging.
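
To make the idea of a text-to-3D segmentation pair concrete, the following is a minimal illustrative sketch only. The abstract does not describe the dataset's file format or the metric used on the leaderboard, so the `GroundedFinding` record, its field names, and the use of the Dice coefficient are all assumptions made for illustration. A mask is represented here as a plain set of voxel indices rather than a real CT volume.

```python
from dataclasses import dataclass, field

# A voxel is addressed by its (z, y, x) index in the CT volume.
Voxel = tuple[int, int, int]


@dataclass
class GroundedFinding:
    """One free-text finding paired with a 3D mask.

    Hypothetical schema for illustration; not the dataset's actual format.
    """
    text: str                      # e.g. a standardized finding sentence
    category: str                  # assumed labels: "focal" or "nonfocal"
    mask: set[Voxel] = field(default_factory=set)


def dice(pred: set[Voxel], ref: set[Voxel]) -> float:
    """Dice overlap between a predicted and a reference mask.

    Dice is a common segmentation metric; its use here is an assumption.
    Two empty masks are treated as a perfect match.
    """
    if not pred and not ref:
        return 1.0
    return 2 * len(pred & ref) / (len(pred) + len(ref))


def focal_fraction(findings: list[GroundedFinding]) -> float:
    """Share of findings whose category is 'focal'."""
    if not findings:
        return 0.0
    return sum(f.category == "focal" for f in findings) / len(findings)


# Toy example with made-up voxels (not real data).
reference = GroundedFinding(
    text="3-mm nodule in the lower left lobe",
    category="focal",
    mask={(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},
)
predicted_mask = {(0, 0, 1), (0, 1, 1), (0, 1, 2)}
score = dice(predicted_mask, reference.mask)  # 2 * 2 / (3 + 4) = 4/7
```

In this toy case the prediction shares two voxels with the four-voxel reference, giving a Dice score of 4/7. A real evaluation would load the masks from the released volumes and follow whatever protocol the benchmark specifies.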
