BACKGROUND Connecting free-text descriptions such as "3-mm nodule in the lower left lobe" to precise 3D segmentations remains an unsolved challenge in medical artificial intelligence. Existing chest computed tomography (CT) datasets rely on structured labels or predefined categories, limiting their ability to represent the richness of clinical language and support grounded radiology report generation. Bridging this gap requires datasets that capture the expressiveness of free-text findings and link them to accurate annotations in volumetric imaging.
METHODS We introduce ReXGroundingCT, the first publicly available dataset linking free-text findings to 3D segmentations in chest CT scans. It includes 3142 noncontrast CT scans paired with standardized radiology reports from CT-RATE, constructed through a three-stage pipeline. First, generative pretrained transformer 4 (GPT-4) extracted and standardized findings, descriptors, and metadata from Turkish reports machine-translated into English. Second, GPT-4 omni (GPT-4o) categorized each finding into a hierarchical ontology of lung and pleural abnormalities. Third, 3D annotations were produced for all CT volumes: the training set underwent quality assurance by board-certified radiologists, and the validation and test sets were fully labeled by them. A complementary chain-of-thought dataset was also created, providing step-by-step anatomical reasoning for localizing findings within the CT volume, guided by organ-segmentation models.
RESULTS ReXGroundingCT provides 16,301 annotated entities across 8028 text-to-3D-segmentation pairs spanning diverse findings. About 79% of findings are focal abnormalities, while 21% are nonfocal. The dataset includes a public validation set of 50 cases and a private test set of 100 cases, annotated by board-certified radiologists. Model performance on the test set is hosted on a leaderboard at https://rexrank.ai/ReXGroundingCT.
CONCLUSIONS ReXGroundingCT is the first manually curated dataset linking free-text chest CT findings to 3D segmentation masks, providing a benchmark for developing and evaluating free-text medical segmentation models. It lays the foundation for the segmentation of free-text findings and generation of grounded radiology reports in CT imaging.
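To make the notion of a text-to-3D-segmentation pair concrete, the following is a minimal Python sketch of how one such pair might be represented and how a focal/nonfocal split could be tallied. The class name `GroundedFinding`, its fields, the helper `focal_fraction`, and the toy records are all hypothetical illustrations; they are not the schema or contents of the released dataset.

```python
from dataclasses import dataclass


# Hypothetical record layout; the released dataset's real schema may differ.
@dataclass(frozen=True)
class GroundedFinding:
    text: str            # free-text finding sentence from the report
    is_focal: bool       # True for a focal abnormality, False for nonfocal
    voxels: frozenset    # (z, y, x) voxel indices covered by the 3D mask


def focal_fraction(findings):
    """Return the share of findings flagged as focal (0.0 for an empty list)."""
    if not findings:
        return 0.0
    return sum(f.is_focal for f in findings) / len(findings)


# Toy records for illustration only; not taken from the dataset.
examples = [
    GroundedFinding("3-mm nodule in the lower left lobe", True,
                    frozenset({(10, 42, 57)})),
    GroundedFinding("diffuse opacity in both lungs", False,
                    frozenset({(5, 1, 1), (5, 1, 2)})),
]

print(focal_fraction(examples))
```

Applied to the full set of findings, a tally of this kind would correspond to the roughly 79% focal share reported above.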
Baharoon et al. (Thu,) studied this question.