Abstract

Background: Colorectal cancer (CRC) and chronic kidney disease (CKD) are major contributors to global morbidity and mortality. Increasing epidemiological and genetic evidence suggests a biologically plausible interplay between renal dysfunction and colorectal tumorigenesis. However, publicly available datasets rarely integrate structured renal function parameters with multi-omic cancer data, limiting mechanistic and prognostic investigations of the CRC-CKD axis.

Methods: We systematically evaluated public resources, including The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) cohort and an open-access transcriptomic survival dataset, to assess the feasibility of integrating clinical and genomic information for biomarker discovery. Transcriptomic data from 62 CRC patients were analyzed using unsupervised clustering, correlation-based feature selection, and multiple supervised machine learning classifiers to identify gene signatures associated with disease-free survival (DFS).

Results: TCGA-COAD confirmed the canonical mutational landscape of CRC but lacked structured renal function data. In contrast, the survival dataset enabled integrative DFS modelling. Unsupervised analysis identified three transcriptionally distinct subgroups. Random forest and logistic regression achieved the highest predictive performance. Feature importance analysis highlighted CYP2E1, RAB39A, and ZBTB3 as top-ranked predictors of recurrence risk.

Conclusions: Our findings expose a critical gap in current public repositories regarding integrated CRC-CKD data and demonstrate the feasibility of transcriptomic-driven prognostic modelling. This analysis provides a hypothesis-generating computational framework to support future multimodal investigations integrating renal, molecular, and imaging parameters in CRC, which will be the main objectives of the SIRIO study.
Fusco et al. (Tue,) studied this question.