Abstract
Background: Habitat radiomics characterizes intratumoral heterogeneity by partitioning tumors into biologically meaningful and statistically distinct subregions. Because HPV-positive and HPV-negative oropharyngeal cancers (OPC) differ in morphology, intensity distribution, and texture, we evaluated different machine learning methods and habitat feature aggregation strategies to develop an HPV prediction algorithm.
Methods: Radiomic tumor habitats were generated from 192 pretreatment CT scans using a two-level unsupervised clustering approach (K=3). Tumors exhibited one to three habitats of varying prominence. Six strategies for aggregating features across a tumor's habitats were compared: maximum, largest-volume, sum, mean, minimum, and variance. Each strategy produced a separate feature set and classifier. Support Vector Machine (SVM) models with a radial basis function kernel were trained using 7-fold nested cross-validation. Logistic regression, Gaussian process classification, random forest, naïve Bayes, and extreme gradient boosting (XGBoost) were also evaluated to benchmark model performance. HPV status was defined using p16 immunohistochemistry. SHapley Additive exPlanations (SHAP) analysis was used to interpret model behavior, and Kaplan-Meier curves were generated for the best-performing classifier.
Results: With SVM, maximum aggregation achieved the highest balanced accuracy across all strategies (training 0.906; test 0.896) and was the overall best-performing approach. The other aggregation strategies (largest-volume, sum, mean, minimum, and variance) performed worse (test balanced accuracy range: 0.292-0.834). SVM also outperformed all other machine learning algorithms, whose test balanced accuracies ranged from 0.572 to 0.796. SHAP identified shape and texture features, including VolumeDensityConvexHull3D, Compactness2_3D, and InformationCorrelation1Merged3D, as major contributors to HPV-positive predictions, whereas HPV-negative tumors exhibited more heterogeneous habitats.
Kaplan-Meier analysis showed that the predicted HPV groups closely matched ground truth (p=0.52 for HPV-negative; p=0.75 for HPV-positive).
Conclusions: These results suggest that a tumor's outcome is determined not by the average properties of its habitats but by the presence and intensity of its most extreme or critical local microenvironmental condition, the one that exerts the strongest selective pressure or dictates the dominant biological process. These findings support maximum aggregation as an effective and biologically relevant strategy for HPV prediction in OPC.
Citation Format: Oya Altinok, Ghulam Rasool, Asim Waqas, Matthew B. Schabath, Albert Guvenis. Investigating feature selection strategies with habitat-based radiomics for oropharyngeal cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2781.
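The six aggregation strategies compared in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `aggregate_habitats`, the `(volume, feature_vector)` input layout, and the use of population variance (so that a tumor with a single habitat yields zero variance) are all assumptions made for illustration. Each strategy collapses a tumor's one to three per-habitat feature vectors into a single tumor-level vector, feature by feature.

```python
from math import fsum
from statistics import fmean, pvariance

# Hypothetical per-feature reducers; the names mirror the strategies in the abstract.
_REDUCERS = {
    "maximum": max,
    "minimum": min,
    "sum": fsum,
    "mean": fmean,
    "variance": pvariance,  # assumption: population variance, 0.0 for one habitat
}


def aggregate_habitats(habitats, strategy):
    """Collapse per-habitat feature vectors into one tumor-level vector.

    habitats: list of (volume, feature_vector) pairs for a single tumor
              (an assumed layout, one pair per habitat).
    strategy: "maximum", "largest_volume", "sum", "mean", "minimum",
              or "variance".
    """
    if not habitats:
        raise ValueError("a tumor needs at least one habitat")
    n_features = len(habitats[0][1])
    if any(len(features) != n_features for _, features in habitats):
        raise ValueError("all habitats must share the same feature set")

    if strategy == "largest_volume":
        # Keep the features of the most prominent habitat unchanged.
        _, features = max(habitats, key=lambda h: h[0])
        return list(features)

    if strategy not in _REDUCERS:
        raise ValueError(f"unknown strategy: {strategy}")
    reduce_column = _REDUCERS[strategy]

    # Transpose so each column holds one feature's values across habitats.
    columns = zip(*(features for _, features in habitats))
    return [reduce_column(list(column)) for column in columns]


# Toy tumor with three habitats and two made-up features.
tumor = [(10.0, [1.0, 4.0]), (30.0, [3.0, 2.0]), (20.0, [2.0, 6.0])]
max_features = aggregate_habitats(tumor, "maximum")
largest_features = aggregate_habitats(tumor, "largest_volume")
```

In this toy example, maximum aggregation takes each feature's largest value from whichever habitat holds it, whereas largest-volume aggregation copies every feature from the single biggest habitat. Each strategy's output would then feed its own classifier, as described in the Methods.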