Abstract

Background: T-cell acute lymphoblastic leukemia (T-ALL) is a highly aggressive hematologic malignancy characterized by profound molecular and clinical heterogeneity. Although current T-ALL subtype classifications are not yet universally integrated into clinical decision-making, accurate and rapid classification holds promise for future prognostic stratification and for guiding targeted therapies. Traditional classification methods can be time-consuming and labor-intensive, highlighting the need for efficient, data-driven approaches. Using transcriptomic data alone, we aimed to overcome these limitations by developing a novel approach for precise T-ALL subtype classification.

Methods: We developed a machine learning pipeline for T-ALL subtype classification from RNA-seq data that introduces a test-time compute paradigm. First, a Random Forest model was trained on a large discovery cohort (Polonen et al., 2024; 1,112 samples, 24,619 genes), and the top 100 genes per subtype were selected as predictive features. At prediction time, the pipeline dynamically executes four steps: (1) filtering both the training and test datasets (e.g., the TARGET cohort; 264 samples, 22,688 genes) to the preselected features; (2) batch correction via pycombat; (3) retraining of the Random Forest classifier on the processed training set; and (4) prediction on the processed test data. This test-time pipeline is accessible through a Streamlit web app and a command-line interface.

Results: We used the F1 score to evaluate classifier performance. In initial training on the discovery cohort, the Random Forest achieved 93% in classifying the TAL-like, TLX-like, NKX2-1, ETP-like, and "other" subtypes. Selecting the top 100 features for each subtype yielded 478 features. Benchmarking the model with these selected features resulted in a 96% F1 score. We then applied the test-time pipeline to the independent TARGET cohort (264 samples). The final model achieved 83% accuracy across 5 subtypes. Performance was highest for clinically relevant subtypes: TAL-like (F1: 0.96), TLX-like (F1: 0.94), and NKX2-1 (F1: 0.89). The "other" category scored moderately (F1: 0.60), while ETP-like was absent from the validation set.

Conclusion: Our RNA-seq-based model delivers robust, scalable T-ALL subtype classification by leveraging test-time compute, integrating dynamic batch correction and real-time model retraining with a curated 478-gene feature set. Strong performance on an independent cohort supports its potential clinical utility as a rapid, reliable tool for precision oncology.

Citation Format: Tarun Karthik Kumar Mamidi, Irina Pushel, Byunggil Yoo, Midhat S. Farooqi, Keith J. August, . Test-time compute for subtype classification in pediatric T-cell acute lymphoblastic leukemia using transcriptomics [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6906.
Tarun Karthik Kumar Mamidi
Irina Pushel
Byunggil Yoo
Cancer Research
Mercy Research
www.synapsesocial.com/papers/69d1fd8ea79560c99a0a394c — DOI: https://doi.org/10.1158/1538-7445.am2026-6906