Testicular germ cell tumors (TGCTs) are the most common form of cancer in young men and are classified into several histological subtypes. This thesis applies several machine learning methods to identify microRNA (miRNA) molecules that could serve as biomarkers for this disease or its subtypes. Publicly available miRNA profiles from The Cancer Genome Atlas (TCGA) TGCT study were used together with labels describing the subtype composition of each tumor. Control samples, when used, were taken from the GTEx study. A classification task between TGCT and non-TGCT tissues was conducted, as well as several binary classification tasks to distinguish between TGCT subtypes. The main challenges in these tasks were the low sample size, the high dimensionality of the data and the complexity of feature selection. Other obstacles included batch effects and multicollinearity between features. Bootstrapping was used to estimate the stability of the selected features. The results lead to some of the same conclusions as the original TCGA-TGCT study. Furthermore, the microRNAs hsa-miR-199b-5p, hsa-miR-199a and hsa-miR-99b-5p are proposed as potential markers for teratoma, and several other molecules are proposed as potentially useful for detecting TGCTs or differentiating between their subtypes. However, because of the aforementioned obstacles, further investigation of the biological pathways involved and additional studies are needed before more confident and biologically meaningful conclusions can be drawn.
Ignas Silickas studied this question.
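The abstract mentions bootstrapping as a way to estimate the stability of selected features. As a rough illustration of that general idea, the sketch below resamples the data with replacement, repeats a feature selection step on each resample, and reports how often each feature is selected. It is not the thesis's actual pipeline: the univariate score, the top-k selection rule, the function names (`score_feature`, `select_top_k`, `bootstrap_selection_frequency`, `make_toy_data`) and the synthetic data are all assumptions made for this example.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def score_feature(values, labels):
    """Hypothetical univariate score: |difference of class means| / overall spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid division by zero for constant features
    return abs(mean(pos) - mean(neg)) / spread


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    n_features = len(X[0])
    scores = [score_feature([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def bootstrap_selection_frequency(X, y, k, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature lands in the top k."""
    rng = random.Random(seed)
    n = len(X)
    counts = Counter()
    used = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that lost one of the two classes
        Xb = [X[i] for i in idx]
        counts.update(select_top_k(Xb, yb, k))
        used += 1
    n_features = len(X[0])
    if used == 0:
        return {j: 0.0 for j in range(n_features)}
    return {j: counts[j] / used for j in range(n_features)}


def make_toy_data(n_per_class=20, n_features=6, shift=3.0, seed=1):
    """Synthetic data (an assumption): only feature 0 differs between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    freq = bootstrap_selection_frequency(X, y, k=2)
    for j, f in sorted(freq.items(), key=lambda item: -item[1]):
        print(f"feature {j}: selected in {f:.0%} of resamples")
```

In this toy setting, a feature that is selected in nearly every resample would be considered stable, while features that enter the top k only occasionally are more likely artifacts of the particular sample. With few samples and many correlated features, as described in the abstract, selection frequencies can be spread across collinear features, so they are best read as a rough indication and not as a definitive ranking.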