Introduction: Analysis of the human microbiome has gained relevance in biomedical research because of its association with various diseases, including cancer. Understanding its role in health and disease requires robust analytical strategies that can handle the complexity and variability of taxonomic abundance data. This study presents a novel machine learning framework for classifying human microbiome profiles that integrates alpha diversity metrics based on Hill numbers, a dissimilarity-driven selection of representative subsets, and a heterogeneous ensemble learning architecture based on stacking. Methods: The method improves generalization performance by combining the probabilistic outputs of five base classifiers (Random Forest, K-Nearest Neighbors, Support Vector Machine, Gradient Boosting, and Multi-Layer Perceptron) through a logistic regression meta-model. The workflow uses internal stratified cross-validation to prevent data leakage, and the experimental design comprises 25 independent iterations. Results: The proposed approach outperforms traditional classification baselines, achieving an average sensitivity of 60% and balanced precision-recall performance, which supports its utility in clinical settings where early detection is critical. Discussion: This study highlights how methodological choices in diversity representation and ensemble design can critically influence predictive performance and reproducibility. Conclusion: This work demonstrates that combining alpha diversity metrics with ensemble methods provides a powerful tool for advancing microbiome-based diagnostics and supports the integration of machine learning into personalized medicine initiatives.
Nazhir et al. (Wed,) studied this question.
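
The two core ingredients named in the abstract, Hill-number alpha diversity features and a stacking ensemble with a logistic regression meta-model, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, Hill orders q = 0, 1, 2, default-like hyperparameters, and feature scaling for the distance- and gradient-based learners, none of which the abstract specifies. The names `hill_number`, `hill_features`, and `build_stacking_model` are hypothetical, and the dissimilarity-driven subset selection step is omitted.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def hill_number(counts, q):
    """Hill number of order q for one sample of non-negative taxon counts."""
    counts = np.asarray(counts, dtype=float)
    # Relative abundances of the taxa that are actually present.
    p = counts[counts > 0] / counts.sum()
    if p.size == 0:
        return 0.0
    if np.isclose(q, 1.0):
        # The q -> 1 limit is the exponential of Shannon entropy.
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))


def hill_features(abundance, orders=(0, 1, 2)):
    """One row per sample, one column per Hill order (orders are an assumption)."""
    return np.array([[hill_number(row, q) for q in orders] for row in abundance])


def build_stacking_model(random_state=0):
    """Five heterogeneous base classifiers combined by logistic regression."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("svm", make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=random_state))),
        ("gb", GradientBoostingClassifier(random_state=random_state)),
        ("mlp", make_pipeline(
            StandardScaler(), MLPClassifier(max_iter=500, random_state=random_state))),
    ]
    # Internal stratified folds: the meta-model is trained only on
    # out-of-fold base-classifier probabilities, which limits leakage.
    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    return StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=inner_cv,
    )
```

Under these assumptions, a model would be fitted with `build_stacking_model().fit(hill_features(X_train), y_train)`, where `X_train` is a samples-by-taxa count matrix. Repeating the outer train/test split with different seeds would approximate the repeated-iteration design the abstract describes.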