Microbiome sequencing datasets are sparse, high-dimensional, compositional, and hierarchically structured. Predictive modelling from these data typically relies on ad hoc choices of feature representation, which obscures how those choices affect performance and biological interpretation. A standardized, compute-efficient framework is needed to jointly optimize microbial feature representation and model algorithms with transparent model evaluation. Here, we present ritme, an open-source software package implementing Combined Algorithm Selection and Hyperparameter Optimization (CASH) tailored to microbial sequencing data. ritme systematically explores feature engineering methods (taxonomic aggregation, sparsity-aware selection, compositional transforms, and metadata enrichment) alongside diverse model classes, using state-of-the-art optimizers and model trackers. Applied to three real-world use cases, ritme outperforms the original study pipelines and generic AutoML baselines. It also gives users insight into how feature and model choices drive predictive performance. Together, these results establish ritme as a standardized framework for identifying optimal feature-model combinations from high-throughput sequencing data. ritme is available as a Python package at https://github.com/adamovanja/ritme.
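The idea of jointly searching over feature representations and model classes can be illustrated with a small, self-contained sketch. This is not ritme's API: every function name below (`aggregate_by_taxon`, `clr`, `joint_search`, and so on) is hypothetical, the data are synthetic, metadata enrichment is omitted, and a plain exhaustive grid search stands in for the state-of-the-art optimizers ritme uses. It only shows how taxonomic aggregation, prevalence-based (sparsity-aware) selection, a compositional transform, and the choice of model class and hyperparameter can be treated as one search space scored on held-out data.

```python
# Illustrative sketch only, not ritme's API; all names here are hypothetical.
import itertools

import numpy as np


def aggregate_by_taxon(counts, taxa):
    """Taxonomic aggregation: sum the columns that share a taxon label."""
    labels = sorted(set(taxa))
    col = {t: i for i, t in enumerate(labels)}
    out = np.zeros((counts.shape[0], len(labels)))
    for j, t in enumerate(taxa):
        out[:, col[t]] += counts[:, j]
    return out


def clr(x, pseudocount=1.0):
    """Centred log-ratio transform; a pseudocount replaces zeros before the log."""
    logx = np.log(x + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def relative_abundance(x):
    """Closure to proportions; all-zero rows stay zero instead of dividing by 0."""
    totals = x.sum(axis=1, keepdims=True)
    return x / np.maximum(totals, 1e-12)


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression; centring handles the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Xn: (Xn - x_mean) @ w + y_mean


def fit_knn(X, y, k):
    """k-nearest-neighbour regression with squared Euclidean distance."""
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict


def evaluate(cfg, counts, taxa, y, train, val):
    """Build one feature representation plus model; return validation RMSE."""
    X = aggregate_by_taxon(counts, taxa) if cfg["aggregate"] else counts.astype(float)
    # Sparsity-aware selection: prevalence is computed on training samples only.
    keep = (X[train] > 0).mean(axis=0) >= cfg["min_prevalence"]
    if not keep.any():
        return np.inf
    X = X[:, keep]
    X = clr(X) if cfg["transform"] == "clr" else relative_abundance(X)
    kind, param = cfg["model"]
    fit = fit_ridge if kind == "ridge" else fit_knn
    predict = fit(X[train], y[train], param)
    return float(np.sqrt(np.mean((predict(X[val]) - y[val]) ** 2)))


# Hypothetical joint search space: feature choices and (model class, hyperparameter).
SEARCH_SPACE = {
    "aggregate": [False, True],
    "min_prevalence": [0.1, 0.5],
    "transform": ["clr", "relative"],
    "model": [("ridge", 1.0), ("ridge", 10.0), ("knn", 3), ("knn", 5)],
}


def joint_search(counts, taxa, y, train, val):
    """Exhaustively score every feature/model combination; return best and all."""
    keys = list(SEARCH_SPACE)
    results = []
    for values in itertools.product(*SEARCH_SPACE.values()):
        cfg = dict(zip(keys, values))
        results.append((evaluate(cfg, counts, taxa, y, train, val), cfg))
    best = min(results, key=lambda r: r[0])
    return best, results


# Synthetic toy data (illustrative only): 60 samples x 20 sparse count features.
rng = np.random.default_rng(0)
counts = rng.poisson(20, size=(60, 20)) * (rng.random((60, 20)) < 0.6)
taxa = [f"genus_{j % 5}" for j in range(20)]
y = np.log1p(counts[:, :4].sum(axis=1)) + rng.normal(0, 0.1, size=60)
train, val = np.arange(40), np.arange(40, 60)

(best_rmse, best_cfg), results = joint_search(counts, taxa, y, train, val)
print(f"best RMSE={best_rmse:.3f} with {best_cfg}")
```

Fitting the prevalence filter on training samples only keeps the validation score honest; a real CASH tool would replace the exhaustive loop with a sample-efficient optimizer and a proper cross-validation scheme.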
Adamov et al.