Count data are prevalent in fields such as ecology, medicine, and genomics. In high‐dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods such as the Lasso have advanced in handling high‐dimensional count data, Bayesian approaches remain underexplored and lack theoretical results on prediction performance. This article introduces a novel probabilistic machine learning framework for high‐dimensional count data prediction. We propose a pseudo‐Bayesian method that integrates a scaled Student prior to promote sparsity and uses an exponential weight aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees for prediction risk obtained from PAC‐Bayesian bounds. Our results include nonasymptotic oracle inequalities, demonstrating rate‐optimal prediction error without prior knowledge of sparsity. We implement this approach efficiently using a Langevin Monte Carlo method. Simulations and a real data application highlight the strong performance of our method compared to the Lasso in various settings.
The Tien Mai (Thu,) studied this question.
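To make the ingredients named in the abstract concrete, here is a minimal sketch of an unadjusted Langevin Monte Carlo sampler for a pseudo-posterior over Poisson regression coefficients. It is not the article's implementation. Several choices are assumptions made purely for illustration: the fit term is the Poisson log-likelihood with a log link (the article's tailored risk measure is not specified here), the scaled Student prior is taken in the form proportional to the product of `(tau**2 + theta_j**2)**(-2)`, and the names `lam`, `tau`, `step`, and `langevin_poisson` are hypothetical.

```python
import numpy as np


def grad_log_pseudo_posterior(theta, X, y, lam, tau):
    """Gradient of lam * (Poisson log-likelihood) + log(scaled Student prior).

    Assumed forms (illustrative, not taken from the article):
      log-likelihood: sum_i (y_i * eta_i - exp(eta_i)), with eta = X @ theta
      log-prior:      -2 * sum_j log(tau^2 + theta_j^2), up to a constant
    """
    # Clip the linear predictor only to avoid overflow in exp; this slightly
    # alters the gradient when |eta| exceeds 30.
    eta = np.clip(X @ theta, -30.0, 30.0)
    grad_fit = X.T @ (y - np.exp(eta))
    grad_prior = -4.0 * theta / (tau**2 + theta**2)
    return lam * grad_fit + grad_prior


def langevin_poisson(X, y, lam=1.0, tau=0.1, step=1e-4,
                     n_iter=5000, burn_in=1000, seed=0):
    """Unadjusted Langevin iterations; returns the post-burn-in average,
    a Monte Carlo approximation of the pseudo-posterior mean."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    running_sum = np.zeros_like(theta)
    for t in range(n_iter):
        noise = rng.standard_normal(theta.shape)
        drift = grad_log_pseudo_posterior(theta, X, y, lam, tau)
        # Langevin update: gradient step plus Gaussian noise of variance 2 * step.
        theta = theta + step * drift + np.sqrt(2.0 * step) * noise
        if t >= burn_in:
            running_sum += theta
    return running_sum / (n_iter - burn_in)
```

Given a design matrix `X` of shape `(n, p)` and a vector of counts `y`, calling `langevin_poisson(X, y)` returns a length-`p` coefficient estimate, and `np.exp(X_new @ theta_hat)` gives predicted mean counts for a new design matrix `X_new`. The step size and the weight `lam` would need tuning in practice. The unadjusted scheme has no Metropolis correction, so its output is only approximately distributed according to the pseudo-posterior.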