
September 5, 2025

High‐dimensional prediction for count response via sparse exponential weights

Key Points

The proposed method achieves strong prediction performance in simulations and a real data application, outperforming the Lasso in various settings.
A novel risk measure tailored to count data yields theoretical guarantees on prediction risk via PAC-Bayesian bounds.
Feature selection is essential in high-dimensional settings, where the number of features exceeds the sample size.
The approach is implemented efficiently using Langevin Monte Carlo.

Abstract

Count data are prevalent in fields such as ecology, medicine, and genomics. In high-dimensional settings, where the number of features exceeds the sample size, feature selection becomes essential. While frequentist methods such as the Lasso have advanced for high-dimensional count data, Bayesian approaches remain underexplored, with no theoretical results on their prediction performance. This article introduces a novel probabilistic machine learning framework for high-dimensional count data prediction. We propose a pseudo-Bayesian method that uses a scaled Student prior to promote sparsity together with an exponential weight aggregation procedure. A key contribution is a novel risk measure tailored to count data prediction, with theoretical guarantees on prediction risk obtained from PAC-Bayesian bounds. Our results include nonasymptotic oracle inequalities, which demonstrate rate-optimal prediction error without prior knowledge of the sparsity level. We implement the approach efficiently using the Langevin Monte Carlo method. Simulations and a real data application highlight the strong performance of our method compared to the Lasso in various settings.
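As a rough illustration of how the ingredients named in the abstract could fit together, the sketch below runs an unadjusted Langevin Monte Carlo chain on a pseudo-posterior proportional to exp(-lam * loss) times a scaled Student prior, and averages the iterates after burn-in. It is not the authors' implementation. Several pieces are assumptions made only for this example: the loss is taken to be the Poisson negative log-likelihood with a log link (the abstract does not state the paper's risk measure), the prior is written as a product of (tau^2 + beta_j^2)^(-2) terms, and the names `lam`, `tau`, `step`, `log_pseudo_posterior`, `grad_log_pseudo_posterior`, and `lmc_poisson` are hypothetical.

```python
import numpy as np


def log_pseudo_posterior(beta, X, y, lam, tau):
    """Unnormalized log pseudo-posterior (assumed form, for illustration only)."""
    # Linear predictor, clipped so that exp() cannot overflow (a numerical safeguard).
    eta = np.clip(X @ beta, -20.0, 20.0)
    # Assumed loss: Poisson negative log-likelihood, dropping the log(y!) constant.
    loss = np.sum(np.exp(eta) - y * eta)
    # Assumed scaled Student prior: prod_j (tau^2 + beta_j^2)^(-2).
    log_prior = -2.0 * np.sum(np.log(tau**2 + beta**2))
    return -lam * loss + log_prior


def grad_log_pseudo_posterior(beta, X, y, lam, tau):
    """Gradient of log_pseudo_posterior with respect to beta."""
    eta = np.clip(X @ beta, -20.0, 20.0)
    grad_loss = X.T @ (np.exp(eta) - y)
    grad_log_prior = -4.0 * beta / (tau**2 + beta**2)
    return -lam * grad_loss + grad_log_prior


def lmc_poisson(X, y, lam=1.0, tau=0.1, step=1e-4,
                n_iter=5000, burn_in=1000, seed=0):
    """Unadjusted Langevin chain; returns the mean of the post-burn-in iterates."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    beta = np.zeros(p)
    running_mean = np.zeros(p)
    kept = 0
    for k in range(n_iter):
        grad = grad_log_pseudo_posterior(beta, X, y, lam, tau)
        noise = rng.standard_normal(p)
        # Langevin update: gradient drift plus Gaussian noise scaled by sqrt(2 * step).
        beta = beta + step * grad + np.sqrt(2.0 * step) * noise
        if k >= burn_in:
            kept += 1
            running_mean += (beta - running_mean) / kept
    return running_mean
```

Calling `lmc_poisson(X, y)` with an `n x p` design matrix and a vector of counts returns a length-`p` coefficient estimate; predicted means would then be `np.exp(X_new @ estimate)` under the assumed log link. The step size and number of iterations here are placeholders and would need tuning for any real data set.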
