Inducing-point-based sparse variational approximations scale Gaussian process models to large datasets but tend to overestimate observation noise and underestimate posterior variance. The parametric predictive Gaussian process regressor (PPGPR) improves point-wise uncertainty estimation, especially for heteroskedastic data, by repairing a mismatch between the training loss and the predictive metric of the sparse variational Gaussian process (SVGP). In this paper, we approach uncertainty estimation with Gaussian process models from the perspective of information theory. We show that both SVGP and PPGPR apply the information bottleneck (IB) principle, but in sub-optimal ways: the former fails to fully use the input-dependent latent function variance to model uncertainty, while the latter tends to underestimate observation noise and yields an ill-conditioned predictive covariance. We further propose a fix for these problems by decomposing the mutual information and designing two coupled decoders, resulting in a method named sparse variational information bottleneck Gaussian process (SVIBGP). Experiments on both synthetic and real-world datasets demonstrate that SVIBGP accounts for heteroskedastic noise and provides improved uncertainty estimation.
Mao et al. (Thu,) studied this question.
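
The mismatch between training loss and predictive metric mentioned in the abstract can be made concrete with the per-data-point terms of the two objectives. The following is a sketch under stated assumptions, not the paper's own derivation: it assumes a Gaussian likelihood with noise variance $\sigma^2$ and a Gaussian marginal $q(f_i) = \mathcal{N}(\mu_i, s_i^2)$ for the latent function at input $x_i$. The symbols $\mu_i$, $s_i^2$, and $\sigma^2$ are notation introduced here for illustration, and the KL regularizer on the inducing variables is omitted.

```latex
% SVGP: expected log-likelihood under q(f_i)
\mathbb{E}_{q(f_i)}\!\left[\log \mathcal{N}(y_i \mid f_i, \sigma^2)\right]
  = -\tfrac{1}{2}\log(2\pi\sigma^2)
    - \frac{(y_i - \mu_i)^2 + s_i^2}{2\sigma^2}

% PPGPR: log of the predictive density, i.e. the metric evaluated at test time
\log \mathbb{E}_{q(f_i)}\!\left[\mathcal{N}(y_i \mid f_i, \sigma^2)\right]
  = -\tfrac{1}{2}\log\!\left(2\pi(\sigma^2 + s_i^2)\right)
    - \frac{(y_i - \mu_i)^2}{2(\sigma^2 + s_i^2)}
```

Under these assumptions, the latent variance $s_i^2$ enters the SVGP term only as a penalty, so the data-fit term alone gives no reason to keep it large, whereas in the PPGPR term $s_i^2$ and $\sigma^2$ jointly form the predictive variance. This is consistent with the abstract's remarks that SVGP underuses the input-dependent latent variance and that PPGPR can shift variance away from the observation noise.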