ABSTRACT The Gaussian process (GP) is a widely used method for analyzing large‐scale data sets, including spatio‐temporal measurements of nonlinear processes that are now commonplace in the environmental sciences. Traditional implementations of GPs involve stationary kernels (also termed covariance functions) that limit their flexibility, and exact methods for inference that prevent application to data sets with more than about 10,000 points. Modern approaches to address stationarity assumptions generally fail to accommodate large data sets, while all attempts to address scalability focus on approximating the Gaussian likelihood, which can involve subjectivity and lead to inaccuracies. In this work, we explicitly derive an alternative kernel that can discover and encode both sparsity and nonstationarity. We embed the kernel within a fully Bayesian GP model and leverage high‐performance computing resources to enable the analysis of massive data sets. We demonstrate the favorable performance of our novel kernel relative to existing exact and approximate GP methods across a variety of synthetic data examples. Furthermore, we conduct space–time prediction based on more than 1 million measurements of daily maximum temperature and verify that our results outperform state‐of‐the‐art methods in the Earth sciences. More broadly, having access to exact GPs that use ultra‐scalable, sparsity‐discovering, nonstationary kernels allows GP methods to truly compete with a wide variety of machine learning methods.
Risser et al. studied this question.