arXiv Analytics

arXiv:1801.10273 [stat.ML]

Kernel Distillation for Gaussian Processes

Congzheng Song, Yiming Sun

Published 2018-01-31 (Version 1)

Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the use of GPs in real-world applications is limited by their high computational cost at inference time. In this paper, we introduce a new framework, \textit{kernel distillation}, for kernel matrix approximation. The idea is adapted from knowledge distillation in the deep learning community: we approximate a fully trained teacher kernel matrix of size $n \times n$ with a student kernel matrix. We combine the inducing points method with sparse low-rank approximation in the distillation procedure. The distilled student kernel matrix requires only $\mathcal{O}(m^2)$ storage, where $m$ is the number of inducing points and $m \ll n$. We also show that one application of kernel distillation is fast GP prediction, where we demonstrate empirically that our approximation provides a better balance between prediction time and predictive performance than the alternatives.
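
The sketch below illustrates the general idea of summarizing an $n \times n$ kernel matrix through $m \ll n$ inducing points so that only an $m \times m$ core must be stored. It is a minimal illustration of a Nyström / subset-of-regressors style approximation $K \approx K_{nm} K_{mm}^{-1} K_{mn}$, not the paper's distillation procedure; the RBF kernel, the random choice of inducing inputs, and all variable names are assumptions made for the example.

```python
# Minimal sketch of an inducing-point (Nystrom-style) low-rank kernel
# approximation -- NOT the paper's kernel distillation algorithm. It only
# shows how an n x n teacher kernel can be summarized with an m x m core
# (O(m^2) storage) plus an m-dimensional weight vector for fast prediction.
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n, m = 2000, 50                                  # n training points, m << n inducing points
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

Z = X[rng.choice(n, size=m, replace=False)]      # inducing inputs (random subset here)
K_mm = rbf_kernel(Z, Z)                          # m x m core: the O(m^2) object kept around
K_nm = rbf_kernel(X, Z)                          # n x m cross-covariance, used once

# Subset-of-regressors posterior mean: weights of size m computed in one O(n m^2) pass.
noise = 0.1 ** 2
A = K_mm + K_nm.T @ K_nm / noise                 # m x m system matrix
alpha = np.linalg.solve(A, K_nm.T @ y / noise)   # m-dimensional weight vector

def predict(X_star):
    """Approximate GP posterior mean at test inputs, using only m-sized state."""
    return rbf_kernel(X_star, Z) @ alpha

print(predict(np.array([[0.5]])))
```

After the one-time pass over the training data, prediction touches only the $m$ inducing inputs and the weight vector, which is what makes inducing-point approximations attractive for fast GP inference.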

Related articles:
arXiv:2506.17366 [stat.ML] (Published 2025-06-20)
Gaussian Processes and Reproducing Kernels: Connections and Equivalences
arXiv:2108.11683 [stat.ML] (Published 2021-08-26)
Estimation of Riemannian distances between covariance operators and Gaussian processes
arXiv:2210.07612 [stat.ML] (Published 2022-10-14)
Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes