Before diving in
For a long time, I recall having this vague impression about Gaussian Processes (GPs) being able to magically define probability distributions over sets of functions, yet I procrastinated reading up about them for many many moons. However, as always, I’d like to think that this is not just due to my procrastination superpowers. Whenever I look up “Gaussian Process” on Google, I find these well-written tutorials with vivid plots that explain everything up until non-linear regression in detail, but shy away at the very first glimpse of any sort of information theory. The key takeaway is always,
A Gaussian process is a probability distribution over possible functions that fit a set of points.