Gaussian Processes from Scratch. Gain a deeper understanding of Gaussian… | by Theo Wolf | Jan, 2024

Gain a deeper understanding of Gaussian processes by implementing them with only NumPy.

Theo Wolf
Towards Data Science

Gaussian Processes (GPs) are an incredible class of . There are very few that give you an accurate measure of uncertainty for free while still being super flexible. The problem is, GPs are conceptually really difficult to understand. Most explanations use some algebra and probability, which is often not useful to get an intuition for how these models .

There also are many great guides that skip the maths and give you the intuition for how these models work, but when it comes to using GPs yourself, in the right context, my personal belief is that surface knowledge won’t cut it. This is why I wanted to walk through a bare-bones implementation, from scratch, so that you get a clearer picture of what’s going on under the hood of all the libraries that implement these models for you.

I also link my GitHub repo, where you’ll find the implementation of GPs using only NumPy. I’ve tried to abstract from the maths as much as possible, but obviously there is still some that are required…

The first step is always to have a look at the . We are going to use the monthly CO2 atmospheric concentration over time, measured at the Mauna Loa observatory, a common dataset for GPs [1]. This is intentionally the same dataset that sklearn use in their GP , which teaches how to use their API and not what is going on under the hood of the model.

Monthly CO2 parts per million (ppm) at the Mauna Loa observatory. ( by Author)

This is a very simple dataset, which will make it easier to explain the maths that will follow. The notable are the linear upwards as well as the seasonal trend, with a period of one year.

What we will do is separate the seasonal component and linear components of the data. To do this, we fit a linear model to the data.

Source link