How To: Baseline Models in Time Series | by Haden Pelletier | Mar, 2024

Why (and how) you should create a baseline model before you train your final model

Haden Pelletier
Towards Data Science

So you’ve collected your data. You’ve outlined the business case, decided on a candidate model (e.g. Random Forest), and your hands are at the keyboard. You’re ready to train your time series model.

Hold up — don’t start just yet. Before you train and test your Random Forest model, you should first train a baseline model.

A baseline model is a simple model used to create a benchmark, or a point of reference, upon which you will be building your final, more complex machine learning model.

Data scientists create baseline models because:

  • Baseline models can give you a good idea of how a more complex model will perform.
  • If a baseline model does badly, it could be a sign of an issue with the data that needs addressing.
  • If a baseline model performs better than the final model, it could indicate issues with the final model’s algorithm, features, hyperparameters, or data preprocessing.
  • If the baseline and complex model perform more or less the same, this could indicate that the complex model needs more tuning (in features or hyperparameters). It could also show that a more complex model isn’t necessary, and a simpler model will suffice.

Typically, a baseline model is a statistical model, such as a moving average model. Alternatively, it is a simpler version of the model — for example, if you will be training a Random Forest model, you can first train a Decision Tree model as a baseline.
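
To make the moving average idea concrete, here is a minimal sketch of a moving average baseline in plain Python. The function names, the window size of 3, the choice of mean absolute error as the metric, and the numbers in `series` are all assumptions made up for illustration. They are not taken from any particular library or dataset.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(history) < window:
        raise ValueError("history is shorter than the window")
    return mean(history[-window:])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def walk_forward_mae(series, window=3):
    """Score the baseline by forecasting each point only from the points before it."""
    actual = series[window:]
    predicted = [
        moving_average_forecast(series[:i], window)
        for i in range(window, len(series))
    ]
    return mean_absolute_error(actual, predicted)


# Made-up illustrative data (hypothetical, not from the article).
series = [10, 12, 11, 13, 12, 14, 13, 15]
baseline_mae = walk_forward_mae(series, window=3)
print(f"Moving average baseline MAE: {baseline_mae:.2f}")
```

The error you get here is the benchmark: score your more complex model the same way, on the same points, and compare the two numbers.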

For time series data, there are a couple of popular options for baseline models that I’d like to share with you. Both work well because they respect the temporal order of the data and forecast from the patterns already in it.

Naive forecast

The naive forecast is the simplest — it assumes that the next value will be the same as the…
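
As a rough illustration, here is a minimal sketch of a naive forecast in plain Python. It assumes the common definition, where the forecast simply repeats the most recent observed value. The function names and the numbers in `series` are hypothetical and made up for this example.

```python
def naive_forecast(history, horizon=1):
    """Repeat the last observed value for every step of the forecast horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def naive_one_step_predictions(series):
    """One-step-ahead naive predictions: each value is predicted by the one before it."""
    return series[:-1]


# Made-up illustrative data (hypothetical, not from the article).
series = [10, 12, 11, 13]

# Forecast three steps into the future from the end of the series.
future = naive_forecast(series, horizon=3)

# Score the baseline on the observed data: predictions line up with series[1:].
predictions = naive_one_step_predictions(series)
errors = [abs(a - p) for a, p in zip(series[1:], predictions)]
naive_mae = sum(errors) / len(errors)
print(f"Naive baseline MAE: {naive_mae:.2f}")
```

If a more complex model can’t beat this error on the same data, that is a signal worth investigating before you go any further.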
