Gradient Boosting from Theory to Practice (Part 2) | by Dr. Roi Yehoshua | Jul, 2023

Use the gradient boosting classes in Scikit-Learn to solve different classification and regression problems

Dr. Roi Yehoshua
Towards Data Science

In the first part of this article, we presented the gradient boosting algorithm and showed its implementation in pseudocode.

In this part of the article, we will explore the classes in Scikit-Learn that implement this algorithm, discuss their various parameters, and demonstrate how to use them to solve several classification and regression problems.

Although the XGBoost library (which will be covered in a future article) provides a more optimized and highly scalable implementation of gradient boosting, for small to medium-sized data sets it is often easier to use the gradient boosting classes in Scikit-Learn, which have a simpler interface and significantly fewer hyperparameters to tune.

Scikit-Learn provides the following classes that implement the gradient-boosted decision trees (GBDT) algorithm:

  1. GradientBoostingClassifier is used for classification problems.
  2. GradientBoostingRegressor is used for regression problems.
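Both classes follow the standard Scikit-Learn estimator interface (fit, predict, score). As a minimal sketch, here is how GradientBoostingClassifier can be used with its default settings on the Iris dataset (the dataset choice is illustrative, not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Load a small illustrative dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a gradient boosting classifier with the default settings
clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)

print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```

GradientBoostingRegressor is used the same way, except that score() reports the R² coefficient instead of accuracy.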

In addition to the standard parameters of decision trees, such as criterion, max_depth (set by default to 3) and min_samples_split, these classes provide the following parameters:

  1. loss — the loss function to be optimized. In GradientBoostingClassifier, this function can be ‘log_loss’ (the default) or ‘exponential’ (which will make gradient boosting behave like the AdaBoost algorithm). In GradientBoostingRegressor, this function can be ‘squared_error’ (the default), ‘absolute_error’, ‘huber’, or ‘quantile’.
  2. n_estimators — the number of boosting iterations (defaults to 100).
  3. learning_rate — a factor that shrinks the contribution of each tree (defaults to 0.1).
  4. subsample — the fraction of samples to use for training each tree (defaults to 1.0).
  5. max_features — the number of features to consider when searching for the best split in each node. The options are to specify an integer for the…
