Understanding Gradient Boosting Techniques in Machine Learning
Chapter 1: Introduction to Gradient Boosting
In this article, we will delve into the ensemble boosting method known as gradient boosting. Previously, we explored random forest, which is based on bagging. In contrast, boosting trains weak learners sequentially on the training dataset, and the errors (residuals) of each learner are given greater weight so that subsequent learners concentrate on the examples that were predicted poorly.
While tree-based bagging methods select splits with criteria such as Gini impurity and entropy, boosting is driven by a loss function: the heavily weighted losses are passed on to the next base learner. By the final stage, the accumulated ensemble has minimized the error, producing a composite prediction that fits the data well.
Boosting methods aim primarily to reduce bias, and predictions are built up sequentially. Gradient boosting is a sequential model in which the error is minimized by gradient descent: each new base learner is fit to the negative gradient of the loss (the pseudo-residuals) of the current ensemble.
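To make the sequential, gradient-descent view concrete, here is a minimal from-scratch sketch for squared-error regression, where the negative gradient is simply the residual. The function and parameter names are illustrative only, not part of any library.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    # For squared-error loss L(y, F) = (y - F)^2 / 2, the negative gradient is
    # the residual y - F, so each new tree is fit to the current residuals.
    f0 = y.mean()                                # initial model: the mean minimizes squared error
    pred = np.full(len(y), f0, dtype=float)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred                     # pseudo-residuals (negative gradient)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrunken step along the negative gradient
        trees.append(tree)
    return f0, trees

Predictions for new data are then the initial value plus the learning-rate-weighted sum of the trees' outputs.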
Advantages of Gradient Boosting:
- Reduces bias in predictions.
- Handles missing values well in many implementations.
- Offers flexibility in hyperparameter tuning.
- Minimizes an explicit loss function via gradient descent, so errors are corrected as part of training.
- Capable of addressing both binary and multi-class classification problems.
Loss Functions in Gradient Boosting:
For Regression:
- LAD (Least Absolute Deviation): Focuses on the median of the target value.
- LS (Least Squares): Centers on the mean of the target value.
- Huber: A hybrid approach combining both LAD and LS, where the alpha parameter helps manage sensitivity to outliers.
For Classification:
- Binomial Deviance: Utilized for binary classification, based on the log odds ratio.
- Multinomial Deviance: Used for multi-class classification; a separate regression tree is fit per class at every boosting stage, so a large number of classes (n_classes) makes the ensemble expensive to train.
- Exponential Loss: Restricted to binary classification; it is the same loss optimized by AdaBoost. (A short scikit-learn sketch of selecting these losses follows below.)
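The sketch is kept minimal: the string names for the squared-error and absolute-error losses have changed across scikit-learn versions, so only version-stable options are spelled out here.

from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

# Regression with the Huber loss: alpha sets where the loss switches between the
# squared-error (LS) and absolute-error (LAD) regimes, i.e. outlier sensitivity.
huber_reg = GradientBoostingRegressor(loss="huber", alpha=0.9)

# Binary classification with the exponential loss (the AdaBoost loss).
# Leaving loss at its default uses the binomial/multinomial deviance instead.
exp_clf = GradientBoostingClassifier(loss="exponential")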
Key Considerations:
- In stochastic gradient boosting, each tree is trained on a random sub-sample of the rows (the subsample parameter), combining gradient boosting with the averaging idea behind bagging.
- The learning rate (shrinkage) is a regularization strategy: a smaller learning rate shrinks each tree's contribution, which typically improves generalization but requires more boosting iterations.
- Classification in gradient boosting mirrors regression, as the outputs from the trees are not class labels but continuous values. For binary classification, the sigmoid function maps the raw score to a probability, while the softmax function is applied in multi-class scenarios (see the sketch after this list).
- The default split-quality criterion is 'friedman_mse', which generally gives a better approximation of split quality than plain MSE or MAE.
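The following minimal sketch ties these points together on a synthetic dataset; the subsample and learning_rate values are arbitrary choices for illustration, and the sigmoid check assumes the default deviance (log-loss).

from scipy.special import expit  # the sigmoid function
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# subsample < 1.0 gives stochastic gradient boosting: each tree sees a random
# fraction of the rows. A small learning_rate (shrinkage) regularizes the model
# at the cost of needing more estimators.
gb = GradientBoostingClassifier(n_estimators=100,
                                learning_rate=0.1,
                                subsample=0.8,
                                random_state=0)
gb.fit(X, y)

# The trees output continuous scores; for binary classification the raw score is
# mapped to a probability with the sigmoid.
raw_scores = gb.decision_function(X[:5])
print(expit(raw_scores))                 # matches the positive-class probabilities
print(gb.predict_proba(X[:5])[:, 1])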
Chapter 2: Practical Implementation with the Titanic Dataset
In this chapter, we will illustrate the application of gradient boosting using the Titanic dataset in Python.
Importing Libraries:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import GradientBoostingClassifier
We will utilize the MinMaxScaler for value scaling, alongside the boosting classifier. The dataset comprises two files: training and testing CSV files. Using pandas, we can read these files easily.
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")
Pre-processing the Training Data:
We'll begin by selecting the output column, 'Survived', for our target variable and removing it from the feature dataset.
y_train = train_data["Survived"]
train_data.drop(labels="Survived", axis=1, inplace=True)
Next, we will merge the training and test datasets.
full_data = pd.concat([train_data, test_data], axis=0)  # DataFrame.append was removed in pandas 2.0
Feature Engineering:
We can eliminate columns that are less useful:
drop_columns = ["Name", "Age", "SibSp", "Ticket", "Cabin", "Parch", "Embarked"]
full_data.drop(labels=drop_columns, axis=1, inplace=True)
We will encode the categorical columns using pandas' get_dummies method to convert categorical values into numerical format.
full_data = pd.get_dummies(full_data, columns=["Sex"])
full_data.fillna(value=0.0, inplace=True)
Splitting the Dataset:
After pre-processing, we will split the data into training and testing sets.
X_train = full_data.values[:len(train_data)]  # the original training rows (891 in the Titanic set)
X_test = full_data.values[len(train_data):]
Before modeling, we scale the values so that every feature lies on a comparable range (here [0, 1], via MinMaxScaler).
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
state = 12
test_size = 0.30
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
test_size=test_size, random_state=state)
Model Fitting and Evaluation:
We will fit the model and assess its accuracy with various learning rates.
lrate_list = [0.05, 0.075, 0.5, 0.75, 1]
for learning_rate in lrate_list:
    gb_clf = GradientBoostingClassifier(n_estimators=20,
                                        learning_rate=learning_rate,
                                        max_features=2,
                                        max_depth=2,
                                        random_state=0)
    gb_clf.fit(X_train, y_train)
    print("Learning rate: ", learning_rate)
    print("Accuracy score (training): {0:.3f}".format(gb_clf.score(X_train, y_train)))
    print("Accuracy score (validation): {0:.3f}".format(gb_clf.score(X_val, y_val)))
Conclusion:
From our experiments, we observed that a learning rate of "0.5" yielded satisfactory validation scores. We will predict outcomes using this optimal value.
gb_clf2 = GradientBoostingClassifier(n_estimators=20,
                                     learning_rate=0.5,
                                     max_features=2,
                                     max_depth=2,
                                     random_state=0)
gb_clf2.fit(X_train, y_train)
predictions = gb_clf2.predict(X_val)
print("Confusion Matrix:")
print(confusion_matrix(y_val, predictions))
print("Classification Report")
print(classification_report(y_val, predictions))
The classification report and confusion matrix show that gradient boosting delivers strong results on this binary classification task.
In conclusion, gradient boosting proves to be a powerful tool for achieving robust results in various prediction tasks. I hope you found this article informative. Connect with me on LinkedIn and Twitter for more insights!
Recommended Articles:
- NLP — Zero to Hero with Python
- Python Data Structures: Data-types and Objects
- Python: Zero to Hero with Examples
- Comprehensive Guide to SVM Classification with Python
- In-Depth Look at K-means Clustering with Python
- Detailed Overview of Linear Regression with Python
- Thorough Explanation of Logistic Regression with Python
- Introduction to Time Series Analysis with Python
- NumPy: Comprehensive Guide with Python
- Understanding the Confusion Matrix in Machine Learning