Selecting Algorithms: XGBoost

Picking an algorithm for a machine learning project can be a confusing business. Most data scientists will tell you that there is not always a perfect answer to the question “What algorithm or model should I use?” In this series, we break down the important details that go into making that decision.

In this article, we dive into XGBoost, an ensemble method in machine learning that has gotten a lot of attention since its development in 2014. By combining multiple weak learners to form a strong learner, and by blending a few different machine learning concepts, XGBoost rapidly became one of the most-used algorithms in machine learning competitions such as those on Kaggle. XGBoost combines elements of these three machine learning ideas:

  1. Ensemble Learning – Ensemble learning is a technique where many models are trained and their predictions are combined. In the simplest form, often called bagging, the models are trained independently of one another, each on a different subset of the data, so each individual model will be a little different. In the end, the models’ predictions are combined, for example by majority vote or by averaging.
  2. Boosting – Boosting is a type of ensemble learning where, instead of being trained independently, the models are trained sequentially, and each new model focuses on the errors left by the models before it. By training sequentially, the algorithm can correct each individual model’s errors.
  3. Gradient Boosting – Gradient boosting is a boosting method in which each new model is fit to the gradient of a chosen loss function, such as Mean Squared Error (MSE), so that the loss is reduced in each subsequent boosting iteration.
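
To make the three ideas concrete, here is a minimal sketch of gradient boosting for MSE using one-split trees (stumps) in plain Python. It is an illustration of the general technique only, not XGBoost's actual implementation, which adds regularization and many optimizations. The function names (fit_stump, gradient_boost, mse) and the default values for n_rounds and learning_rate are assumptions chosen for this example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree (a weak learner) to the residuals by least squares."""
    best = None
    # Candidate thresholds: every distinct feature value except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct feature value: fall back to a constant prediction.
        constant = sum(residuals) / len(residuals)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Train stumps sequentially, each one fit to the current residuals."""
    base = sum(ys) / len(ys)          # initial prediction: the mean target
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step toward correcting the remaining error.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for illustration): y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x * x for x in xs]
one_round = gradient_boost(xs, ys, n_rounds=1)
fifty_rounds = gradient_boost(xs, ys, n_rounds=50)
print(mse(one_round, xs, ys), mse(fifty_rounds, xs, ys))
```

Each stump on its own is a weak learner, but because every round is fit to what the previous rounds got wrong, the training error on this toy data shrinks as rounds are added.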

Why might I use XGBoost?

  • Model Performance vs. Alternatives – XGBoost wins a lot of online competitions, showing robust performance across different tasks, data sets, and applications.
  • Classification problems – While XGBoost can perform well on regression problems, it performs particularly well on classification problems and should be a consideration anytime you face one.
  • Less feature engineering required – Because it is tree-based, XGBoost can be successful without preprocessing steps such as scaling or normalizing data, and it can handle missing values on its own.
  • Less concern for outliers – With any tree-based model, each split divides a node in two, and each data point falls on one side of the split or the other. How far a point lies from the split value doesn’t have a dramatic effect, much like the effect of an outlier on a median value calculation.
  • Large datasets – Datasets that are large in number of data points, in number of features, or in both are handled well.
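
The point about outliers can be illustrated with a short sketch in plain Python. The feature values and the split threshold below are made up for the example, and split_sides is a hypothetical helper, not part of any library. It shows that pushing an outlier further out changes neither the branch each point falls into nor the median, while the mean moves a great deal.

```python
from statistics import mean, median


def split_sides(values, threshold):
    """Assign each value to the left (<= threshold) or right (> threshold) branch."""
    return ["left" if v <= threshold else "right" for v in values]


# Same feature twice: once with a moderate outlier, once with an extreme one.
feature = [1.0, 2.0, 3.0, 4.0, 100.0]
feature_extreme = [1.0, 2.0, 3.0, 4.0, 1_000_000.0]
threshold = 2.5  # hypothetical split value chosen for illustration

# A tree split only cares which side of the threshold a point is on.
same_branches = split_sides(feature, threshold) == split_sides(feature_extreme, threshold)
same_median = median(feature) == median(feature_extreme)
same_mean = mean(feature) == mean(feature_extreme)

print(same_branches, same_median, same_mean)  # True True False
```

This only concerns outliers in the input features. Extreme values in the target can still influence training, depending on the loss function used.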

Are there drawbacks to XGBoost?

  • Interpretation – The technical details are dense, and what is happening under the hood is very difficult to explain to parties without subject matter expertise.
  • Overfitting – While always a concern in machine learning, overfitting is a particular danger with tree-based algorithms, especially if the tree depth is large.
  • Tuning – There are many hyperparameters that can be tuned with XGBoost. Developing a feel for tuning them requires some practice and understanding. For a deep dive on the parameters, the XGBoost documentation is the way to go.
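
As a rough sketch of what tuning can look like in code, the example below enumerates a small grid of four commonly tuned parameters and keeps the combination with the best validation accuracy. It assumes the xgboost Python package (with scikit-learn) is installed and uses its XGBClassifier wrapper. The grid values, the names PARAM_GRID, expand_grid, and tune, and the choice of accuracy as the metric are all assumptions for illustration, not recommendations.

```python
from itertools import product

# Illustrative starting grid; the specific values are assumptions.
PARAM_GRID = {
    "max_depth": [3, 5],           # shallower trees are less prone to overfitting
    "learning_rate": [0.05, 0.1],  # shrinkage applied to each tree's contribution
    "n_estimators": [100, 300],    # number of boosting rounds
    "subsample": [0.8, 1.0],       # fraction of rows sampled for each tree
}


def expand_grid(grid):
    """Return every combination in the grid as a list of keyword dicts."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def tune(X_train, y_train, X_valid, y_valid, grid=PARAM_GRID):
    """Fit one XGBClassifier per combination; return the best params and score."""
    # Imported here so the grid helpers above work without xgboost installed.
    from xgboost import XGBClassifier

    best_params, best_score = None, -1.0
    for params in expand_grid(grid):
        model = XGBClassifier(**params)
        model.fit(X_train, y_train)
        score = model.score(X_valid, y_valid)  # mean accuracy on held-out data
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Even this small grid trains 16 models, which is one reason tuning takes practice: the search space grows quickly as parameters are added. Scoring on held-out validation data, rather than on the training data, also helps catch the overfitting that deep trees can cause.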

Infor Coleman ML offers XGBoost as one of its base algorithms, allowing for a no-code implementation of your machine learning model. One Infor customer in the distribution industry used XGBoost in the first ML model they implemented. It delivered results within weeks, without requiring the expertise to develop custom models, and with an accuracy that was impressive for a first implementation. This solution provided 20% reduced downtime and cost savings, while also proving its viability as a business tool. Below is an example of an activity using XGBoost in Coleman ML.

We will continue this blog series in the coming weeks to further understand the world of machine learning.

Infor Coleman ML is part of the Infor technology platform. If you would like to learn more about how Coleman can benefit your business and about the industry-specific machine learning models Infor can deploy, don’t hesitate to contact us.