XGBoost Training Speed: A Comparative Analysis
masudahiroto
Posted on February 17, 2024
Introduction
In this experiment, we assess the training speed of XGBoost, a popular gradient boosting library, under various conditions. The experiment was run on Google Colab, and several settings were compared to understand their impact on training time.
Experimental Setup
Machine Environment
The experiments were conducted on Google Colab.
XGBoost Versions and Configurations
The selection of XGBoost versions and configurations for this experiment was based on prevalent discussions and articles found online. Drawing on that widely available information, the following four configurations were chosen (a setup sketch follows the list):
- A. XGBoost 1.7.6 with default settings
- B. XGBoost 1.7.6 with `tree_method=hist`
- C. XGBoost 2.0.3 with default settings
- D. XGBoost 2.0.3 with GPU acceleration
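As a rough guide to how these configurations map onto code, here is a minimal sketch using XGBoost's scikit-learn API. The estimator type, `n_estimators`, and categorical handling are assumptions on my part; the exact script is in the Gist linked below. Note that A/B would run under 1.7.6 and C/D under 2.0.3, so this is not a single runnable script across versions.

```python
import xgboost as xgb

# A. XGBoost 1.7.6, default settings (no tree_method specified).
#    30 rounds only, because of its much longer per-round time.
model_a = xgb.XGBRegressor(n_estimators=30)

# B. XGBoost 1.7.6 with the histogram-based tree method, which also
#    enables native handling of the categorical columns.
model_b = xgb.XGBRegressor(n_estimators=100, tree_method="hist",
                           enable_categorical=True)

# C. XGBoost 2.0.3, default settings ("hist" became the default in 2.0).
model_c = xgb.XGBRegressor(n_estimators=100, enable_categorical=True)

# D. XGBoost 2.0.3 on GPU. In 2.0, the `device` parameter replaces the
#    older tree_method="gpu_hist" spelling.
model_d = xgb.XGBRegressor(n_estimators=100, device="cuda",
                           enable_categorical=True)
```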
Training Task Assumptions
The training task involved (a data-generation sketch follows the list):
- Dataset: 1 million records, 200 features (including 10 categorical features)
- Training: 100 boosting rounds (30 rounds for configuration A due to its much longer training time)
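The data-generation code isn't shown in the post, but a dataset of the stated shape could be synthesized along these lines. Column names, category cardinalities, and the binary target are illustrative assumptions; the actual code is in the linked Gist.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_rows = 1_000_000

# 190 numeric features plus 10 categorical features = 200 total.
X = pd.DataFrame(
    rng.standard_normal((n_rows, 190)),
    columns=[f"num_{i}" for i in range(190)],
)
for i in range(10):
    X[f"cat_{i}"] = pd.Categorical(rng.integers(0, 10, size=n_rows))

# Binary target; the original task type (classification vs. regression)
# isn't stated in the post.
y = rng.integers(0, 2, size=n_rows)
```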
Results
- A. XGBoost 1.7.6 (Default): 43.80682 seconds per iteration
- B. XGBoost 1.7.6 (`tree_method=hist`): 0.95269 seconds per iteration
- C. XGBoost 2.0.3 (Default): 0.95920 seconds per iteration
- D. XGBoost 2.0.3 (GPU): 0.08077 seconds per iteration
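The post does not show how these per-iteration figures were measured. One plausible approach, sketched below under that assumption, is to wall-clock the full fit and divide by the number of boosting rounds:

```python
import time

# `model_b` and (X, y) come from the sketches above.
start = time.perf_counter()
model_b.fit(X, y)
elapsed = time.perf_counter() - start
print(f"{elapsed / model_b.n_estimators:.5f} seconds per iteration")
```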
Discussion
- Configuration B, using the histogram-based tree method, was dramatically faster than the default settings in version 1.7.6: roughly 46x per iteration.
- XGBoost 1.7.6 with `tree_method=hist` and XGBoost 2.0.3 with default settings were almost equally fast, consistent with `hist` becoming the default tree method in 2.0.
- GPU acceleration (configuration D) delivered a further speedup of roughly 12x over the CPU histogram method.
Code
The complete code for this experiment is available in my GitHub Gist.
Conclusion
Features such as the histogram-based tree method and GPU acceleration can substantially reduce XGBoost training time, making model development more efficient. Practitioners should weigh these options against their specific requirements and available computational resources.