SMOTEBoost

This code implements SMOTEBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines SMOTE with the standard boosting procedure AdaBoost to better model the minority class: the learner is given not only the minority class examples that were misclassified in the previous boosting iteration but also a broader representation of those instances, generated by SMOTE. Since boosting algorithms give equal weight to all misclassified examples and sample from a pool of data that consists predominantly of the majority class, subsequent sampling of the training set is still skewed towards the majority class. Thus, to reduce the bias inherent in the learning procedure due to class imbalance and to increase the sampling weights of the minority class, SMOTE is introduced at each round of boosting. Introducing SMOTE increases the number of minority class samples available to the learner and focuses the distribution on these cases at each boosting round. In addition to maximizing the margin for the skewed class dataset, this procedure also increases the diversity among the classifiers in the ensemble, because a different set of synthetic samples is produced at each iteration.

For a more detailed theoretical description of the algorithm, please refer to the following paper: N.V. Chawla, A. Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving Prediction of the Minority Class in Boosting", Journal of Knowledge Discovery in Databases: PKDD, 2003.

The current implementation of SMOTEBoost was written independently by the author for research purposes. To let users choose among different weak learners for boosting, an interface to the Weka API is provided. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk, and Logistic.
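
The procedure described above can be illustrated with a minimal, dependency-free Python sketch. It is not the Weka-based implementation described here, and it simplifies the published algorithm: the weak learner is a decision stump rather than J48, SMO, IBk, or Logistic, the weight update is a binary AdaBoost-style rule rather than the AdaBoost.M2 pseudo-loss used in the paper, and each synthetic sample is assumed to receive the mean weight of the original examples. All function names and parameters (`smote`, `fit_stump`, `smoteboost`, `n_synthetic`, and so on) are hypothetical and chosen for illustration only. Labels are assumed to be +1 for the minority class and -1 for the majority class.

```python
import math
import random


def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def fit_stump(X, y, w):
    """Weighted decision stump: exhaustive search over feature, threshold
    and polarity for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (sign if x[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:]


def stump_predict(stump, x):
    j, t, sign = stump
    return sign if x[j] > t else -sign


def smoteboost(X, y, rounds=10, n_synthetic=20, k=5, seed=0):
    """Simplified SMOTEBoost loop (assumption: binary labels in {+1, -1},
    with +1 marking the minority class)."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    minority = [x for x, yi in zip(X, y) if yi == 1]
    ensemble = []
    for _ in range(rounds):
        # A fresh set of synthetic minority samples in every round.
        synth = smote(minority, n_synthetic, k, rng)
        X_aug = X + synth
        y_aug = y + [1] * len(synth)
        # Assumption: synthetic points get the mean original weight (1/n).
        w_aug = w + [1.0 / n] * len(synth)
        total = sum(w_aug)
        w_aug = [wi / total for wi in w_aug]
        stump = fit_stump(X_aug, y_aug, w_aug)
        # Error and weight update use the original examples only;
        # the synthetic samples are discarded after training.
        preds = [stump_predict(stump, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

Calling `smoteboost(X, y)` on a list of feature vectors `X` and a list of labels `y` returns a list of `(alpha, stump)` pairs, which `predict` combines by weighted vote. Because `smote` is invoked inside the loop, every round trains on a different set of synthetic minority samples, which is the source of the ensemble diversity mentioned in the description.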


References in zbMATH (referenced in 18 articles)


  1. Do, Thanh-Nghi; Poulet, François: Parallel multiclass logistic regression for classifying large scale image datasets (2015)
  2. Fernández-Baldera, Antonio; Baumela, Luis: Multi-class boosting with asymmetric binary weak-learners (2014)
  3. Li, Qiujie; Mao, Yaobin: A review of boosting methods for imbalanced data classification (2014)
  4. Di Martino, Matías; Hernández, Guzmán; Fiori, Marcelo; Fernández, Alicia: A new framework for optimal classifier design (2013)
  5. Galar, Mikel; Fernández, Alberto; Barrenechea, Edurne; Herrera, Francisco: EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling (2013)
  6. López, Victoria; Fernández, Alberto; García, Salvador; Palade, Vasile; Herrera, Francisco: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics (2013)
  7. Yang, Xingwei; Bai, Xiang; Köknar-Tezel, Suzan; Latecki, Longin Jan: Densifying distance spaces for shape and image retrieval (2013)
  8. Zhang, Yong; Wang, Dapeng: A cost-sensitive ensemble method for class-imbalanced datasets (2013)
  9. He, Jingrui; Tong, Hanghang; Carbonell, Jaime: An effective framework for characterizing rare categories (2012)
  10. Majeske, Karl D.; Lauer, Thomas W.: Optimizing airline passenger prescreening systems with Bayesian decision models (2012)
  11. Yuan, Bo; Liu, Wenhuang: Measure oriented training: a targeted approach to imbalanced classification problems (2012)
  12. Song, Jie; Lu, Xiaoling; Liu, Miao; Wu, Xizhi: Stratified normalization logitboost for two-class unbalanced data classification (2011)
  13. Wang, Benjamin X.; Japkowicz, Nathalie: Boosting support vector machines for imbalanced data sets (2010)
  14. Zhou, Junlin; Lazarevic, Aleksandar; Hsu, Kuo-Wei; Srivastava, Jaideep; Fu, Yan; Wu, Yue: Unsupervised learning based distributed detection of global anomalies (2010)
  15. Mease, David; Wyner, Abraham J.; Buja, Andreas: Boosted classification trees and class probability/quantile estimation (2007)
  16. Sun, Yanmin; Kamel, Mohamed S.; Wong, Andrew K.C.; Wang, Yang: Cost-sensitive boosting for classification of imbalanced data (2007)
  17. Yen, Show-Jane; Lee, Yue-Shi: Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset (2006)
  18. Viktor, Herna L.; Guo, Hongyu: Multiple classifier prediction improvements against imbalanced datasets through added synthetic examples (2004)