SMOTEBoost

This code implements SMOTEBoost, an algorithm for handling the class imbalance problem in data with discrete class labels. It combines SMOTE with the standard boosting procedure AdaBoost to better model the minority class: the learner is given not only the minority class examples that were misclassified in the previous boosting iteration but also a broader representation of those instances, which SMOTE provides. Because boosting algorithms give equal weight to all misclassified examples and sample from a pool of data that consists predominantly of the majority class, subsequent sampling of the training set is still skewed towards the majority class. To reduce the bias that class imbalance introduces into the learning procedure and to increase the sampling weights of the minority class, SMOTE is therefore applied at each round of boosting. This increases the number of minority class samples available to the learner and focuses the distribution on these cases at each boosting round. In addition to maximizing the margin for the skewed class dataset, the procedure also increases the diversity among the classifiers in the ensemble, because a different set of synthetic samples is produced at each iteration.

For more detail on the theoretical description of the algorithm, please refer to the following paper: N.V. Chawla, A. Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving Prediction of Minority Class in Boosting", Journal of Knowledge Discovery in Databases: PKDD, 2003.

The current implementation of SMOTEBoost was written independently by the author for research purposes. To let users choose among many different weak learners for boosting, an interface to the Weka API is provided. Currently, four Weka algorithms can be used as the weak learner: J48, SMO, IBk, and Logistic.
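
The idea described above, applying SMOTE afresh at each boosting round, can be illustrated with a minimal Python sketch. This is not the author's implementation, which interfaces with Weka. It assumes a simplified binary AdaBoost with labels +1 (minority) and -1 (majority) rather than the boosting variant in the paper, uses a decision stump as a stand-in weak learner, and gives each synthetic point the mean original weight 1/n. All function names (`smote`, `fit_stump`, `stump_predict`, `smoteboost`, `predict`) are hypothetical.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Return n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([a + gap * (b - a) for a, b in zip(base, target)])
    return synthetic


def stump_predict(stump, x):
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def fit_stump(X, y, w):
    """Weighted decision stump (illustrative weak learner, not a Weka one)."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best[1]


def smoteboost(X, y, rounds=10, n_synth=None, k=5, seed=0):
    """Simplified SMOTEBoost: labels are +1 (minority) and -1 (majority)."""
    rng = random.Random(seed)
    n = len(X)
    minority = [x for x, yi in zip(X, y) if yi == 1]
    n_synth = len(minority) if n_synth is None else n_synth
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # New synthetic minority points every round; discarded afterwards.
        synth = smote(minority, n_synth, k, rng)
        Xt = X + synth
        yt = y + [1] * len(synth)
        wt = w + [1.0 / n] * len(synth)  # assumption: mean original weight
        total = sum(wt)
        wt = [wi / total for wi in wt]
        stump = fit_stump(Xt, yt, wt)
        # Error and weight update use the original examples only.
        err = sum(wi for xi, yi, wi in zip(X, y, w)
                  if stump_predict(stump, xi) != yi)
        if err >= 0.5:
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy data: 30 majority points near the origin, 4 minority points near (3, 3).
X = [[i * 0.1, j * 0.1] for i in range(6) for j in range(5)]
X += [[3.0, 3.0], [3.2, 3.1], [3.1, 3.3], [2.9, 3.2]]
y = [-1] * 30 + [1] * 4
model = smoteboost(X, y)
print([predict(model, x) for x in X[-4:]])
```

Because the synthetic points are regenerated inside the loop, each weak learner sees a different augmented training set, which is the source of the ensemble diversity mentioned in the description.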


References in zbMATH (referenced in 33 articles)

Showing results 1 to 20 of 33.
Sorted by year.

  1. Chaabane, Ikram; Guermazi, Radhouane; Hammami, Mohamed: Enhancing techniques for learning decision trees from imbalanced data (2020)
  2. Tao, Xinmin; Li, Qing; Guo, Wenjie; Ren, Chao; He, Qing; Liu, Rui; Zou, JunRong: Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering (2020)
  3. Zhu, Bing; Gao, Zihan; Zhao, Junkai; van den Broucke, Seppe K.L.M.: IRIC: An R library for binary imbalanced classification (2019)
  4. Wang, Zhe; Cao, Chenjie: Cascade interpolation learning with double subspaces and confidence disturbance for imbalanced problems (2019)
  5. Xie, Wenhao; Liang, Gongqian; Dong, Zhonghui; Tan, Baoyu; Zhang, Baosheng: An improved oversampling algorithm based on the samples’ selection strategy for classifying imbalanced data (2019)
  6. Zhang, Xueying; Li, Ruixian; Zhang, Bo; Yang, Yunxiang; Guo, Jing; Ji, Xiang: An instance-based learning recommendation algorithm of imbalance handling methods (2019)
  7. Zhu, Zonghai; Wang, Zhe; Li, Dongdong; Du, Wenli: Tree-based space partition and merging ensemble learning framework for imbalanced problems (2019)
  8. Vanhoeyveld, Jellis; Martens, David: Imbalanced classification in sparse and large behaviour datasets (2018)
  9. Gong, Joonho; Kim, Hyunjoong: RHSBoost: improving classification performance in imbalance data (2017)
  10. Koziarski, Michał; Woźniak, Michał: CCR: a combined cleaning and resampling algorithm for imbalanced data classification (2017)
  11. Li, Qian; Li, Gang; Niu, Wenjia; Cao, Yanan; Chang, Liang; Tan, Jianlong; Guo, Li: Boosting imbalanced data learning with Wiener process oversampling (2017)
  12. Do, Thanh-Nghi; Poulet, François: Parallel multiclass logistic regression for classifying large scale image datasets (2015)
  13. Fernández-Baldera, Antonio; Baumela, Luis: Multi-class boosting with asymmetric binary weak-learners (2014)
  14. Li, Qiujie; Mao, Yaobin: A review of boosting methods for imbalanced data classification (2014)
  15. Wang, Qiang: A hybrid sampling SVM approach to imbalanced data classification (2014)
  16. Zięba, Maciej; Świątek, Jerzy; Lubicz, Marek: Cost sensitive SVM with non-informative examples elimination for imbalanced postoperative risk management problem (2014)
  17. Di Martino, Matías; Hernández, Guzmán; Fiori, Marcelo; Fernández, Alicia: A new framework for optimal classifier design (2013)
  18. Galar, Mikel; Fernández, Alberto; Barrenechea, Edurne; Herrera, Francisco: EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling (2013)
  19. López, Victoria; Fernández, Alberto; García, Salvador; Palade, Vasile; Herrera, Francisco: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics (2013)
  20. Stefanowski, Jerzy: Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data (2013)
