SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. This paper describes an approach to constructing classifiers from imbalanced datasets. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples, with only a small percentage of "abnormal" or "interesting" examples. The cost of misclassifying an abnormal (interesting) example as a normal example is also often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that combining our method of over-sampling the minority (abnormal) class with under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than under-sampling the majority class alone. It also shows that the same combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
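
The abstract does not spell out how the synthetic minority examples are created. The sketch below assumes the commonly described procedure: pick a minority example, pick one of its k nearest minority-class neighbours, and place a new point at a random position on the line segment between them. It also pairs the over-sampling with random under-sampling of the majority class, as the abstract suggests. The function names (`smote_sketch`, `undersample`), the toy data, the choice of k and the random choice of base points are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Simplification (assumption): base points are drawn at random instead of
    looping over every minority example a fixed number of times.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority examples, nearest to `base` first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Random point on the segment between `base` and `neighbour`.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority examples (without replacement)."""
    return random.Random(seed).sample(majority, n_keep)


# Hypothetical toy data: 4 minority points, 25 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(x), float(y)) for x in range(5, 10) for y in range(5, 10)]

new_points = smote_sketch(minority, n_synthetic=8, k=3)
kept_majority = undersample(majority, n_keep=12)
balanced_minority = minority + new_points
print(len(balanced_minority), len(kept_majority))  # prints: 12 12
```

Because every synthetic point is a convex combination of two existing minority examples, it always lies between them rather than duplicating either one, which is what distinguishes this style of over-sampling from over-sampling with replacement.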


References in zbMATH (referenced in 152 articles, 1 standard article)

Showing results 41 to 60 of 152, sorted by year.
  1. Razzaghi, Talayeh; Safro, Ilya; Ewing, Joseph; Sadrfaridpour, Ehsan; Scott, John D.: Predictive models for bariatric surgery risks with imbalanced medical datasets (2019)
  2. Xie, Wenhao; Liang, Gongqian; Dong, Zhonghui; Tan, Baoyu; Zhang, Baosheng: An improved oversampling algorithm based on the samples’ selection strategy for classifying imbalanced data (2019)
  3. Yan, Yuan Ting; Wu, Zeng Bao; Du, Xiu Quan; Chen, Jie; Zhao, Shu; Zhang, Yan Ping: A three-way decision ensemble method for imbalanced data oversampling (2019)
  4. Zarei, Shaho; Mohammadpour, Adel: Using synthetic data and dimensionality reduction in high-dimensional classification via logistic regression (2019)
  5. Zhang, Xueying; Li, Ruixian; Zhang, Bo; Yang, Yunxiang; Guo, Jing; Ji, Xiang: An instance-based learning recommendation algorithm of imbalance handling methods (2019)
  6. Zhu, Zonghai; Wang, Zhe; Li, Dongdong; Du, Wenli: Tree-based space partition and merging ensemble learning framework for imbalanced problems (2019)
  7. Bellinger, Colin; Drummond, Christopher; Japkowicz, Nathalie: Manifold-based synthetic oversampling with manifold conformance estimation (2018)
  8. Bogaert, Matthias; Ballings, Michel; Van den Poel, Dirk: Evaluating the importance of different communication types in romantic tie prediction on social media (2018)
  9. Brintrup, A.; Wichmann, P.; Woodall, P.; McFarlane, D.; Nicks, E.; Krechel, W.: Predicting hidden links in supply networks (2018)
  10. Chandrasekara, N. V.; Tilakaratne, C. D.; Mammadov, M. A.: An ensemble technique for multi class imbalanced problem using probabilistic neural networks (2018)
  11. Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen: pLoc_bal-mGneg: predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC (2018)
  12. Fast, Shannon M.; Kim, Louis; Cohn, Emily L.; Mekaru, Sumiko R.; Brownstein, John S.; Markuzon, Natasha: Predicting social response to infectious disease outbreaks from Internet-based news streams (2018)
  13. Maurya, Chandresh Kumar; Toshniwal, Durga: Large-scale distributed sparse class-imbalance learning (2018)
  14. Roumani, Yazan F.; Roumani, Yaman; Nwankpa, Joseph K.; Tanniru, Mohan: Classifying readmissions to a cardiac intensive care unit (2018)
  15. Tayal, Aditya; Coleman, Thomas F.; Li, Yuying: Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking (2018)
  16. Vanhoeyveld, Jellis; Martens, David: Imbalanced classification in sparse and large behaviour datasets (2018)
  17. Ahmed, Mehreen; Afzal, Hammad; Majeed, Awais; Khan, Behram: A survey of evolution in predictive models and impacting factors in customer churn (2017)
  18. Athanasiou, Vasileios; Maragoudakis, Manolis: A novel, gradient boosting framework for sentiment analysis in languages where NLP resources are not plentiful: a case study for modern Greek (2017)
  19. Blagus, Rok; Lusa, Lara: Gradient boosting for high-dimensional prediction of rare events (2017)
  20. Du, Jie; Vong, Chi-Man; Pun, Chi-Man; Wong, Pak-Kin; Ip, Weng-Fai: Post-boosting of classification boundary for imbalanced data using geometric mean (2017)