SMOTE
SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples, with only a small percentage of "abnormal" or "interesting" examples. The cost of misclassifying an abnormal (interesting) example as a normal example is also often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that combining our method of over-sampling the minority (abnormal) class with under-sampling of the majority (normal) class can achieve better classifier performance (in ROC space) than under-sampling the majority class alone. It also shows that the same combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
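The abstract does not spell out how the synthetic examples are built. The Python sketch below follows the commonly described form of the technique: each synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority neighbours, and the majority class is thinned by random under-sampling. The function names `smote_sketch` and `undersample_majority`, the default `k=5`, and the use of Euclidean distance are assumptions made for illustration, not details taken from this page.

```python
import numpy as np


def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a
    minority example and one of its k nearest minority neighbours.
    (Hypothetical helper; Euclidean distance is an assumption.)"""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority examples only.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of each example.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base example and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Random position in [0, 1) along the segment base -> neighbour.
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def undersample_majority(X_maj, n_keep, seed=0):
    """Keep a random subset of the majority class, without replacement.
    (Hypothetical helper for the under-sampling half of the method.)"""
    rng = np.random.default_rng(seed)
    X_maj = np.asarray(X_maj, dtype=float)
    idx = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X_minority = rng.normal(loc=2.0, size=(10, 2))   # toy "abnormal" class
    X_majority = rng.normal(loc=0.0, size=(200, 2))  # toy "normal" class

    X_new = smote_sketch(X_minority, n_synthetic=30)
    X_kept = undersample_majority(X_majority, n_keep=40)
    print(X_new.shape, X_kept.shape)
```

Because every synthetic point is a convex combination of two existing minority examples, it always stays inside the bounding box of the minority class. The amount of over-sampling and under-sampling (here 30 and 40) is a tuning choice; the abstract reports comparing such combinations in ROC space.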
References in zbMATH (referenced in 152 articles, 1 standard article)
Showing results 41 to 60 of 152, sorted by year.
- Razzaghi, Talayeh; Safro, Ilya; Ewing, Joseph; Sadrfaridpour, Ehsan; Scott, John D.: Predictive models for bariatric surgery risks with imbalanced medical datasets (2019)
- Xie, Wenhao; Liang, Gongqian; Dong, Zhonghui; Tan, Baoyu; Zhang, Baosheng: An improved oversampling algorithm based on the samples’ selection strategy for classifying imbalanced data (2019)
- Yan, Yuan Ting; Wu, Zeng Bao; Du, Xiu Quan; Chen, Jie; Zhao, Shu; Zhang, Yan Ping: A three-way decision ensemble method for imbalanced data oversampling (2019)
- Zarei, Shaho; Mohammadpour, Adel: Using synthetic data and dimensionality reduction in high-dimensional classification via logistic regression (2019)
- Zhang, Xueying; Li, Ruixian; Zhang, Bo; Yang, Yunxiang; Guo, Jing; Ji, Xiang: An instance-based learning recommendation algorithm of imbalance handling methods (2019)
- Zhu, Zonghai; Wang, Zhe; Li, Dongdong; Du, Wenli: Tree-based space partition and merging ensemble learning framework for imbalanced problems (2019)
- Bellinger, Colin; Drummond, Christopher; Japkowicz, Nathalie: Manifold-based synthetic oversampling with manifold conformance estimation (2018)
- Bogaert, Matthias; Ballings, Michel; Van den Poel, Dirk: Evaluating the importance of different communication types in romantic tie prediction on social media (2018)
- Brintrup, A.; Wichmann, P.; Woodall, P.; McFarlane, D.; Nicks, E.; Krechel, W.: Predicting hidden links in supply networks (2018)
- Chandrasekara, N. V.; Tilakaratne, C. D.; Mammadov, M. A.: An ensemble technique for multi class imbalanced problem using probabilistic neural networks (2018)
- Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen: pLoc_bal-mGneg: predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC (2018)
- Fast, Shannon M.; Kim, Louis; Cohn, Emily L.; Mekaru, Sumiko R.; Brownstein, John S.; Markuzon, Natasha: Predicting social response to infectious disease outbreaks from Internet-based news streams (2018)
- Maurya, Chandresh Kumar; Toshniwal, Durga: Large-scale distributed sparse class-imbalance learning (2018)
- Roumani, Yazan F.; Roumani, Yaman; Nwankpa, Joseph K.; Tanniru, Mohan: Classifying readmissions to a cardiac intensive care unit (2018)
- Tayal, Aditya; Coleman, Thomas F.; Li, Yuying: Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking (2018)
- Vanhoeyveld, Jellis; Martens, David: Imbalanced classification in sparse and large behaviour datasets (2018)
- Ahmed, Mehreen; Afzal, Hammad; Majeed, Awais; Khan, Behram: A survey of evolution in predictive models and impacting factors in customer churn (2017)
- Athanasiou, Vasileios; Maragoudakis, Manolis: A novel, gradient boosting framework for sentiment analysis in languages where NLP resources are not plentiful: a case study for modern Greek (2017)
- Blagus, Rok; Lusa, Lara: Gradient boosting for high-dimensional prediction of rare events (2017)
- Du, Jie; Vong, Chi-Man; Pun, Chi-Man; Wong, Pak-Kin; Ip, Weng-Fai: Post-boosting of classification boundary for imbalanced data using geometric mean (2017)