SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
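
The abstract above says only that synthetic minority class examples are created and combined with under-sampling of the majority class; it does not spell out how. As a minimal sketch, assuming the commonly described scheme in which each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours, the two steps might look as follows. The names `smote_sketch` and `undersample` and the parameters `k`, `n_synthetic` and `n_keep` are hypothetical choices for this illustration, not taken from the paper or from any library.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards neighbours.

    Assumption: interpolation between a sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of all other minority samples, nearest first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Toy data, invented for illustration only.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority = [[float(i), 5.0] for i in range(20)]

new_points = smote_sketch(minority, n_synthetic=6, k=2)
kept = undersample(majority, n_keep=10)
balanced_minority = minority + new_points  # 10 minority vs. 10 majority samples
```

Because every synthetic point lies on a segment between two existing minority samples, it stays inside the bounding box of the minority class. The amount of over-sampling and under-sampling (here 6 new points and 10 retained majority samples) is a free choice in this sketch.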


References in zbMATH (referenced in 152 articles, 1 standard article)

Showing results 61 to 80 of 152.
  1. Feng, Shou; Fu, Ping; Zheng, Wenbin: A hierarchical multi-label classification algorithm for gene function prediction (2017)
  2. Gong, Joonho; Kim, Hyunjoong: RHSBoost: improving classification performance in imbalance data (2017)
  3. Hara, Kota; Chellappa, Rama: Growing regression tree forests by classification for continuous object pose estimation (2017)
  4. Koziarski, Michał; Woźniak, Michał: CCR: a combined cleaning and resampling algorithm for imbalanced data classification (2017)
  5. Krautenbacher, Norbert; Theis, Fabian J.; Fuchs, Christiane: Correcting classifiers for sample selection bias in two-phase case-control studies (2017)
  6. Li, Qian; Li, Gang; Niu, Wenjia; Cao, Yanan; Chang, Liang; Tan, Jianlong; Guo, Li: Boosting imbalanced data learning with Wiener process oversampling (2017)
  7. Maldonado, Sebastián; Pérez, Juan; Bravo, Cristián: Cost-based feature selection for support vector machines: an application in credit scoring (2017)
  8. Núñez, Haydemar; Gonzalez-Abril, Luis; Angulo, Cecilio: Improving SVM classification on imbalanced datasets by introducing a new bias (2017)
  9. Roy, Asis; Bhattacharya, Sourangshu; Guin, Kalyan: Prediction of esophageal cancer using demographic, lifestyle, patient history, and basic clinical tests (2017)
  10. Wojciechowski, Szymon; Wilk, Szymon: Difficulty factors and preprocessing in imbalanced data sets: an experimental study on artificial data (2017)
  11. Chen, Yan-Cheng; Su, Chao-Ton: Distance-based margin support vector machine for classification (2016)
  12. Chmielnicki, Wiesław; Stąpor, Katarzyna: Using the one-versus-rest strategy with samples balancing to improve pairwise coupling classification (2016)
  13. Dong, Aimei; Chung, Fu-lai; Wang, Shitong: Semi-supervised classification method through oversampling and common hidden space (2016)
  14. Gámez, Juan Carlos; García, David; González, Antonio; Pérez, Raúl: Ordinal classification based on the sequential covering strategy (2016)
  15. Gong, Chunlin; Gu, Liangxian: A novel SMOTE-based classification approach to online data imbalance problem (2016)
  16. Cheng, Fan; Yang, Kang; Zhang, Lei: A structural SVM based approach for binary classification under class imbalance (2015)
  17. Datta, Shounak; Das, Swagatam: Near-Bayesian support vector machines for imbalanced data classification with equal or unequal misclassification costs (2015)
  18. Fernandez-Lozano, Carlos; Cuiñas, Rubén F.; Seoane, José A.; Fernández-Blanco, Enrique; Dorado, Julian; Munteanu, Cristian R.: Classification of signaling proteins based on molecular star graph descriptors using machine learning models (2015)
  19. Krempl, Georg; Kottke, Daniel; Lemaire, Vincent: Optimised probabilistic active learning (OPAL) (2015)
  20. Lee, J.; Wu, Y.; Kim, H.: Unbalanced data classification using support vector machines with active learning on scleroderma lung disease patterns (2015)