SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
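
The abstract says only that the method "involves creating synthetic minority class examples". As a rough illustration, the sketch below follows the commonly described form of the technique: each synthetic example is placed at a random point on the line segment between a minority example and one of its k nearest minority-class neighbours. It is a minimal standard-library sketch, not the authors' implementation. The function names `smote_sketch` and `undersample`, the parameters `per_sample`, `k` and `seed`, and the use of Euclidean distance on purely numeric features are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, per_sample=1, k=5, seed=0):
    """Create per_sample synthetic points for every minority example.

    Each synthetic point is a random interpolation between a minority
    example and one of its k nearest minority-class neighbours.
    (Hypothetical helper; numeric features and Euclidean distance assumed.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for i, base in enumerate(minority):
        # Indices of the other minority examples, nearest first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        nearest = others[:k]
        for _ in range(per_sample):
            neighbour = minority[rng.choice(nearest)]
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(
                [b + gap * (n - b) for b, n in zip(base, neighbour)]
            )
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority examples (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    majority = [[float(x), float(x % 7)] for x in range(100)]
    new_points = smote_sketch(minority, per_sample=2, k=3)
    kept = undersample(majority, n_keep=20)
    print(len(minority) + len(new_points), "minority vs", len(kept), "majority")
```

Combining `smote_sketch` on the minority class with `undersample` on the majority class mirrors the over-sampling plus under-sampling combination the abstract evaluates. Because every synthetic point is a convex combination of two real minority examples, it always lies within the bounding box of the minority class.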


References in zbMATH (referenced in 141 articles, 1 standard article)

Showing results 81 to 100 of 141.
  1. López, Victoria; Fernández, Alberto; Herrera, Francisco: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed (2014)
  2. Maratea, Antonio; Petrosino, Alfredo; Manzo, Mario: Adjusted F-measure and kernel scaling for imbalanced data learning (2014)
  3. Menardi, Giovanna; Torelli, Nicola: Training and assessing classification rules with imbalanced data (2014)
  4. Seiffert, Chris; Khoshgoftaar, Taghi M.; Van Hulse, Jason; Folleco, Andres: An empirical study of the classification performance of learners on imbalanced and noisy software quality data (2014)
  5. Tahir, Muhammad; Khan, Asifullah; Kaya, Hüseyin: Protein subcellular localization in human and hamster cell lines: employing local ternary patterns of fluorescence microscopy images (2014)
  6. Zięba, Maciej; Świątek, Jerzy; Lubicz, Marek: Cost sensitive SVM with non-informative examples elimination for imbalanced postoperative risk management problem (2014)
  7. Alvarez-Alvarez, Alberto; Alonso, Jose M.; Trivino, Gracian: Human activity recognition in indoor environments by means of fusing information extracted from intensity of WiFi signal and accelerations (2013)
  8. Cruz-Ramírez, M.; Hervás-Martínez, C.; Gutiérrez, P. A.; Pérez-Ortiz, M.; Briceño, J.; de la Mata, M.: Memetic Pareto differential evolutionary neural network used to solve an unbalanced liver transplantation problem (2013)
  9. López, Victoria; Fernández, Alberto; García, Salvador; Palade, Vasile; Herrera, Francisco: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics (2013)
  10. Pang, Shaoning; Zhu, Lei; Chen, Gang; Sarrafzadeh, Abdolhossein; Ban, Tao; Inoue, Daisuke: Dynamic class imbalance learning for incremental LPSVM (2013)
  11. Qasem, Sultan Noman; Shamsuddin, Siti Mariyam; Hashim, Siti Zaiton Mohd; Darus, Maslina; Al-Shammari, Eiman: Memetic multiobjective particle swarm optimization-based radial basis function network for classification problems (2013)
  12. Stefanowski, Jerzy: Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data (2013)
  13. Wang, Ran; Kwong, Sam; Chen, Degang; Cao, Jingjing: A vector-valued support vector machine model for multiclass problem (2013)
  14. Wu, Jun; Shen, Hong; Li, Yi-Dong; Xiao, Zhi-Bo; Lu, Ming-Yu; Wang, Chun-Li: Learning a hybrid similarity measure for image retrieval (2013)
  15. Yang, Xingwei; Bai, Xiang; Köknar-Tezel, Suzan; Latecki, Longin Jan: Densifying distance spaces for shape and image retrieval (2013)
  16. Yin, Qing-Yan; Zhang, Jiang-She; Zhang, Chun-Xia; Liu, Sheng-Cai: An empirical study on the performance of cost-sensitive boosting algorithms with different levels of class imbalance (2013)
  17. Zhang, Yong; Wang, Dapeng: A cost-sensitive ensemble method for class-imbalanced datasets (2013)
  18. Cieslak, David A.; Hoens, T. Ryan; Chawla, Nitesh V.; Kegelmeyer, W. Philip: Hellinger distance decision trees are robust and skew-insensitive (2012)
  19. Majeske, Karl D.; Lauer, Thomas W.: Optimizing airline passenger prescreening systems with Bayesian decision models (2012)
  20. Wang, Dianhui; Do, Hai Thanh: Computational localization of transcription factor binding sites using extreme learning machines (2012)