SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples, with only a small percentage of "abnormal" or "interesting" examples. The cost of misclassifying an abnormal (interesting) example as a normal example is also often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. It also shows that the same combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
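The abstract says only that synthetic minority class examples are created; it does not spell out how. As commonly described, SMOTE places each synthetic point at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal sketch of that idea using only the Python standard library. The function name `smote`, its parameters, and the choice to pick base points at random (rather than generating a fixed number per minority sample) are assumptions made for illustration, not the authors' reference implementation.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples
    and their k nearest minority-class neighbours (Euclidean distance).

    Hypothetical helper for illustration; `minority` is a list of
    equal-length lists of floats.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, record the indices of its k nearest neighbours.
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base sample (simplification: chosen at random)
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


if __name__ == "__main__":
    minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority_points, n_synthetic=3, k=2, seed=1):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice the synthetic points would be appended to the training set, optionally together with under-sampling of the majority class as the abstract describes.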


References in zbMATH (referenced in 163 articles, 1 standard article)

Showing results 101 to 120 of 163, sorted by year.
  1. Li, Gang; Liu, Jinzhen; Li, Xiaoxia; Lin, Ling; Wei, Rong: A multiple biomedical signals synchronous acquisition circuit based on over-sampling and shaped signal for the application of the ubiquitous health care (2014)
  2. Li, Qiujie; Mao, Yaobin: A review of boosting methods for imbalanced data classification (2014)
  3. López, Victoria; Fernández, Alberto; Herrera, Francisco: On the importance of the validation technique for classification with imbalanced datasets: addressing covariate shift when data is skewed (2014)
  4. Maratea, Antonio; Petrosino, Alfredo; Manzo, Mario: Adjusted F-measure and kernel scaling for imbalanced data learning (2014)
  5. Menardi, Giovanna; Torelli, Nicola: Training and assessing classification rules with imbalanced data (2014)
  6. Seiffert, Chris; Khoshgoftaar, Taghi M.; Van Hulse, Jason; Folleco, Andres: An empirical study of the classification performance of learners on imbalanced and noisy software quality data (2014)
  7. Tahir, Muhammad; Khan, Asifullah; Kaya, Hüseyin: Protein subcellular localization in human and hamster cell lines: employing local ternary patterns of fluorescence microscopy images (2014)
  8. Zięba, Maciej; Świątek, Jerzy; Lubicz, Marek: Cost sensitive SVM with non-informative examples elimination for imbalanced postoperative risk management problem (2014)
  9. Alvarez-Alvarez, Alberto; Alonso, Jose M.; Trivino, Gracian: Human activity recognition in indoor environments by means of fusing information extracted from intensity of WiFi signal and accelerations (2013)
  10. Cruz-Ramírez, M.; Hervás-Martínez, C.; Gutiérrez, P. A.; Pérez-Ortiz, M.; Briceño, J.; de la Mata, M.: Memetic Pareto differential evolutionary neural network used to solve an unbalanced liver transplantation problem (2013)
  11. López, Victoria; Fernández, Alberto; García, Salvador; Palade, Vasile; Herrera, Francisco: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics (2013)
  12. Pang, Shaoning; Zhu, Lei; Chen, Gang; Sarrafzadeh, Abdolhossein; Ban, Tao; Inoue, Daisuke: Dynamic class imbalance learning for incremental LPSVM (2013)
  13. Qasem, Sultan Noman; Shamsuddin, Siti Mariyam; Hashim, Siti Zaiton Mohd; Darus, Maslina; Al-Shammari, Eiman: Memetic multiobjective particle swarm optimization-based radial basis function network for classification problems (2013)
  14. Stefanowski, Jerzy: Overlapping, rare examples and class decomposition in learning classifiers from imbalanced data (2013)
  15. Wang, Ran; Kwong, Sam; Chen, Degang; Cao, Jingjing: A vector-valued support vector machine model for multiclass problem (2013)
  16. Wu, Jun; Shen, Hong; Li, Yi-Dong; Xiao, Zhi-Bo; Lu, Ming-Yu; Wang, Chun-Li: Learning a hybrid similarity measure for image retrieval (2013)
  17. Yang, Xingwei; Bai, Xiang; Köknar-Tezel, Suzan; Latecki, Longin Jan: Densifying distance spaces for shape and image retrieval (2013)
  18. Yin, Qing-Yan; Zhang, Jiang-She; Zhang, Chun-Xia; Liu, Sheng-Cai: An empirical study on the performance of cost-sensitive boosting algorithms with different levels of class imbalance (2013)
  19. Zhang, Yong; Wang, Dapeng: A cost-sensitive ensemble method for class-imbalanced datasets (2013)
  20. Cieslak, David A.; Hoens, T. Ryan; Chawla, Nitesh V.; Kegelmeyer, W. Philip: Hellinger distance decision trees are robust and skew-insensitive (2012)