SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often predominantly composed of "normal" examples, with only a small percentage of "abnormal" or "interesting" examples. Moreover, the cost of misclassifying an abnormal (interesting) example as a normal one is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. It also shows that the same combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
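The abstract does not spell out how the synthetic examples are created. In SMOTE, each synthetic example is placed at a random point on the line segment joining a minority example and one of its k nearest minority-class neighbours (five neighbours in the original experiments). The following is a minimal sketch in plain Python under stated assumptions: the function names `smote` and `random_undersample` are hypothetical, features are assumed to be continuous and already scaled, and base examples are drawn at random rather than generating a fixed N/100 synthetic points per minority example as in the original pseudocode. The majority class is under-sampled by simple random removal to show the combined scheme.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic new minority points by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Hypothetical helper: base samples are chosen at random (a simplification).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform random number in [0, 1)
        # New point lies on the segment from x towards its neighbour: x + gap * (nn - x).
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def random_undersample(majority, n_keep, seed=None):
    """Hypothetical helper: keep n_keep majority samples chosen without replacement."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    majority = [[float(i), float(-i)] for i in range(20)]
    new_minority = minority + smote(minority, n_synthetic=8, k=3, seed=0)
    new_majority = random_undersample(majority, n_keep=len(new_minority), seed=0)
    print(len(new_minority), len(new_majority))  # 12 12
```

Interpolating between neighbours, instead of duplicating minority examples, spreads the minority class into the region between existing examples, which is the intuition behind the claimed improvement over plain under-sampling. The balance ratio (here equal class sizes) is an illustrative choice, not a value taken from the paper.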


References in zbMATH (referenced in 152 articles, 1 standard article)

  1. Ducange, Pietro; Lazzerini, Beatrice; Marcelloni, Francesco: Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets (2010)
  2. Fernández, Alberto; Del Jesus, María José; Herrera, Francisco: On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets (2010)
  3. He, Jingrui; Carbonell, Jaime: Coselection of features and instances for unsupervised rare category analysis (2010)
  4. Qu, Hai-Ni; Li, Guo-Zheng; Xu, Wei-Sheng: An asymmetric classifier based on partial least squares (2010)
  5. Villar, Pedro; Fernández, Alberto; Herrera, Francisco: A genetic algorithm for feature selection and granularity learning in fuzzy rule-based classification systems for highly imbalanced data-sets (2010)
  6. Wang, Benjamin X.; Japkowicz, Nathalie: Boosting support vector machines for imbalanced data sets (2010)
  7. Wu, Junjie; Xiong, Hui; Chen, Jian: COG: local decomposition for rare class analysis (2010)
  8. Castro, Cristiano Leite; Carvalho, Mateus Araujo; Braga, Antônio Padua: An improved algorithm for SVMs classification of imbalanced data sets (2009)
  9. Fernández, Alberto; José del Jesus, María; Herrera, Francisco: Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets (2009)
  10. Lan, Jyh-shyan; Berardi, Victor L.; Patuwo, B. Eddy; Hu, Michael: A joint investigation of misclassification treatments and imbalanced datasets on neural network performance (2009)
  11. Moskovitch, Robert; Stopel, Dima; Feher, Clint; Nissim, Nir; Japkowicz, Nathalie; Elovici, Yuval: Unknown malcode detection and the imbalance problem (2009)
  12. Orriols-Puig, Albert; Bernadó-Mansilla, Ester: Evolutionary rule-based systems for imbalanced data sets (2009)
  13. Pang, Shaoning; Kasabov, Nikola: Encoding and decoding the knowledge of association rules over SVM classification trees (2009)
  14. Sun, Yi; González Castellano, Cristina; Robinson, Mark; Adams, Rod; Rust, Alistair G.; Davey, Neil: Using pre & post-processing methods to improve binding site predictions (2009)
  15. Sun, Yi; Robinson, Mark; Adams, Rod; te Boekhorst, Rene; Rust, Alistair G.; Davey, Neil: Integrating genomic binding site predictions using real-valued meta classifiers (2009)
  16. Tilakaratne, C. D.; Mammadov, M. A.; Morris, S. A.: Modified neural network algorithms for predicting trading signals of stock market indices (2009)
  17. Chawla, Nitesh V.; Cieslak, David A.; Hall, Lawrence O.; Joshi, Ajay: Automatically countering imbalance and its empirical relationship to cost (2008)
  18. Chen, Mu-Chen; Chen, Long-Sheng; Hsu, Chun-Chin; Zeng, Wei-Rong: An information granulation based data mining approach for classifying imbalanced data (2008)
  19. Fernández, Alberto; García, Salvador; Del Jesus, María José; Herrera, Francisco: A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets (2008)
  20. García, V.; Mollineda, R. A.; Sánchez, J. S.: On the k-NN performance in a challenging scenario of imbalance and overlapping (2008)