SMOTE
SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Real-world datasets are often composed predominantly of "normal" examples, with only a small percentage of "abnormal" or "interesting" examples. The cost of misclassifying an abnormal (interesting) example as a normal example is also often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that combining our method of over-sampling the minority (abnormal) class with under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than under-sampling the majority class alone. It also shows that this combination can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or the class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
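The abstract does not spell out how the synthetic minority examples are created. As a rough illustration only, the sketch below assumes the commonly described scheme: pick a minority example, pick one of its k nearest minority-class neighbours, and place a new point at a random position on the segment between them. The function names `smote` and `undersample`, their parameters, and the toy data are hypothetical; the random choice of base points is a simplification, and numeric features with Euclidean distance are assumed.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Simplified sketch: base points are drawn at random rather than by
    looping over every minority example a fixed number of times.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (base excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # random position on the segment from base to neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def undersample(majority, n_keep, seed=0):
    """Randomly keep at most n_keep majority examples (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(majority, min(n_keep, len(majority)))


# Toy data (made up for illustration): 3 minority and 10 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[float(i), float(-i)] for i in range(10)]

new_minority = minority + smote(minority, n_synthetic=6, k=2)
kept_majority = undersample(majority, n_keep=9)
```

Because every synthetic point is a convex combination of two existing minority examples, it lies between them in feature space rather than duplicating either one. How much to over-sample and under-sample is a tuning choice that this sketch leaves to the caller.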
References in zbMATH (referenced in 152 articles, 1 standard article)
Showing results 121 to 140 of 152.
Sorted by year:
- Ducange, Pietro; Lazzerini, Beatrice; Marcelloni, Francesco: Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets (2010)
- Fernández, Alberto; Del Jesus, María José; Herrera, Francisco: On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets (2010)
- He, Jingrui; Carbonell, Jaime: Coselection of features and instances for unsupervised rare category analysis (2010)
- Qu, Hai-Ni; Li, Guo-Zheng; Xu, Wei-Sheng: An asymmetric classifier based on partial least squares (2010)
- Villar, Pedro; Fernández, Alberto; Herrera, Francisco: A genetic algorithm for feature selection and granularity learning in fuzzy rule-based classification systems for highly imbalanced data-sets (2010)
- Wang, Benjamin X.; Japkowicz, Nathalie: Boosting support vector machines for imbalanced data sets (2010)
- Wu, Junjie; Xiong, Hui; Chen, Jian: COG: local decomposition for rare class analysis (2010)
- Castro, Cristiano Leite; Carvalho, Mateus Araujo; Braga, Antônio Padua: An improved algorithm for SVMs classification of imbalanced data sets (2009)
- Fernández, Alberto; Del Jesus, María José; Herrera, Francisco: Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets (2009)
- Lan, Jyh-shyan; Berardi, Victor L.; Patuwo, B. Eddy; Hu, Michael: A joint investigation of misclassification treatments and imbalanced datasets on neural network performance (2009)
- Moskovitch, Robert; Stopel, Dima; Feher, Clint; Nissim, Nir; Japkowicz, Nathalie; Elovici, Yuval: Unknown malcode detection and the imbalance problem (2009)
- Orriols-Puig, Albert; Bernadó-Mansilla, Ester: Evolutionary rule-based systems for imbalanced data sets (2009)
- Pang, Shaoning; Kasabov, Nikola: Encoding and decoding the knowledge of association rules over SVM classification trees (2009)
- Sun, Yi; González Castellano, Cristina; Robinson, Mark; Adams, Rod; Rust, Alistair G.; Davey, Neil: Using pre & post-processing methods to improve binding site predictions (2009)
- Sun, Yi; Robinson, Mark; Adams, Rod; te Boekhorst, Rene; Rust, Alistair G.; Davey, Neil: Integrating genomic binding site predictions using real-valued meta classifiers (2009)
- Tilakaratne, C. D.; Mammadov, M. A.; Morris, S. A.: Modified neural network algorithms for predicting trading signals of stock market indices (2009)
- Chawla, Nitesh V.; Cieslak, David A.; Hall, Lawrence O.; Joshi, Ajay: Automatically countering imbalance and its empirical relationship to cost (2008)
- Chen, Mu-Chen; Chen, Long-Sheng; Hsu, Chun-Chin; Zeng, Wei-Rong: An information granulation based data mining approach for classifying imbalanced data (2008)
- Fernández, Alberto; García, Salvador; Del Jesus, María José; Herrera, Francisco: A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets (2008)
- García, V.; Mollineda, R. A.; Sánchez, J. S.: On the k-NN performance in a challenging scenario of imbalance and overlapping (2008)