SMOTE

SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
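The abstract does not spell out how the synthetic examples are built. As commonly described, SMOTE places each synthetic point on the line segment between a minority example and one of its k nearest minority-class neighbours. The following is a minimal sketch under that assumption, using NumPy only. The function names `smote_sketch` and `undersample_majority`, their parameters, and the random choice of base points are illustrative simplifications, not the authors' implementation.

```python
import numpy as np


def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward a
    randomly chosen one of each base point's k nearest minority neighbours.

    Assumption: base points are drawn at random with replacement, which is
    a simplification of generating a fixed number per minority example.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of every minority point.
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # interpolation factor in [0, 1)

    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def undersample_majority(X_maj, n_keep, seed=0):
    """Randomly keep n_keep majority examples without replacement."""
    rng = np.random.default_rng(seed)
    X_maj = np.asarray(X_maj, dtype=float)
    idx = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[idx]
```

Combining the output of `smote_sketch` with the original minority examples and the output of `undersample_majority` gives a rebalanced training set of the kind the abstract describes. Resampling should be applied to training data only, so that AUC is measured on the untouched class distribution.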


References in zbMATH (referenced in 149 articles, 1 standard article)

Showing results 1 to 20 of 149.
Sorted by year.


  1. Imoussaten, Abdelhak; Jacquin, Lucie: Cautious classification based on belief functions theory and imprecise relabelling (2022)
  2. Johnson, Marina; Albizri, Abdullah; Simsek, Serhat: Artificial intelligence in healthcare operations to enhance treatment outcomes: a framework to predict lung cancer prognosis (2022)
  3. Loynes, Christopher; Ouenniche, Jamal; De Smedt, Johannes: The detection and location estimation of disasters using Twitter and the identification of non-governmental organisations using crowdsourcing (2022)
  4. Quesnel, Frédéric; Wu, Alice; Desaulniers, Guy; Soumis, François: Deep-learning-based partial pricing in a branch-and-price algorithm for personalized crew rostering (2022)
  5. Akalin, Altuna: Computational genomics with R. With the assistance of Verdan Franke, Bora Uyar and Jonathan Ronen (2021)
  6. Bej, Saptarshi; Davtyan, Narek; Wolfien, Markus; Nassar, Mariam; Wolkenhauer, Olaf: LoRAS: an oversampling approach for imbalanced datasets (2021)
  7. Cao, Yi; Liu, Xiaoquan; Zhai, Jia: Option valuation under no-arbitrage constraints with neural networks (2021)
  8. Chen, Baiyun; Xia, Shuyin; Chen, Zizhong; Wang, Binggui; Wang, Guoyin: RSMOTE: a self-adaptive robust SMOTE for imbalanced problems with label noise (2021)
  9. Du, Yu; Lin, Xiaodong; Pham, Minh; Ruszczyński, Andrzej: Selective linearization for multi-block statistical learning (2021)
  10. Gattermann-Itschert, Theresa; Thonemann, Ulrich W.: How training on multiple time slices improves performance in churn prediction (2021)
  11. Koziarski, Michał; Bellinger, Colin; Woźniak, Michał: RB-CCR: radial-based combined cleaning and resampling algorithm for imbalanced data classification (2021)
  12. Mao, Shanjun; Fan, Xiaodan; Hu, Jie: Correlation for tree-shaped datasets and its Bayesian estimation (2021)
  13. Merdan, Selin; Barnett, Christine L.; Denton, Brian T.; Montie, James E.; Miller, David C.: OR practice-data analytics for optimal detection of metastatic prostate cancer (2021)
  14. Pereira, Rodolfo M.; Costa, Yandre M. G.; Silla, Carlos N. Jr.: Handling imbalance in hierarchical classification problems using local classifiers approaches (2021)
  15. Saito, Miho; Ohsato, Takaya; Yamanaka, Suguru: An empirical evaluation of machine learning performance in corporate sales growth prediction (2021)
  16. Shahee, Shaukat Ali; Ananthakumar, Usha: An overlap sensitive neural network for class imbalanced data (2021)
  17. Soltanzadeh, Paria; Hashemzadeh, Mahdi: RCSMOTE: range-controlled synthetic minority over-sampling technique for handling the class imbalance problem (2021)
  18. Steininger, Michael; Kobs, Konstantin; Davidson, Padraig; Krause, Anna; Hotho, Andreas: Density-based weighting for imbalanced regression (2021)
  19. Vargaftik, Shay; Keslassy, Isaac; Orda, Ariel; Ben-Itzhak, Yaniv: RADE: resource-efficient supervised anomaly detection using decision tree-based ensemble methods (2021)
  20. Abdallah, Zahraa S.; Gaber, Mohamed Medhat: Co-eye: a multi-resolution ensemble classifier for symbolically approximated time series (2020)
