MWMOTE - Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning

Imbalanced learning problems contain an unequal distribution of data samples among different classes and pose a challenge to any classifier, as it becomes hard to learn the minority class samples. Synthetic oversampling methods address this problem by generating synthetic minority class samples to balance the distribution between the samples of the majority and minority classes. This paper identifies that most of the existing oversampling methods may generate the wrong synthetic minority samples in some scenarios and make learning tasks harder. To this end, a new method, called Majority Weighted Minority Oversampling TEchnique (MWMOTE), is presented for efficiently handling imbalanced learning problems. MWMOTE first identifies the hard-to-learn informative minority class samples and assigns them weights according to their Euclidean distance from the nearest majority class samples. It then generates the synthetic samples from the weighted informative minority class samples using a clustering approach. This is done in such a way that all the generated samples lie inside some minority class cluster. MWMOTE has been evaluated extensively on four artificial and 20 real-world data sets. The simulation results show that our method is better than or comparable with some other existing methods in terms of various assessment metrics, such as geometric mean (G-mean) and area under the receiver operating characteristic (ROC) curve, usually known as area under curve (AUC).
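
The two steps named in the abstract (distance-based weighting of minority samples, then generation restricted to a minority cluster) can be illustrated with a rough sketch. This is not the published MWMOTE algorithm. It assumes that the inverse of the distance to the nearest majority sample is an adequate stand-in for the paper's weighting, and that a simple distance-threshold grouping is an adequate stand-in for its clustering step. All function names and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_majority_distance(x, majority):
    """Euclidean distance from x to its closest majority-class sample."""
    return min(math.dist(x, m) for m in majority)


def selection_weights(minority, majority, eps=1e-12):
    """Normalized weights, larger for minority samples closer to the majority class.

    Assumption: inverse nearest-majority distance replaces the paper's weighting.
    """
    raw = [1.0 / (nearest_majority_distance(x, majority) + eps) for x in minority]
    total = sum(raw)
    return [w / total for w in raw]


def threshold_clusters(points, threshold):
    """Label points so that chains of neighbours at most `threshold` apart share a label.

    Assumption: this simple grouping replaces the paper's clustering step.
    """
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def oversample(minority, majority, n_new, threshold, seed=0):
    """Draw weighted minority seeds and interpolate only within the seed's cluster."""
    rng = random.Random(seed)
    weights = selection_weights(minority, majority)
    labels = threshold_clusters(minority, threshold)
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample with probability proportional to its weight.
        i = rng.choices(range(len(minority)), weights=weights, k=1)[0]
        # Pick a partner from the same cluster (possibly the seed itself).
        mates = [j for j, lab in enumerate(labels) if lab == labels[i]]
        j = rng.choice(mates)
        alpha = rng.random()
        x, y = minority[i], minority[j]
        # Linear interpolation keeps the new point on the segment between x and y.
        synthetic.append(tuple(a + alpha * (b - a) for a, b in zip(x, y)))
    return synthetic


# Toy data: two well-separated minority groups and a majority band between them.
minority = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
majority = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
new_points = oversample(minority, majority, n_new=10, threshold=1.0)
```

Because partners are drawn only from the seed's own cluster, no synthetic point in this toy example falls in the region between the two minority groups, where the majority samples lie. Unconstrained interpolation between arbitrary minority samples would not give that guarantee.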
References in zbMATH (referenced in 8 articles)
- Zhu, Bing; Gao, Zihan; Zhao, Junkai; van den Broucke, Seppe K.L.M.: IRIC: An R library for binary imbalanced classification (2019; not indexed in zbMATH)
- Kocheturov, Anton; Pardalos, Panos M.; Karakitsiou, Athanasia: Massive datasets and machine learning for computational biomedicine: trends and challenges (2019)
- Yan, Yuan Ting; Wu, Zeng Bao; Du, Xiu Quan; Chen, Jie; Zhao, Shu; Zhang, Yan Ping: A three-way decision ensemble method for imbalanced data oversampling (2019)
- Zhu, Zonghai; Wang, Zhe; Li, Dongdong; Du, Wenli: Tree-based space partition and merging ensemble learning framework for imbalanced problems (2019)
- Vanhoeyveld, Jellis; Martens, David: Imbalanced classification in sparse and large behaviour datasets (2018)
- Koziarski, Michał; Woźniak, Michał: CCR: a combined cleaning and resampling algorithm for imbalanced data classification (2017)
- Daqi, Gao; Ahmed, Dastagir; Lili, Guo; Zejian, Wang; Zhe, Wang: Pseudo-inverse linear discriminants for the improvement of overall classification accuracies (2016)
- Cheng, Fan; Yang, Kang; Zhang, Lei: A structural SVM based approach for binary classification under class imbalance (2015)