Please use this identifier to cite or link to this item: https://repositorio.uci.cu/jspui/handle/123456789/9472
Title : Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach
Author : Hoyos, Keider
Fernández, Jorge
Martinez, Beatriz
Henao, Óscar
Orozco, Álvaro
Daza, Genaro
Keywords : LEARNING ALGORITHMS;DATA PREPROCESSING;DATA VALIDATION
Publication date : 2018
Publisher : Springer
Citation : Hoyos K., Fernández J., Martinez B., Henao Ó., Orozco Á., Daza G. (2018) Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach. In: Hernández Heredia Y., Milián Núñez V., Ruiz Shulcloper J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_32
Abstract : Imbalanced data refers to datasets in which the number of samples in one class (the majority class) is much higher than in the other (the minority class), which biases classifiers in favor of the majority class. Currently, it is difficult to develop an effective model with machine learning algorithms without a data preprocessing step that balances the imbalanced datasets. In this paper, we propose a Relevant Information-based under-sampling (RIS) approach that improves classification performance for the minority class by selecting the most relevant samples from the majority class as training data. Our RIS approach is based on a self-organizing principle of relevant information, which extracts the underlying structure of the majority class and preserves different levels of detail of the original data with a smaller number of samples. Additionally, RIS captures the data structure beyond second-order statistics by estimating information-theoretic measures that accurately quantify the statistical structure of the majority class, reducing the consequences of the imbalanced class distribution problem. We test our methodology on synthetic and real-world imbalanced datasets, and we use a cross-validation scheme to quantify classifier performance by evaluating the geometric mean. Results show that our proposal outperforms state-of-the-art methods for imbalanced class distributions in terms of classification geometric mean, especially on highly imbalanced datasets.
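The abstract evaluates classifiers by their geometric mean. This record does not give the paper's exact formula, so the sketch below assumes the common binary definition, the square root of sensitivity times specificity. The function name `geometric_mean` and the toy labels are illustrative and are not taken from the paper.

```python
import math


def geometric_mean(y_true, y_pred, positive=1):
    """Assumed definition: sqrt(sensitivity * specificity) for two classes."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # minority sample correctly detected
            else:
                fn += 1  # minority sample missed
        else:
            if pred == positive:
                fp += 1  # majority sample flagged as minority
            else:
                tn += 1  # majority sample correctly rejected
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): 8 majority samples (0) and 2 minority samples (1).
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10                      # never predicts the minority class
mixed_guess = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]    # finds one of the two minority samples

g_majority = geometric_mean(y_true, always_majority)
g_mixed = geometric_mean(y_true, mixed_guess)
print(g_majority, g_mixed)
```

In this toy case the classifier that always predicts the majority class has the higher accuracy (8 of 10 correct versus 7 of 10) but a geometric mean of zero, which illustrates why a metric of this kind suits imbalanced datasets.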
URI : https://repositorio.uci.cu/jspui/handle/123456789/9472
Appears in collections: Eventos

Files in this item:
File       Size       Format
A054.pdf   118.23 kB  Adobe PDF


Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.