Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach

Hoyos, Keider; Fernández, Jorge; Martinez, Beatriz; Henao, Óscar; Orozco, Álvaro; Daza, Genaro

Please use this identifier to cite or link to this item: https://repositorio.uci.cu/jspui/handle/123456789/9472

Full metadata record

DC Field	Value	Language
dc.contributor.author	Hoyos, Keider	-
dc.contributor.author	Fernández, Jorge	-
dc.contributor.author	Martinez, Beatriz	-
dc.contributor.author	Henao, Óscar	-
dc.contributor.author	Orozco, Álvaro	-
dc.contributor.author	Daza, Genaro	-
dc.coverage.spatial	7029392	en_US
dc.date.accessioned	2021-07-13T14:43:52Z	-
dc.date.available	2021-07-13T14:43:52Z	-
dc.date.issued	2018	-
dc.identifier.citation	Hoyos K., Fernández J., Martinez B., Henao Ó., Orozco Á., Daza G. (2018) Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach. In: Hernández Heredia Y., Milián Núñez V., Ruiz Shulcloper J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_32	en_US
dc.identifier.uri	https://repositorio.uci.cu/jspui/handle/123456789/9472	-
dc.description.abstract	Imbalanced data refers to datasets in which the number of samples in one class (the majority class) is much higher than in the other (the minority class), which biases classifiers in favor of the majority class. Currently, it is difficult to develop an effective model using machine learning algorithms without a data preprocessing step that balances the imbalanced datasets. In this paper, we propose a Relevant Information-based under-sampling (RIS) approach to improve the classification performance for the minority class by selecting the most relevant samples from the majority class as training data. Our RIS approach is based on a self-organizing principle of relevant information, which allows extracting the underlying structure of the majority class while preserving different levels of detail of the original data with a smaller number of samples. Additionally, RIS captures the data structure beyond second-order statistics by estimating information-theoretic measures that quantify the statistical structure of the majority class accurately, decreasing the consequences of the imbalanced class distribution problem. We test our methodology on synthetic and real-world imbalanced datasets. Finally, we use a cross-validation scheme to quantify the classifier performance by evaluating the geometric mean. Results show that our proposal outperforms state-of-the-art methods for imbalanced class distributions in terms of classification geometric mean, especially on highly imbalanced datasets.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer	en_US
dc.subject	LEARNING ALGORITHMS	en_US
dc.subject	DATA PREPROCESSING	en_US
dc.subject	DATA VALIDATION	en_US
dc.title	Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach	en_US
dc.type	conferenceObject	en_US
dc.rights.holder	Universidad de las Ciencias Informáticas	en_US
dc.identifier.doi	https://doi.org/10.1007/978-3-030-01132-1_32	-
dc.source.initialpage	280	en_US
dc.source.endpage	287	en_US
dc.source.title	UCIENCIA 2018	en_US
dc.source.conferencetitle	UCIENCIA	en_US
Appears in collections:	UCIENCIA 2018

Files in this item:

File	Size	Format
A054.pdf	118.23 kB	Adobe PDF
