Please use this identifier to cite or link to this item:
https://repositorio.uci.cu/jspui/handle/123456789/9472
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hoyos, Keider | - |
dc.contributor.author | Fernández, Jorge | - |
dc.contributor.author | Martinez, Beatriz | - |
dc.contributor.author | Henao, Óscar | - |
dc.contributor.author | Orozco, Álvaro | - |
dc.contributor.author | Daza, Genaro | - |
dc.coverage.spatial | 7029392 | en_US |
dc.date.accessioned | 2021-07-13T14:43:52Z | - |
dc.date.available | 2021-07-13T14:43:52Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Hoyos K., Fernández J., Martinez B., Henao Ó., Orozco Á., Daza G. (2018) Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach. In: Hernández Heredia Y., Milián Núñez V., Ruiz Shulcloper J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_32 | en_US |
dc.identifier.uri | https://repositorio.uci.cu/jspui/handle/123456789/9472 | - |
dc.description.abstract | Imbalanced data refers to datasets in which the number of samples in one class (the majority class) is much higher than in the other (the minority class), which biases classifiers in favor of the majority class. Currently, it is difficult to develop an effective model using machine learning algorithms without a data preprocessing step that balances the imbalanced datasets. In this paper, we propose a Relevant Information-based under-sampling (RIS) approach to improve the classification performance for the minority class by selecting the most relevant samples from the majority class as training data. Our RIS approach is based on a self-organizing principle of relevant information, which allows extracting the underlying structure of the majority class while preserving different levels of detail of the original data with a smaller number of samples. Additionally, RIS captures the data structure beyond second-order statistics by estimating information-theoretic measures that quantify the statistical structure of the majority class accurately, decreasing the consequences of the imbalanced class distribution problem. We test our methodology on synthetic and real-world imbalanced datasets. Finally, we use a cross-validation scheme to quantify the classifier performance by evaluating the geometric mean. Results show that our proposal outperforms state-of-the-art methods for imbalanced class distributions in terms of the classification geometric mean, especially on highly imbalanced datasets. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Springer | en_US |
dc.subject | LEARNING ALGORITHMS | en_US |
dc.subject | DATA PREPROCESSING | en_US |
dc.subject | DATA VALIDATION | en_US |
dc.title | Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach | en_US |
dc.type | conferenceObject | en_US |
dc.rights.holder | Universidad de las Ciencias Informáticas | en_US |
dc.identifier.doi | https://doi.org/10.1007/978-3-030-01132-1_32 | - |
dc.source.initialpage | 280 | en_US |
dc.source.endpage | 287 | en_US |
dc.source.title | UCIENCIA 2018 | en_US |
dc.source.conferencetitle | UCIENCIA | en_US |
Appears in collections: | UCIENCIA 2018 |
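The abstract reports classifier performance as a geometric mean but the record does not give the formula. The sketch below assumes the definition commonly used for binary imbalanced problems, G-mean = sqrt(sensitivity × specificity), with the minority class treated as the positive class. The function name `geometric_mean_score` and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def geometric_mean_score(y_true: Sequence[int],
                         y_pred: Sequence[int],
                         positive: int = 1) -> float:
    """Assumed definition: sqrt(sensitivity * specificity) for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against an empty class so the ratio is always defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): 2 minority samples (label 1), 8 majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet its sensitivity is zero, so the geometric mean is zero as well.
always_majority = [0] * 10
g_majority = geometric_mean_score(y_true, always_majority)

# A classifier that recovers 1 of 2 minority and 6 of 8 majority samples.
mixed = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
g_mixed = geometric_mean_score(y_true, mixed)
```

Under this definition, a classifier biased toward the majority class is penalized even when its plain accuracy is high, which is presumably why the measure suits imbalanced datasets.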
Files in this item:
File | Size | Format | |
---|---|---|---|
A054.pdf | 118.23 kB | Adobe PDF | View/Open |
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.