Please use this identifier to cite or link to this item: https://repositorio.uci.cu/jspui/handle/123456789/9472
Full metadata record
DC Field | Value | Language
dc.contributor.author | Hoyos, Keider | -
dc.contributor.author | Fernández, Jorge | -
dc.contributor.author | Martinez, Beatriz | -
dc.contributor.author | Henao, Óscar | -
dc.contributor.author | Orozco, Álvaro | -
dc.contributor.author | Daza, Genaro | -
dc.coverage.spatial | 7029392 | en_US
dc.date.accessioned | 2021-07-13T14:43:52Z | -
dc.date.available | 2021-07-13T14:43:52Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Hoyos K., Fernández J., Martinez B., Henao Ó., Orozco Á., Daza G. (2018) Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach. In: Hernández Heredia Y., Milián Núñez V., Ruiz Shulcloper J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2018. Lecture Notes in Computer Science, vol 11047. Springer, Cham. https://doi.org/10.1007/978-3-030-01132-1_32 | en_US
dc.identifier.uri | https://repositorio.uci.cu/jspui/handle/123456789/9472 | -
dc.description.abstract | Imbalanced data refers to datasets in which the number of samples in one class (the majority class) is much higher than in the other (the minority class), which biases classifiers in favor of the majority class. Currently, it is difficult to develop an effective model with machine learning algorithms without preprocessing the data to balance imbalanced datasets. In this paper, we propose a Relevant Information-based under-sampling (RIS) approach that improves classification performance for the minority class by selecting the most relevant samples from the majority class as training data. Our RIS approach is based on a self-organizing principle of relevant information, which extracts the underlying structure of the majority class and preserves different levels of detail of the original data with a smaller number of samples. Additionally, RIS captures the data structure beyond second-order statistics by estimating information-theoretic measures that accurately quantify the statistical structure of the majority class, reducing the consequences of the imbalanced class distribution problem. We test our methodology on synthetic and real-world imbalanced datasets. Finally, we use a cross-validation scheme to quantify classifier performance by evaluating the geometric mean. Results show that our proposal outperforms state-of-the-art methods for imbalanced class distributions in terms of classification geometric mean, especially on highly imbalanced datasets. | en_US
dc.language.iso | eng | en_US
dc.publisher | Springer | en_US
dc.subject | LEARNING ALGORITHMS | en_US
dc.subject | DATA PREPROCESSING | en_US
dc.subject | DATA VALIDATION | en_US
dc.title | Imbalanced Data Classification Using a Relevant Information-Based Sampling Approach | en_US
dc.type | conferenceObject | en_US
dc.rights.holder | Universidad de las Ciencias Informáticas | en_US
dc.identifier.doi | https://doi.org/10.1007/978-3-030-01132-1_32 | -
dc.source.initialpage | 280 | en_US
dc.source.endpage | 287 | en_US
dc.source.title | UCIENCIA 2018 | en_US
dc.source.conferencetitle | UCIENCIA | en_US
Appears in collections: UCIENCIA 2018
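
The abstract in this record scores classifiers by the geometric mean and balances the training data by under-sampling the majority class. The sketch below illustrates both ideas with the Python standard library only. It is not the RIS method: `random_undersample` is a plain random baseline swapped in for illustration, and the function names, the label convention (0 = majority, 1 = minority), and the toy labels are all assumptions that do not come from the paper. The geometric mean is taken in its usual binary form, the square root of sensitivity times specificity.

```python
import math
import random


def geometric_mean(y_true, y_pred, positive=1):
    """Return sqrt(sensitivity * specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def random_undersample(samples, labels, majority=0, seed=0):
    """Hypothetical baseline (NOT RIS): keep a random subset of the
    majority class with as many samples as all other classes combined."""
    rng = random.Random(seed)
    maj = [i for i, y in enumerate(labels) if y == majority]
    rest = [i for i, y in enumerate(labels) if y != majority]
    keep = sorted(rng.sample(maj, min(len(maj), len(rest))) + rest)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed): 8 majority samples, 2 minority samples.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10  # a classifier that ignores the minority class

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(accuracy)                                 # 0.8
print(geometric_mean(y_true, always_majority))  # 0.0
```

The toy case shows why the geometric mean suits imbalanced data: a classifier that always predicts the majority class reaches 80% accuracy but a geometric mean of zero, because its sensitivity on the minority class is zero.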

Files in this item:
File | Size | Format
A054.pdf | 118.23 kB | Adobe PDF


Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.