Online theses of the University 8 Mai 1945 Guelma

Sélection et élimination des attributs redondants pour la classification des gros corpus textuels


dc.contributor.author Khaled Khodja, Anfel
dc.date.accessioned 2023-11-23T11:25:44Z
dc.date.available 2023-11-23T11:25:44Z
dc.date.issued 2023
dc.identifier.uri http://dspace.univ-guelma.dz/jspui/handle/123456789/15004
dc.description.abstract Feature selection is a crucial pre-processing step for machine learning. Its aim is to reduce the feature space, speed up the learning process, and improve the performance of classification algorithms while avoiding overfitting. Various statistical methods, such as Information Gain (IG), the Chi-squared test (Ch2), and the Improved Gini Index (IGI), have proved effective at finding the most representative attributes in text corpora, with shorter execution times than methods based on information theory. However, these methods can retain a large number of redundant attributes, which can degrade the performance of classification algorithms. In this work, we aim to eliminate this redundancy by measuring the correlation between attributes that have similar or close IG scores. Correlation is assessed using the mutual information between attributes: attributes that are strongly related to the target variable (the class) and weakly correlated with the other attributes are considered the most informative. en_US
dc.language.iso fr en_US
dc.publisher University of Guelma en_US
dc.subject selection, feature, mutual information, correlation, redundancy, classification, text. en_US
dc.title Sélection et élimination des attributs redondants pour la classification des gros corpus textuels en_US
dc.type Working Paper en_US
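The abstract describes a two-stage scheme: rank attributes by their relevance to the class, then discard attributes that are redundant with already-selected ones, using mutual information as the redundancy measure. A minimal sketch of that idea in Python, assuming discrete (e.g., binary term-presence) features; the function names, the greedy strategy, and the `redundancy_threshold` value are illustrative assumptions, not the thesis's exact method:

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def select_non_redundant(features, labels, k, redundancy_threshold=0.5):
    """Rank features by MI with the class (a proxy for IG on discrete data),
    then greedily keep a feature only if its MI with every feature already
    kept stays below the redundancy threshold."""
    ranked = sorted(features,
                    key=lambda f: mutual_information(features[f], labels),
                    reverse=True)
    kept = []
    for f in ranked:
        if all(mutual_information(features[f], features[g]) < redundancy_threshold
               for g in kept):
            kept.append(f)
        if len(kept) == k:
            break
    return kept
```

For example, with two identical informative features and one uninformative feature, the second copy is rejected as redundant (its mutual information with the first is maximal), so the greedy pass keeps the first informative feature plus the weakly correlated one.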

