dc.contributor.author: Khaled Khodja, Anfel
dc.date.accessioned: 2023-11-23T11:25:44Z
dc.date.available: 2023-11-23T11:25:44Z
dc.date.issued: 2023
dc.identifier.uri: http://dspace.univ-guelma.dz/jspui/handle/123456789/15004
dc.description.abstract: Feature selection is a crucial step in pre-processing data for machine learning. Its aim is to reduce the feature space, speed up the learning process and improve the performance of classification algorithms, while avoiding overfitting. Various statistical methods, such as Information Gain (IG), the Chi-squared test (Ch2) and the Improved Gini Index (IGI), have proved effective at finding the most representative attributes in text corpora, with a shorter execution time than methods based on information theory. However, these methods can generate a large number of redundant attributes, which can adversely affect the performance of classification algorithms. In this work, we aim to eliminate this redundancy by measuring the correlation between attributes that have similar or close IG scores. Correlation can be assessed using the mutual information between attributes. Thus, attributes that are strongly related to the target variable (the class) and weakly correlated with the other attributes are considered the most informative. (en_US)
dc.language.iso: fr (en_US)
dc.publisher: University of Guelma (en_US)
dc.subject: selection, feature, mutual information, correlation, redundancy, classification, text (en_US)
dc.title: Sélection et élimination des attributs redondants pour la classification des gros corpus textuels (en_US)
dc.type: Working Paper (en_US)
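The selection strategy described in the abstract — keep attributes that share high mutual information with the class while sharing low mutual information with attributes already retained — can be sketched as follows. This is a minimal illustration on synthetic binary features, not the thesis's implementation; the redundancy threshold, the greedy ordering, and the toy data are all assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Synthetic data (assumption, for illustration only): four binary features,
# where f1 is a near-duplicate of f0 and therefore redundant.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
f0 = np.where(rng.random(n) < 0.9, y, 1 - y)   # strongly class-related
f1 = f0.copy()
f1[:10] = 1 - f1[:10]                           # near-duplicate of f0 (redundant)
f2 = rng.integers(0, 2, n)                      # pure noise
f3 = np.where(rng.random(n) < 0.7, y, 1 - y)   # weaker class signal
X = np.column_stack([f0, f1, f2, f3])

def select_non_redundant(X, y, redundancy_threshold=0.3):
    """Greedy filter: rank features by mutual information with the class,
    then keep a candidate only if its mutual information with every
    already-kept feature stays below the threshold (in nats)."""
    relevance = [mutual_info_score(X[:, j], y) for j in range(X.shape[1])]
    order = np.argsort(relevance)[::-1]          # most class-relevant first
    kept = []
    for j in order:
        if all(mutual_info_score(X[:, j], X[:, k]) < redundancy_threshold
               for k in kept):
            kept.append(int(j))
    return kept

print(select_non_redundant(X, y))  # the f0/f1 pair is reduced to one member
```

In line with the abstract's criterion, only one of the two strongly correlated attributes survives, while weakly correlated attributes are kept regardless of each other.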