Please use this identifier to cite or link to this item:
http://dspace.univ-guelma.dz/jspui/handle/123456789/13228
Title: | Selection of Co-occurring Terms with Minimal Entropy for Text Classification (original title: Sélection des termes co-occurrents avec entropie minimale pour la Classification des textes) |
Authors: | BENSSAADA, ARIDJ |
Keywords: | selection, term, co-occurrence, entropy, text, classification |
Issue Date: | 2022 |
Publisher: | Université de Guelma |
Abstract: | Feature selection, as a dimensionality reduction technique, aims to select a small subset of relevant features from the original set by removing irrelevant, redundant, or noisy ones. Feature selection generally leads to better learning performance, i.e. higher learning accuracy, lower computational cost, and better model interpretability. Feature selection methods such as Information Gain (IG), Mutual Information (MI), and Chi-square (Chi2) are statistical methods based on document frequency, but they take into account neither the frequency of terms within documents nor their semantics. Based on the idea that terms that frequently co-occur may share a common meaning, and thus have a higher discrimination capacity than isolated terms, we propose a feature selection method for text classification that considers two measures: term co-occurrence frequency and term entropy. A term is considered relevant if it frequently co-occurs with other terms and minimizes the uncertainty (entropy) of the class variable. The performance of our method is compared to the four most commonly used selection metrics, Information Gain (IG), Mutual Information (MI), Chi-square (Chi2), and Document Frequency (DF), using two classifiers, Naïve Bayes (NB) and Support Vector Machine (SVM), and three datasets. |
URI: | http://dspace.univ-guelma.dz/jspui/handle/123456789/13228 |
Appears in Collections: | Master |
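The abstract names two measures, term co-occurrence frequency and class entropy, but does not give the exact scoring formula. The following is a minimal sketch of how the two could be combined, under stated assumptions: co-occurrence is counted at document level, entropy is the class entropy among documents containing the term, and the combined score `cooc * (1 - H / H_max)` is a hypothetical choice made for illustration, not the thesis's actual method. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_frequency(docs):
    """Count, for each term, the (document, other term) pairs it appears with."""
    freq = Counter()
    for tokens in docs:
        vocab = set(tokens)
        for term in vocab:
            # Each distinct other term in the same document is one co-occurrence.
            freq[term] += len(vocab) - 1
    return freq


def class_entropy_given_term(docs, labels):
    """Entropy (in bits) of the class label among documents containing each term."""
    class_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            class_counts[term][label] += 1
    entropy = {}
    for term, counts in class_counts.items():
        total = sum(counts.values())
        entropy[term] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def select_terms(docs, labels, k):
    """Return the k terms with the highest assumed score (hypothetical formula)."""
    cooc = cooccurrence_frequency(docs)
    entropy = class_entropy_given_term(docs, labels)
    n_classes = len(set(labels))
    max_entropy = math.log2(n_classes) if n_classes > 1 else 1.0
    # Assumption: reward frequent co-occurrence, penalize class uncertainty.
    score = {t: cooc[t] * (1.0 - entropy[t] / max_entropy) for t in entropy}
    return sorted(score, key=lambda t: (-score[t], t))[:k]
```

On a toy corpus, a term that appears equally often in every class receives a score of zero regardless of how often it co-occurs, while terms confined to one class are ranked by their co-occurrence count.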
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
BENSSAADA_ARIDJ_F5.pdf | | 1.58 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.