Please use this identifier to cite or link to this item:
http://dspace.univ-guelma.dz/jspui/handle/123456789/12915
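Title: | Un schéma de pondération et de sélection des termes pertinents basé sur la distribution des fréquences des documents et des termes entre catégories [A weighting and selection scheme for relevant terms based on the distribution of document and term frequencies across categories] |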
Authors: | SIAFA, AYA |
Keywords: | selection, term, classification, text, term frequency, document frequency |
Issue Date: | 2022 |
Publisher: | Université de Guelma |
Abstract: | Feature selection plays an important role in text categorization. It has proven to be an effective and efficient way to prepare high-dimensional data for data mining and text classification. The most popular selection metrics include Information Gain (IG), Mutual Information (MI), Chi-square (Chi2) and Document Frequency (DF). All of these use the document frequency distribution to compute the relevance of words to the class variable, without considering the intra-document word frequency distribution. Our main contribution is a new feature selection approach, TFDF, based on term frequency and document frequency at the class level. In the experiments, the proposed method is compared with the existing metrics IG, MI, Chi2 and DF. The classifiers used to test the performance of the selection metrics are Support Vector Machine (SVM) and Naive Bayes (NB), which are among the best-performing classifiers at present. Experimental results show that the proposed method outperforms the existing metrics from the literature. |
URI: | http://dspace.univ-guelma.dz/jspui/handle/123456789/12915 |
Appears in Collections: | Master |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
SIAFA_AYA_F5.pdf | | 1.52 MB | Adobe PDF | View/Open |
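
The abstract contrasts metrics that rely only on document frequency with an approach that also uses term frequency at the class level. The record does not give the TFDF formula, so the Python sketch below only illustrates the two class-level statistics involved and ranks terms with a made-up score. The names `class_level_counts`, `illustrative_score` and `select_top_k`, the toy corpus, and the scoring rule itself are all assumptions for illustration, not the method from the thesis.

```python
from collections import Counter, defaultdict


def class_level_counts(docs, labels):
    """Collect per-class term frequency (TF) and document frequency (DF)."""
    tf = defaultdict(Counter)  # tf[c][t]: total occurrences of t in class c
    df = defaultdict(Counter)  # df[c][t]: number of class-c documents containing t
    n_docs = Counter(labels)   # n_docs[c]: number of documents in class c
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)
        df[label].update(set(tokens))  # each document counts a term once
    return tf, df, n_docs


def illustrative_score(term, tf, df, n_docs):
    """Hypothetical score (NOT the thesis formula).

    For each class, multiply the term's relative TF by its relative DF,
    then take the spread between the best and the worst class. A term that
    behaves the same way in every class scores 0.
    """
    per_class = []
    for c in n_docs:
        total_tokens = sum(tf[c].values())
        rel_tf = tf[c][term] / total_tokens if total_tokens else 0.0
        rel_df = df[c][term] / n_docs[c]
        per_class.append(rel_tf * rel_df)
    return max(per_class) - min(per_class)


def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    tf, df, n_docs = class_level_counts(docs, labels)
    vocab = {t for tokens in docs for t in tokens}
    ranked = sorted(vocab, key=lambda t: (-illustrative_score(t, tf, df, n_docs), t))
    return ranked[:k]


# Toy corpus (invented): two classes, two tokenized documents each.
DOCS = [
    ["goal", "goal", "match", "team"],
    ["match", "team", "goal"],
    ["stock", "market", "stock", "team"],
    ["market", "stock", "price"],
]
LABELS = ["sport", "sport", "finance", "finance"]

if __name__ == "__main__":
    print(select_top_k(DOCS, LABELS, 2))
```

In this toy corpus, "team" occurs in both classes and therefore ranks below class-specific terms such as "goal" or "stock". A pure document-frequency ranking would instead favor "team", because it appears in three of the four documents.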