Please use this identifier to cite or link to this item: http://dspace.univ-guelma.dz/jspui/handle/123456789/12915
Title: A scheme for weighting and selecting relevant terms based on the distribution of document and term frequencies across categories
Authors: SIAFA, AYA
Keywords: selection, term, classification, text, term frequency, document frequency
Issue Date: 2022
Publisher: Université de Guelma
Abstract: Feature selection plays an important role in text categorization. It has proven to be an effective and efficient way to prepare high-dimensional data for data mining and text classification. The most popular selection metrics include Information Gain (GI), Mutual Information (MI), Chi-square (Chi2) and Document Frequency (DF). These metrics use the document frequency distribution to compute the relevance of words to the class variable, without considering the intra-document word frequency distribution. Our main contribution is a new feature selection approach, TFDF, based on term frequency and document frequency at the class level. In the experiments, the proposed method is compared with the existing metrics GI, MI, Chi2 and DF. The classifiers used to test the performance of the selection metrics are Support Vector Machine (SVM) and Naive Bayes (NB), which are among the best-performing classifiers at present. Experimental results show that the proposed method outperforms the existing metrics from the literature.
URI: http://dspace.univ-guelma.dz/jspui/handle/123456789/12915
Appears in Collections: Master
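
The limitation named in the abstract, that document-frequency-based metrics ignore how often a word occurs inside each document, can be illustrated with a small sketch. The code below is not the TFDF formula from the thesis, which the abstract does not give. It only computes a standard chi-square score from document counts and, separately, a per-class term frequency. The function names `chi2_score` and `class_term_frequency` and the toy corpora are hypothetical, chosen for illustration.

```python
def chi2_score(docs, labels, term, target):
    """Chi-square statistic of `term` for class `target`.

    Uses only document counts (term present or absent per document),
    so repeated occurrences inside a document have no effect.
    """
    n = len(docs)
    pairs = list(zip(docs, labels))
    # 2x2 contingency table over documents.
    a = sum(1 for d, y in pairs if term in d and y == target)      # present, in class
    b = sum(1 for d, y in pairs if term in d and y != target)      # present, other classes
    c = sum(1 for d, y in pairs if term not in d and y == target)  # absent, in class
    e = sum(1 for d, y in pairs if term not in d and y != target)  # absent, other classes
    denom = (a + c) * (b + e) * (a + b) * (c + e)
    return 0.0 if denom == 0 else n * (a * e - b * c) ** 2 / denom


def class_term_frequency(docs, labels, term, target):
    """Total number of occurrences of `term` in documents of class `target`."""
    return sum(d.count(term) for d, y in zip(docs, labels) if y == target)


# Toy corpora (assumed data): identical document frequencies, but in
# docs_b the word "goal" is repeated inside the "sport" documents.
labels = ["sport", "sport", "politics", "politics"]
docs_a = [["goal", "match"], ["goal", "team"], ["vote", "law"], ["vote", "goal"]]
docs_b = [["goal", "goal", "goal", "match"], ["goal", "goal", "team"],
          ["vote", "law"], ["vote", "goal"]]

for name, docs in (("docs_a", docs_a), ("docs_b", docs_b)):
    print(name,
          "chi2:", round(chi2_score(docs, labels, "goal", "sport"), 4),
          "class TF:", class_term_frequency(docs, labels, "goal", "sport"))
```

Both corpora give the same chi-square score for "goal", while the class-level term frequency differs. That difference is the kind of signal a metric combining term frequency and document frequency at the class level could take into account.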

Files in This Item:
File: SIAFA_AYA_F5.pdf
Size: 1.52 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.