Please use this identifier to cite or link to this item: http://dspace.univ-guelma.dz/jspui/handle/123456789/16486
Full metadata record
DC Field | Value | Language
dc.contributor.author | ADJABI, AKRAM | -
dc.date.accessioned | 2024-12-03T07:40:41Z | -
dc.date.available | 2024-12-03T07:40:41Z | -
dc.date.issued | 2024 | -
dc.identifier.uri | http://dspace.univ-guelma.dz/jspui/handle/123456789/16486 | -
dc.description.abstract | Detecting outlier documents is a critical task in various domains, including fraud detection, information retrieval, and anomaly detection. This project uses the Word2Vec framework and the Word Mover’s Distance (WMD) to identify outlier documents in a corpus. Word2Vec generates dense vector representations of words that capture semantic similarities and contextual relationships. The WMD measures the dissimilarity between two text documents as the minimal cost of transforming one document into the other, and it is applied to these vector representations to assess document similarity. By analyzing the distribution of WMD scores across the corpus, we identify documents that deviate significantly from the norm and classify them as outliers. The approach is advantageous because it handles the semantic richness of text and provides a nuanced measure of document similarity. Its effectiveness is validated through experiments on benchmark datasets, which demonstrate its potential to identify outlier documents accurately. | en_US
dc.language.iso | en | en_US
dc.publisher | University of Guelma | en_US
dc.subject | Outlier Detection, Word2Vec, Word Mover’s Distance (WMD), Document Similarity, Anomaly Detection, Semantic Analysis, Text Mining, Vector Representations, Information Retrieval, Natural Language Processing (NLP) | en_US
dc.title | A method based on word embedding and semantic similarity for detecting aberrant documents | en_US
dc.type | Working Paper | en_US
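
The pipeline outlined in the abstract (word vectors, a transport-style document distance, then a look at how each document's distances are distributed) can be illustrated with a small sketch. It is not the author's implementation, and several parts are assumptions made only for illustration: the 2-D vectors in EMB are hand-made stand-ins for trained Word2Vec embeddings, the exact WMD is replaced by the relaxed WMD lower bound (each word sends all of its weight to its nearest word in the other document), a document's score is its mean distance to all other documents, and the flagging rule (median plus k times the median absolute deviation) is one possible choice that the record itself does not specify.

```python
import math
from collections import Counter
from statistics import median

# Hypothetical toy embeddings; a real run would load trained Word2Vec vectors.
EMB = {
    "bank": (1.0, 0.1), "loan": (0.9, 0.2), "credit": (0.95, 0.0),
    "payment": (0.8, 0.1), "interest": (0.85, 0.15), "money": (0.9, 0.05),
    "goal": (0.0, 1.0), "match": (0.1, 0.9), "striker": (0.05, 0.95),
}

def nbow(tokens):
    """Normalized bag-of-words weights over tokens that have an embedding."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document has no in-vocabulary tokens")
    return {w: c / total for w, c in counts.items()}

def one_sided_cost(src, dst):
    """Cost when every source word moves all its weight to its nearest target word."""
    return sum(
        weight * min(math.dist(EMB[w], EMB[v]) for v in dst)
        for w, weight in src.items()
    )

def relaxed_wmd(doc_a, doc_b):
    """Relaxed WMD: the larger of the two one-sided costs (a lower bound on WMD)."""
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_sided_cost(a, b), one_sided_cost(b, a))

def outlier_scores(docs):
    """Score each document by its mean distance to every other document."""
    scores = []
    for i, doc in enumerate(docs):
        dists = [relaxed_wmd(doc, other) for j, other in enumerate(docs) if j != i]
        scores.append(sum(dists) / len(dists))
    return scores

def flag_outliers(scores, k=3.0):
    """Return indices whose score exceeds median + k * MAD (an assumed rule)."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    return [i for i, s in enumerate(scores) if s > med + k * mad]

# Four finance-themed toy documents and one sports-themed document.
DOCS = [
    ["bank", "loan", "credit"],
    ["loan", "payment", "interest"],
    ["bank", "money", "credit"],
    ["credit", "loan", "money"],
    ["goal", "match", "striker"],
]

if __name__ == "__main__":
    scores = outlier_scores(DOCS)
    print([round(s, 3) for s in scores])
    print(flag_outliers(scores))  # flags index 4, the sports document
```

With real data, the toy dictionary would give way to vectors from a trained model, and the relaxed bound could be swapped for an exact optimal-transport solver; the threshold k would need tuning against the score distribution of the actual corpus.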
Appears in Collections: Master

Files in This Item:
File | Description | Size | Format
F5_8_ADJABI_AKRAM.pdf | - | 2.17 MB | Adobe PDF

