dc.contributor.author Maatoug, Yaaqob
dc.date.accessioned 2023-11-23T12:52:47Z
dc.date.available 2023-11-23T12:52:47Z
dc.date.issued 2023
dc.identifier.uri http://dspace.univ-guelma.dz/jspui/handle/123456789/15010
dc.description.abstract Large numbers of documents now exist in fields such as public administration, industry, scientific research, and education. The exponential growth of digital documents has made their management and exploitation increasingly complex. Given this abundance of textual information, quick and efficient access to the knowledge these documents contain has become essential. Word spotting addresses this need by locating and identifying words of interest within such massive datasets. It plays a crucial role in information retrieval, document classification, machine translation, and many other applications. This thesis falls within that context. Our work aims to contribute to the task of finding words in images of digitized documents, with a focus on Arabic documents. The proposed approach belongs to the family of analytical methods, which require documents to be segmented into words before identification can be carried out. It comprises several processing steps. The first step pre-processes the document images to improve their quality and reduce artefacts. Next, the documents are segmented into lines of text and then into individual words. Once the words have been segmented, a set of features is extracted from each word. This stage plays a key role in representing the words and in distinguishing them from one another. We explored different families of descriptors in order to obtain a rich and discriminative representation of the words. The words are then grouped into classes based on their similarity. Finally, the last module of our approach is the search module, in which the user expresses a query in the form of a word image, and the system compares it with the previously extracted and classified words to find the most relevant ones. The experiments showed promising performance, opening up new perspectives in the field of document analysis and recognition. en_US
dc.language.iso fr en_US
dc.publisher University of Guelma en_US
dc.subject Document images, word spotting, features extraction, matching, clustering en_US
dc.title Repérer les mots dans les images de documents [Word spotting in document images] en_US
dc.type Working Paper en_US
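
The processing chain summarized in the abstract (segmentation into lines and words, feature extraction, query-by-example search) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not say which segmentation method, descriptors, or distance were used, so the projection-profile segmentation, the zoning density features, the Euclidean ranking, and every function and parameter name below (`runs`, `segment_words`, `zone_features`, `spot`, `word_gap`, `grid`) are assumptions chosen for illustration. Pre-processing and the clustering step are omitted, and the image is assumed to be already binarized (1 = ink).

```python
from math import dist


def runs(profile, min_gap=1):
    """Return (start, end) spans (end exclusive) of non-empty runs in a
    projection profile; runs closer than min_gap empty cells are merged."""
    spans, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def segment_words(img, word_gap=2):
    """Split a binary image into word boxes (top, bottom, left, right):
    lines from the horizontal profile, words from the vertical profile."""
    boxes = []
    row_profile = [sum(row) for row in img]
    for top, bottom in runs(row_profile):
        band = img[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in runs(col_profile, min_gap=word_gap):
            boxes.append((top, bottom, left, right))
    return boxes


def zone_features(img, box, grid=(2, 2)):
    """Describe a word by the ink density of each cell of a grid laid
    over its bounding box (a simple zoning descriptor)."""
    top, bottom, left, right = box
    height, width = bottom - top, right - left
    feats = []
    for gy in range(grid[0]):
        y0 = top + gy * height // grid[0]
        y1 = top + (gy + 1) * height // grid[0]
        for gx in range(grid[1]):
            x0 = left + gx * width // grid[1]
            x1 = left + (gx + 1) * width // grid[1]
            area = max(1, (y1 - y0) * (x1 - x0))
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feats.append(ink / area)
    return feats


def spot(query_feats, word_feats, top_k=3):
    """Rank indexed words by Euclidean distance to the query descriptor."""
    order = sorted(range(len(word_feats)),
                   key=lambda i: dist(query_feats, word_feats[i]))
    return order[:top_k]


# Toy page: two text lines, two "words" per line.
page = [
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0],
]
boxes = segment_words(page)
feats = [zone_features(page, box) for box in boxes]
hits = spot(feats[0], feats, top_k=2)  # query with the first word's image
```

In a real system the query descriptor would be computed from a separate word image supplied by the user, and the indexed words would first be grouped into classes so that the search compares the query against class representatives rather than every word.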