Spotting Words in Document Images

Maatoug, Yaaqob

dc.contributor.author	Maatoug, Yaaqob
dc.date.accessioned	2023-11-23T12:52:47Z
dc.date.available	2023-11-23T12:52:47Z
dc.date.issued	2023
dc.identifier.uri	http://dspace.univ-guelma.dz/jspui/handle/123456789/15010
dc.description.abstract	Large numbers of documents now exist in fields such as public administration, industry, scientific research, and education. The exponential growth of digital documents has made their management and exploitation increasingly complex, so fast and efficient access to the knowledge they contain has become essential. Word spotting addresses this need by locating and identifying words of interest within these large collections. It plays an important role in information retrieval, document classification, machine translation, and other applications. This thesis contributes to the task of spotting words in images of digitized documents, with a focus on Arabic documents. The proposed approach belongs to the family of analytical methods, which require documents to be segmented into words before identification, and it consists of several processing steps. First, the document images are pre-processed to improve their quality and reduce artefacts. Next, the documents are segmented into lines of text and then into individual words. A set of features is then extracted from each segmented word. This stage is key to representing words and to distinguishing them from one another; we explored different families of descriptors in order to obtain a rich and discriminative representation. The words are then grouped into classes based on their similarity. Finally, in the search module, the user expresses a query as a word image, and the system compares it with the previously extracted and classified words to retrieve the most relevant ones. 
The experiments showed promising performance, opening up new perspectives in the field of document analysis and recognition.	en_US
dc.language.iso	fr	en_US
dc.publisher	University of Guelma	en_US
dc.subject	Document images, word spotting, feature extraction, matching, clustering	en_US
dc.title	Spotting Words in Document Images	en_US
dc.type	Working Paper	en_US
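
The abstract describes segmenting documents into lines of text and then into individual words, but this record does not say which segmentation technique the thesis uses. As an illustration only, the sketch below uses projection profiles, a common technique that is assumed here and not taken from the thesis. The names `runs`, `segment_words`, and `min_word_gap` are hypothetical, and the input is assumed to be an already binarized image (1 = ink, 0 = background).

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binary image: 1 = ink, 0 = background


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-zero runs in a projection profile.

    Runs separated by fewer than `min_gap` zero entries are merged, so that
    narrow gaps between letters do not split a word.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            spans.append((start, i))       # the run ended just before i
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for span in spans:
        if merged and span[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], span[1])  # gap too small: merge
        else:
            merged.append(span)
    return merged


def segment_words(image: Image, min_word_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Split an image into lines (row profile), then words (column profile).

    Returns (top, bottom, left, right) boxes with exclusive end indices.
    """
    boxes = []
    row_profile = [sum(row) for row in image]
    for top, bottom in runs(row_profile):
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs(col_profile, min_gap=min_word_gap):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 1, 0, 0, 0],
    ]
    for box in segment_words(page):
        print(box)
```

Boxes are returned left to right within each line; for Arabic text, which is read right to left, the order within a line would need to be reversed. Real scanned pages would also need the pre-processing step mentioned in the abstract before a profile-based split like this is reliable.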