Repérer les mots dans les images de documents (Spotting Words in Document Images)

Maatoug, Yaaqob

Please use this identifier to cite or link to this item: https://dspace.univ-guelma.dz/jspui/handle/123456789/15010

Title:	Repérer les mots dans les images de documents (Spotting Words in Document Images)
Authors:	Maatoug, Yaaqob
Keywords:	Document images, word spotting, feature extraction, matching, clustering
Issue Date:	2023
Publisher:	University of Guelma
Abstract:	Large numbers of documents exist in fields such as public administration, industry, scientific research, and education. The exponential growth of digital documents has made them increasingly difficult to manage and exploit, so quick and efficient access to the knowledge they contain has become essential. Word spotting addresses this need by locating and identifying words of interest within large document collections, and it supports applications such as information retrieval, document classification, and machine translation. This thesis contributes to the task of spotting words in images of digitized documents, with a focus on Arabic documents. The proposed approach belongs to the analytical methods, which require documents to be segmented into words before identification. It consists of several processing steps. First, document images are pre-processed to improve their quality and reduce artefacts. Next, the documents are segmented into lines of text and then into individual words. A set of features is then extracted from each segmented word; this stage is key to representing words and distinguishing them from one another, and we explored different families of descriptors to obtain a rich and discriminative representation. The words are then grouped into classes based on their similarity. Finally, in the search module, the user submits a query as a word image, and the system compares it with the previously extracted and classified words to retrieve the most relevant ones. The experiments demonstrated promising performance, opening up new perspectives in the field of document analysis and recognition.
URI:	http://dspace.univ-guelma.dz/jspui/handle/123456789/15010
Appears in Collections:	Master
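
The line and word segmentation step mentioned in the abstract can be illustrated with a projection-profile sketch. This is an assumption for illustration only: the record does not say which segmentation method the thesis uses, and the names find_runs, segment_words, and min_word_gap are hypothetical. The input is assumed to be an already binarized image given as rows of 0 (background) and 1 (ink).

```python
def find_runs(profile, min_gap=1):
    """Return (start, end) pairs of consecutive non-zero entries in profile.

    Runs separated by fewer than min_gap zero entries are merged.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


def segment_words(image, min_word_gap=2):
    """Split a binary image into word boxes (top, bottom, left, right).

    Text lines are found from the horizontal projection (ink per row),
    then words within each line from the vertical projection (ink per column).
    """
    row_profile = [sum(row) for row in image]
    boxes = []
    for top, bottom in find_runs(row_profile):
        line = image[top:bottom]
        width = len(line[0])
        col_profile = [sum(row[x] for row in line) for x in range(width)]
        for left, right in find_runs(col_profile, min_gap=min_word_gap):
            boxes.append((top, bottom, left, right))
    return boxes
```

Column gaps narrower than min_word_gap are treated as gaps inside a word rather than between words. A suitable value would have to be tuned per script and scan resolution; it is not taken from the thesis.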

Files in This Item:

File	Description	Size	Format
MAATOUG_YAAQOB_F5.pdf		3,82 MB	Adobe PDF
