Abstract:
The digitization of historical documents is crucial for preserving valuable records for future
generations. However, obtaining high-quality digital versions of these documents is challenging
because they are often degraded, with low contrast and artifacts caused by aging, environmental
factors, and handling. These degradations make historical documents difficult to read and use
effectively. Recent research has focused on restoring and improving the quality of degraded
documents so that they remain accessible and usable.
Accurate classification of noise types in documents is essential for applying the correct
restoration methods and recovering a noise-free document. Different types of noise
require specific preprocessing techniques for effective removal, making precise noise
identification a critical step in the restoration process.
Our work aims to build a robust automated classification model that accurately identifies the type
of noise in document images. By leveraging the learning and generalization capabilities of
Convolutional Neural Networks (CNNs), our model assigns each document image to one of five
classes: clean image, paper damage, transparency, faded image, and spots. Accurate classification
enables the selection of an appropriate noise removal method, which improves the efficiency and
effectiveness of document restoration. The promising results from our experiments highlight the
potential of deep learning in enhancing the quality of digitized historical documents and
ensuring their long-term accessibility.
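
To illustrate how a predicted class could drive the choice of restoration step, here is a minimal sketch in Python. It is not the model or pipeline used in this work. The five class labels come from the text above; the order of the labels, the restoration step names, and the function names softmax and select_restoration are hypothetical placeholders, and the input scores stand in for the raw outputs of a trained CNN.

```python
import math

# The five classes named in the abstract (the ordering is an assumption).
CLASSES = ["clean image", "paper damage", "transparency", "faded image", "spots"]

# Hypothetical routing table: the step names are placeholders, not the
# restoration methods used in this work.
RESTORATION_STEP = {
    "clean image": "no_restoration",
    "paper damage": "restore_paper_damage",
    "transparency": "restore_transparency",
    "faded image": "restore_faded_image",
    "spots": "restore_spots",
}


def softmax(scores):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_restoration(scores):
    """Return (label, probability, step) for the highest-scoring class."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = CLASSES[best]
    return label, probs[best], RESTORATION_STEP[label]


# Example with made-up scores in place of real CNN outputs.
label, prob, step = select_restoration([0.1, 0.2, 3.0, 0.3, 0.1])
print(label, step)
```

In such a design the classifier and the restoration methods stay decoupled: supporting a new noise type means adding one class label and one entry in the routing table.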