Abstract:
Today, most intelligent systems rely on machine learning methods, especially deep learning
algorithms, particularly in the field of computer vision, where object detection and recognition
are prominent tasks. Convolutional Neural Networks (CNNs) have achieved great success in
object detection, providing both speed and accuracy, notably in real-time detection. The goal
of this work is to design and apply an object detection algorithm using a deep learning model
trained on carefully selected subclasses from benchmark databases, and to deploy it on a
Raspberry Pi to assist visually impaired individuals by detecting objects in the indoor
environment and describing them through sound interaction. To this end, we select one of the
most efficient detection models with respect to the trade-off between response time and
accuracy, namely YOLOv8, and retrain it on subsets of databases such as MS-COCO and an
indoor dataset.
This choice of model and subclasses, combined with the hyperparameters and the strategy of
training new weights, keeps computing time low and allows us to overcome the major problem
of deploying our best weights on the RPi4 module. The captured image stream is then used as
input to the detection/recognition process in order to describe the selected objects present in
the indoor environment. This approach is implemented and tested under a number of
challenging real-life conditions, and compared across several training options and contexts in
terms of classification accuracy, detection efficiency, and response time in real-world
situations.
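The detection-to-sound step summarized above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the class subset, the `(class_id, confidence)` detection format, and the helper `describe_detections` are assumptions modeled on YOLO-style outputs; the real pipeline would obtain detections from the retrained YOLOv8 model on the RPi4 and pass the resulting text to a speech engine.

```python
from collections import Counter

# Hypothetical subset of indoor classes (illustrative only; the actual
# retrained subclasses are drawn from MS-COCO and an indoor dataset).
CLASS_NAMES = {0: "person", 1: "chair", 2: "table", 3: "door"}

def describe_detections(detections, conf_threshold=0.5):
    """Turn YOLO-style detections, given as (class_id, confidence) pairs,
    into a short spoken-style description of the current frame, counting
    objects whose confidence exceeds the threshold."""
    counts = Counter(
        CLASS_NAMES[cls]
        for cls, conf in detections
        if conf >= conf_threshold and cls in CLASS_NAMES
    )
    if not counts:
        return "no object detected"
    parts = [f"{n} {name}" + ("s" if n > 1 else "") for name, n in counts.items()]
    return "detected " + ", ".join(parts)

# Example frame: two chairs, one person, and a low-confidence door (filtered out).
print(describe_detections([(1, 0.9), (1, 0.7), (0, 0.8), (3, 0.3)]))
# → detected 2 chairs, 1 person
```

On the device, the returned string would be handed to a text-to-speech engine so the user hears the scene description rather than reading it.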