Abstract:
Effective communication between Deaf and hearing individuals remains a major societal challenge, particularly in contexts where sign language is not understood by the general population. Sign languages are complete natural languages, yet the lack of shared linguistic knowledge continues to hinder accessibility and inclusion in vital domains such as education, healthcare, and employment. In response to this issue, this thesis presents a deep learning-based system for real-time, bidirectional communication between Deaf and hearing users, using hand-gesture sign language as the primary medium.

The proposed system integrates computer vision and 3D animation technologies to translate between sign language and spoken/written language. Three model architectures were implemented and evaluated: CNN-LSTM, MediaPipe-Bi-LSTM, and MediaPipe-GCN-BERT. While the MediaPipe-Bi-LSTM model achieved over 98% accuracy on isolated gesture recognition tasks, it exhibited limitations in handling longer sequences due to its memory-based structure. To overcome this, a graph-based approach was adopted, in which spatial relationships between hand landmarks were modeled using Graph Convolutional Networks (GCNs), combined with BERT embeddings for semantic context. This resulted in improved generalization and performance on complex and continuous gestures.

The system was deployed as a mobile application built with React Native and Expo, integrating real-time speech recognition and sign-to-text translation. Experimental evaluations using cross-validation, confusion matrices, and Word Error Rate (WER) confirmed the robustness, accuracy, and usability of the platform in real-time scenarios. This work contributes a significant step toward accessible and inclusive communication technology for the Deaf and hard-of-hearing communities.
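To make the graph-based component concrete, the following is a minimal sketch of one graph-convolution layer over the 21 hand landmarks produced by MediaPipe Hands. It assumes PyTorch; the names GCNLayer and EDGES, the edge subset, and the layer sizes are illustrative rather than taken from the thesis, and the BERT fusion and temporal modeling are omitted.

import torch
import torch.nn as nn

NUM_LANDMARKS = 21  # MediaPipe Hands yields 21 (x, y, z) landmarks per hand

class GCNLayer(nn.Module):
    # One graph-convolution layer: H' = ReLU(A_norm @ H @ W), where A_norm is
    # the symmetrically normalized adjacency of the hand-skeleton graph.
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        a_hat = adjacency + torch.eye(adjacency.size(0))  # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
        self.register_buffer("a_norm", d_inv_sqrt @ a_hat @ d_inv_sqrt)

    def forward(self, x):
        # x: (batch, NUM_LANDMARKS, in_dim) landmark coordinates per frame
        return torch.relu(self.a_norm @ self.linear(x))

# Hypothetical edge subset (wrist-to-thumb chain); a full model would list
# all of MediaPipe's hand connections.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
A = torch.zeros(NUM_LANDMARKS, NUM_LANDMARKS)
for i, j in EDGES:
    A[i, j] = A[j, i] = 1.0

layer = GCNLayer(in_dim=3, out_dim=64, adjacency=A)
features = layer(torch.randn(8, NUM_LANDMARKS, 3))  # -> (8, 21, 64)

Because the adjacency encodes the fixed finger-joint topology, each landmark's feature is updated from its skeletal neighbors, which is what lets the model capture spatial hand structure that a purely sequential LSTM does not represent explicitly.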
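For context, the Word Error Rate reported above presumably follows the standard definition used in speech and sign-language recognition: WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words relative to a reference transcription of N words.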