Abstract:
Question-answering (QA) systems are crucial for retrieving information from large
databases. While traditional systems focus on structured relational databases, the
rise of NoSQL databases such as MongoDB, each with its own query language, requires
adapting these systems. In this study, we propose NoT5QL (NoSQL T5), a model for
dynamic question answering over MongoDB obtained by fine-tuning T5, a pre-trained
transformer model for natural language processing (NLP) tasks. The resulting
system is specifically tailored to generating MongoDB queries from natural language
questions. To accomplish this task, a dataset called
"MongoQpedia" (MongoDB Query pedia) was created using diverse questions from
the movies domain, annotated with answers derived from MongoDB documents.
MongoQpedia was constructed using data augmentation via paraphrasing,
back-translation, and named entity replacement.
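
As an illustration of the named entity replacement step, the following is a minimal
sketch, not the procedure actually used to build MongoQpedia. It assumes each example
is a (question, query) pair in which the entity appears verbatim in both strings. The
function name, the "movies" collection, and the "director" field are hypothetical.

```python
def augment_by_entity_replacement(question, query, entity, replacements):
    """Create new (question, query) pairs by swapping one named entity.

    The same substitution is applied to the question and to the query so
    that the two stay consistent with each other.
    """
    pairs = []
    for new_entity in replacements:
        if new_entity == entity:
            continue  # skip the entity already present in the seed example
        pairs.append((
            question.replace(entity, new_entity),
            query.replace(entity, new_entity),
        ))
    return pairs


# Hypothetical seed example (collection and field names are assumptions).
seed_question = "Which movies were directed by Christopher Nolan?"
seed_query = 'db.movies.find({"director": "Christopher Nolan"})'
directors = ["Christopher Nolan", "Greta Gerwig", "Bong Joon-ho"]

augmented = augment_by_entity_replacement(
    seed_question, seed_query, "Christopher Nolan", directors
)
for q, mq in augmented:
    print(q, "->", mq)
```

Applying one substitution to both sides keeps each augmented question aligned with its
target query, which is the property a query-generation dataset needs.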
The NoT5QL model was evaluated with metrics such as BLEU and ROUGE, comparing the
fine-tuned small and base variants of T5. This comparison
provided insight into the impact of model size and complexity on MongoDB
question-answering capabilities. The results of these experiments demonstrate the
effectiveness of our approach in achieving high accuracy in answering questions related to
MongoDB databases.
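
To make the overlap-based metrics concrete, the sketch below computes the two
quantities that BLEU and ROUGE-N are built on: clipped n-gram precision and n-gram
recall. It is a simplified illustration, not the evaluation code of this study. It
assumes whitespace tokenization and omits BLEU's brevity penalty and its averaging
over several n-gram orders. The function names and the example queries are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-grams of a whitespace-tokenized string."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Share of candidate n-grams found in the reference (BLEU-style clipping)."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    return sum(min(count, ref[gram]) for gram, count in cand.items()) / total


def ngram_recall(candidate, reference, n=1):
    """Share of reference n-grams found in the candidate (ROUGE-N-style recall)."""
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum(min(count, cand[gram]) for gram, count in ref.items()) / total


# Hypothetical reference query and model output.
reference = 'db.movies.find({"year": 1999})'
candidate = 'db.movies.find({"year": 2000})'

precision = clipped_precision(candidate, reference)
recall = ngram_recall(candidate, reference)
print(precision, recall)
```

Under whitespace tokenization each query splits into two tokens, so a single wrong
value halves both scores. This shows how coarse tokenization affects such metrics on
short queries.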