K-means & K-mers for clustering and comparing large sets of biological sequences

BOUSMAT, YASSINE

URI: http://dspace.univ-guelma.dz/jspui/handle/123456789/13427

Date: 2022

Abstract:

Bioinformatics plays a central role in extracting as much information as possible from biological data. Although established methods remain useful, they can no longer cope with the volume of biological data produced by ever-growing high-throughput sequencing projects. One of the most important tasks in bioinformatics is sequence clustering. In this work, we focus on sequence clustering as a way to help multiple sequence alignment algorithms handle the large-scale sets of biological sequences that computational biology increasingly demands. We present a clustering method based on the K-means algorithm, guided by the k-mers of the sequences to be aligned. We also integrate this method into a multiple alignment strategy to reduce execution time without losing quality. We tested the approach on a multi-core processor, using a set of benchmarks from the literature, and compared our results with those produced by the UClust clustering algorithm. The results show that our approach falls short of UClust in computation time, while maintaining accuracy on all the tested benchmarks.
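
As a rough illustration of the general idea of running K-means on k-mer features, the sketch below turns each sequence into a normalized k-mer frequency vector and clusters those vectors with plain Lloyd's K-means. It is not the method of the thesis: the abstract does not specify the feature construction, the distance, the initialization, or the number of clusters. The function names (`kmer_profile`, `init_centroids`, `kmeans`), the choice of k = 2, the DNA alphabet, the deterministic farthest-point initialization, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of seq (hypothetical helper)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing symbols outside the alphabet are ignored.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, n_clusters):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < n_clusters:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, n_clusters, max_iter=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centroids = init_centroids(points, n_clusters)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input: two AC-rich and two G-rich sequences (made up for illustration).
sequences = ["ACACACACAC", "ACACACACAT", "GGGTGGGTGG", "GGGTGGGTGA"]
profiles = [kmer_profile(s) for s in sequences]
labels = kmeans(profiles, n_clusters=2)
print(labels)
```

In a pipeline of the kind the abstract describes, each resulting cluster could then be passed to a multiple sequence alignment tool separately, so that the aligner works on smaller groups of similar sequences. How the thesis combines the per-cluster alignments is not stated in the abstract.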
