A clustering tool for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Models. We propose a new procedure for clustering nucleotide sequences that combines Laplacian Eigenmaps with Gaussian mixture modelling. The procedure is applied to a set of 100 DNA sequences of the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene from a collection of Platyhelminthes and Nematoda species. The resulting clusters are consistent with the phylogenetic tree of the gene, computed using a maximum likelihood approach, and with the NCBI taxonomy. We also developed a Python package implementing this procedure, which is available online.
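The two stages named in the abstract (a Laplacian Eigenmaps embedding followed by a Gaussian mixture fit) can be sketched roughly as follows. This is a minimal illustration and not the API of the authors' package: the k-mer frequency featurization, the Gaussian kernel with a median-distance bandwidth, and the function names `kmer_profile` and `cluster_sequences` are all assumptions made here for the example. It relies on scikit-learn's `SpectralEmbedding` (its Laplacian Eigenmaps implementation) and `GaussianMixture`.

```python
from itertools import product

import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.mixture import GaussianMixture


def kmer_profile(seq, k=3):
    """Normalized k-mer frequency vector (an assumed featurization)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:  # skip k-mers with ambiguous bases such as N
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def cluster_sequences(seqs, n_clusters=2, n_components=2, k=3, seed=0):
    """Embed sequences with Laplacian Eigenmaps, then fit a Gaussian mixture."""
    X = np.array([kmer_profile(s, k) for s in seqs])
    # Pairwise Euclidean distances between k-mer profiles.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian (heat-kernel) affinity; the median bandwidth is an assumption.
    sigma = np.median(d[d > 0]) if np.any(d > 0) else 1.0
    W = np.exp(-(d ** 2) / (2 * sigma ** 2))
    # Laplacian Eigenmaps on the precomputed affinity matrix.
    embedding = SpectralEmbedding(
        n_components=n_components, affinity="precomputed", random_state=seed
    ).fit_transform(W)
    # Gaussian mixture model fitted in the embedded space.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    return gmm.fit_predict(embedding)
```

Calling `cluster_sequences(list_of_dna_strings, n_clusters=3)` returns one integer cluster label per input sequence. The number of clusters and the embedding dimension are left as parameters here; how the original procedure chooses them, and which sequence distance it uses, is not stated in this description.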

