DNACLUST: accurate and efficient clustering of phylogenetic marker genes. Conclusions: We compare DNACLUST to two popular clustering tools: CD-HIT and UCLUST. We show that DNACLUST is about an order of magnitude faster than CD-HIT and UCLUST (exact mode) and comparable in speed to UCLUST (approximate mode). The performance of DNACLUST improves as the similarity threshold is increased (tight clusters), making it well suited for rapidly removing duplicates and near-duplicates from a dataset and thereby reducing the size of the data that must be analyzed with more elaborate approaches.
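The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of generic greedy, threshold-based sequence clustering. It is not DNACLUST's implementation. The function names `banded_edit_distance` and `greedy_cluster` are hypothetical. The sketch makes two illustrative assumptions: similarity is defined as 1 - edits / len(center), and sequences are processed longest first. It also illustrates why a higher threshold can make clustering cheaper. A higher threshold allows fewer edits, so the banded alignment fills fewer cells and abandons non-matching pairs sooner.

```python
from typing import Dict, List


def banded_edit_distance(a: str, b: str, band: int) -> int:
    """Edit distance restricted to a diagonal band of half-width `band`.

    Returns the exact distance if it is <= band, otherwise band + 1
    (used as an "exceeds threshold" sentinel, with early exit).
    """
    n, m = len(a), len(b)
    INF = band + 1
    if abs(n - m) > band:  # length difference alone exceeds the budget
        return INF
    prev = [j if j <= band else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        if i <= band:
            cur[0] = i
        # Only cells within `band` of the main diagonal are computed.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        # Row minima never decrease, so stop once every cell is over budget.
        if min(cur) > band:
            return INF
        prev = cur
    return min(prev[m], INF)


def greedy_cluster(seqs: Dict[str, str], similarity: float) -> Dict[str, List[str]]:
    """Greedy clustering: the longest unassigned sequence becomes a center,
    and every unassigned sequence within the edit budget joins its cluster.

    Assumption (illustrative): similarity = 1 - edits / len(center).
    """
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)
    assigned = set()
    clusters: Dict[str, List[str]] = {}
    for center in order:
        if center in assigned:
            continue
        assigned.add(center)
        members = [center]
        c = seqs[center]
        # Small epsilon guards against float round-off, e.g. (1 - 0.9) * 10.
        max_edits = int((1.0 - similarity) * len(c) + 1e-9)
        for other in order:
            if other in assigned:
                continue
            if banded_edit_distance(c, seqs[other], max_edits) <= max_edits:
                assigned.add(other)
                members.append(other)
        clusters[center] = members
    return clusters
```

For example, with `seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTTTTTTT", "d": "ACGTACGTA"}`, calling `greedy_cluster(seqs, 0.9)` groups `a`, `b` and `d` (each at most one edit from `a`) and leaves `c` on its own. At a threshold of 1.0 only exact duplicates would merge, which matches the duplicate-removal use case described above.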
References in zbMATH (referenced in 4 articles)
- Giancarlo, Raffaele; Rombo, Simona E.; Utro, Filippo: DNA combinatorial messages and epigenomics: the case of chromatin organization and nucleosome occupancy in eukaryotic genomes (2019)
- Sahlin, Kristoffer; Medvedev, Paul: De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm (2019)
- Brubach, Brian; Ghurye, Jay; Pop, Mihai; Srinivasan, Aravind: Better greedy sequence clustering with fast banded alignment (2017)
- Ghodsi, Mohammadreza; Liu, Bo; Pop, Mihai: DNACLUST: accurate and efficient clustering of phylogenetic marker genes (2011)