MetaCache
MetaCache: context-aware classification of metagenomic reads using minhashing. Results: We introduce MetaCache - a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache’s database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves as the amount of utilized reference genome data increases. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache.
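The subsampling idea in the abstract can be illustrated with a minimal minhash sketch in C++. This is not MetaCache's code: the function names `hash_kmer`, `sketch` and `shared_features`, the 2-bit encoding, the splitmix64-style mixing step and all parameter values are assumptions chosen for illustration, and reverse-complement handling is omitted. The sketch keeps the s smallest distinct k-mer hashes of a sequence, so a read and a reference window can be compared through a handful of values instead of all of their k-mers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: 2-bit encode the k-mer starting at pos and mix it
// with a splitmix64-style finalizer. Returns false on non-ACGT characters.
inline bool hash_kmer(const std::string& seq, std::size_t pos,
                      std::size_t k, std::uint64_t& out) {
    std::uint64_t x = 0;
    for (std::size_t i = 0; i < k; ++i) {
        std::uint64_t code = 0;
        switch (seq[pos + i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  return false;   // skip k-mers with ambiguous bases
        }
        x = (x << 2) | code;
    }
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    out = x;
    return true;
}

// Minhash sketch: the s smallest distinct k-mer hashes, in ascending order.
// Assumes 1 <= k <= 32 so that a k-mer fits into 64 bits.
std::vector<std::uint64_t> sketch(const std::string& seq,
                                  std::size_t k, std::size_t s) {
    std::vector<std::uint64_t> hashes;
    if (k == 0 || k > 32 || seq.size() < k) return hashes;
    for (std::size_t pos = 0; pos + k <= seq.size(); ++pos) {
        std::uint64_t h = 0;
        if (hash_kmer(seq, pos, k, h)) hashes.push_back(h);
    }
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    if (hashes.size() > s) hashes.resize(s);
    return hashes;
}

// Number of hash values two (sorted) sketches have in common.
std::size_t shared_features(const std::vector<std::uint64_t>& a,
                            const std::vector<std::uint64_t>& b) {
    std::vector<std::uint64_t> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common.size();
}
```

Under these assumptions, a reference genome would be cut into windows, each window sketched once and its hashes stored with the window's location; a read would then be sketched the same way and assigned to the region whose windows share the most hash values with it. Storing s values per window rather than every k-mer is what keeps the memory footprint small in such a scheme.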
References in zbMATH (referenced in 1 article)