MetaCache: context-aware classification of metagenomic reads using minhashing. Results: We introduce MetaCache, a novel software tool for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache’s database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves as the amount of utilized reference genome data increases. Availability and implementation: MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache.
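The idea of subsampling k-mers in both reads and local reference regions can be illustrated with a small C++ sketch. This is not MetaCache's implementation: the function names (`mix64`, `minhash_sketch`, `shared_features`, `best_window`), the hash function, and the window layout are all assumptions made for illustration. The sketch keeps the `s` smallest k-mer hash values of a sequence as its representative subsample, then assigns a read to the reference window whose subsample shares the most values with the read's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 64-bit mixer (splitmix64 finalizer); an assumption, not MetaCache's hash.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// 2-bit encoding of a base; returns 4 for any character other than A, C, G, T.
inline unsigned encode_base(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return 4;
    }
}

// Minhash sketch: the s smallest distinct hash values over all k-mers of seq,
// sorted ascending. K-mers containing a non-ACGT character are skipped.
std::vector<std::uint64_t> minhash_sketch(const std::string& seq,
                                          std::size_t k, std::size_t s) {
    assert(k >= 1 && k <= 32);  // a k-mer must fit into 64 bits
    const std::uint64_t mask = (k == 32) ? ~0ULL : ((1ULL << (2 * k)) - 1);
    std::vector<std::uint64_t> hashes;
    std::uint64_t kmer = 0;
    std::size_t valid = 0;  // number of consecutive valid bases seen so far
    for (char c : seq) {
        const unsigned b = encode_base(c);
        if (b > 3) { valid = 0; kmer = 0; continue; }
        kmer = ((kmer << 2) | b) & mask;
        if (++valid >= k) hashes.push_back(mix64(kmer));
    }
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    if (hashes.size() > s) hashes.resize(s);
    return hashes;
}

// Number of values two sorted sketches have in common.
std::size_t shared_features(const std::vector<std::uint64_t>& a,
                            const std::vector<std::uint64_t>& b) {
    std::size_t i = 0, j = 0, n = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (b[j] < a[i]) ++j;
        else { ++n; ++i; ++j; }
    }
    return n;
}

// Index of the reference window (length w, overlapping by k-1 bases) whose
// sketch shares the most values with the read's sketch; ties keep the first.
std::size_t best_window(const std::string& ref, const std::string& read,
                        std::size_t w, std::size_t k, std::size_t s) {
    assert(w >= k);
    const auto read_sketch = minhash_sketch(read, k, s);
    const std::size_t stride = w - k + 1;
    std::size_t best = 0, best_hits = 0, idx = 0;
    for (std::size_t start = 0; start < ref.size(); start += stride, ++idx) {
        const auto hits = shared_features(
            read_sketch, minhash_sketch(ref.substr(start, w), k, s));
        if (hits > best_hits) { best_hits = hits; best = idx; }
    }
    return best;
}
```

Because each window stores at most `s` values regardless of how many k-mers it contains, memory grows with the number of windows rather than with the number of distinct k-mers, which is the general reason a subsampling scheme of this kind can be more compact than a full k-mer index. A real classifier would additionally map windows to taxa and handle reverse complements, which this sketch omits.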
References in zbMATH (referenced in 1 article)
- Daniel Jünger; Robin Kobus; André Müller; Christian Hundt; Kai Xu; Weiguo Liu; Bertil Schmidt: WarpCore: A Library for fast Hash Tables on GPUs (2020) arXiv