BLAST++: a tool for BLASTing queries in batches. BLAST is the standard tool that molecular biologists use to search for sequence similarity in genomic (and protein) databases. It employs a brute force approach of comparing a query sequence against every database sequence - for each pair of the sequences to be matched, BLAST searches for short fixed-length word pairs (seeds) in the sequences and then extends them to higher-scoring regions. To search multiple queries, the basic approach is to run BLAST on each of the queries one at a time. This is clearly inefficient and fails to exploit common subsequences that the collection of queries may share. In this paper, we propose a new genome search tool, BLAST++, that allows multiple, say K, queries to be searched against a database concurrently. The design of BLAST++ is based on our observation that the seed searching step of BLAST is a bottleneck that consumes more than 80% of the total response time! BLAST++ essentially treats a collection of queries as a single virtual query so that the seed searching step needs to be performed only once for common subsequences. We implemented BLAST++ as an extension of the NCBI BLAST, and evaluated its performance. Our study shows that the results obtained by BLAST++ are identical to that obtained by executing BLAST on each of the K queries, but the single-process version of BLAST++ completes the processing in a much shorter time, about only 25% of the original single-process version of NCBI BLAST.

This software is also peer reviewed by journal TOMS.

Keywords for this software

Anything in here will be replaced on browsers that support the canvas element