VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data. DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
Keywords for this software
References in zbMATH (referenced in 1 article )
Showing result 1 of 1.
- Carugo, Oliviero (ed.); Eisenhaber, Frank (ed.): Data mining techniques for the life sciences (2016)