New high-sensitivity software for searching massive DNA databases

Anna Kaznadzey, a Junior Researcher at the Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences, took part in developing NSimScan (Nucleotide Similarity Scanner), a program designed for searching massive DNA databases for distant similarities. A description of the software has been published in the journal Bioinformatics (Oxford University Press).

Target applications include phylogenomics, comparative and functional studies of non-coding sequences, and contamination detection, among others. The program is best suited to searching large DNA databases for hits with 55-90% similarity.

NSimScan uses a pipeline of filters of increasing computational complexity. Its speed comes from query aggregation, multi-tier filtering, optimized bitwise operations in alignment computation, and the avoidance of dynamic programming, which makes evaluating each hit a linear operation with respect to alignment length.
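To illustrate the general idea of a cheap filter feeding a linear-time bitwise comparison, here is a minimal Python sketch. It is not NSimScan's code or algorithm. The k-mer seed filter, the 2-bit encoding, the function names `pack`, `identity`, `kmer_index`, `candidate_diagonals` and `scan`, and the default thresholds are all hypothetical choices made for this example. The comparison is also restricted to ungapped alignments.

```python
from collections import Counter, defaultdict

# Hypothetical illustration only; this is NOT NSimScan's implementation.
# Assumes uppercase A/C/G/T input (other symbols raise KeyError).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a DNA string into two bit-planes (low and high bit of a 2-bit code)."""
    lo = hi = 0
    for i, base in enumerate(seq):
        c = CODE[base]
        lo |= (c & 1) << i
        hi |= (c >> 1) << i
    return lo, hi, len(seq)


def identity(a, b):
    """Fraction of identical positions in an ungapped alignment of equal length.

    A position mismatches if either bit-plane differs, so one XOR per plane,
    one OR and a popcount replace a position-by-position comparison. No dynamic
    programming is involved, so the cost grows linearly with length.
    """
    a_lo, a_hi, n = a
    b_lo, b_hi, m = b
    if n != m or n == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = bin((a_lo ^ b_lo) | (a_hi ^ b_hi)).count("1")
    return 1 - mismatches / n


def kmer_index(seq, k):
    """Map every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def candidate_diagonals(qidx, subject, k, min_seeds):
    """Tier 1 (cheap): count shared k-mer seeds per diagonal (subject pos - query pos)."""
    votes = Counter()
    for j in range(len(subject) - k + 1):
        for i in qidx.get(subject[j:j + k], ()):
            votes[j - i] += 1
    return sorted(d for d, c in votes.items() if c >= min_seeds)


def scan(query, database, k=4, min_seeds=2, min_identity=0.55):
    """Tier 2 (costlier): bitwise identity only on diagonals that passed tier 1."""
    qidx = kmer_index(query, k)  # built once per query, reused for every subject
    packed_q = pack(query)
    hits = []
    for name, subject in database.items():
        for d in candidate_diagonals(qidx, subject, k, min_seeds):
            if d < 0 or d + len(query) > len(subject):
                continue  # keep the sketch simple: full-length placements only
            ident = identity(packed_q, pack(subject[d:d + len(query)]))
            if ident >= min_identity:
                hits.append((name, d, round(ident, 3)))
    return hits


if __name__ == "__main__":
    query = "ACGTACGTTGCA"
    db = {
        "exact": "GGG" + query + "TT",   # identical copy at offset 3
        "diverged": "AACGTACGTTGTT",     # 2 mismatches at offset 1
        "unrelated": "T" * 16,           # no shared 4-mers with the query
    }
    print(scan(query, db))  # [('exact', 3, 1.0), ('diverged', 1, 0.833)]
```

Production tools typically pack bases into fixed-width machine words and rely on hardware popcount; Python's arbitrary-precision integers only stand in for that here. This sketch handles ungapped placements only; how NSimScan evaluates gapped hits without dynamic programming is not described in this article.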

«Standard DNA search tools include BLAST, MegaBLAST, BLAT, usearch, ssearch and others. Faster tools are usually less sensitive, while sensitive and accurate tools run much more slowly, sometimes taking weeks instead of hours. NSimScan runs as fast as the fastest of them (BLAT and MegaBLAST) or faster, with a significantly lower error rate, and achieves sensitivity and accuracy on par with the most sensitive tools in the field (BLAST and ssearch)», says Anna Kaznadzey.


NSimScan is available at https://github.com/abadona/qsimscan as part of the QSimScan package. It is implemented in C++, distributed under the MIT license, and supported on Linux, OS X and Windows (with Cygwin).


31.03.2016 | Efimova Maria


