What is the BLAST Local Alignment Algorithm?


BLAST (Basic Local Alignment Search Tool) was introduced in 1990 by Altschul, Gish, Miller, Myers, and Lipman, and is maintained by the National Center for Biotechnology Information (NCBI). BLAST is used to infer functional and evolutionary relationships among sequences and to help identify members of gene families.

The NCBI website provides several common BLAST databases. Based on their content, they are grouped into nucleotide and protein databases. NCBI also supports specialized BLAST databases, including a vector screening database, genome databases for multiple organisms, and trace databases.

BLAST uses a heuristic approach to find the highest-scoring local alignments between a query sequence and a database. It speeds up the search by breaking the sequences to be compared into short fragments (called words) and first looking for matches among these words.
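The word-splitting step can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_words` and the example sequence are assumptions made for this sketch, not part of any BLAST implementation.

```python
def split_into_words(sequence, word_size):
    """Return every overlapping word of length word_size with its start offset."""
    return [
        (sequence[i:i + word_size], i)
        for i in range(len(sequence) - word_size + 1)
    ]


# Hypothetical short sequence, split into overlapping words of length 3.
for word, offset in split_into_words("ACGTAC", 3):
    print(offset, word)
```

Because the words overlap, a sequence of length n yields n - k + 1 words of length k, and a sequence shorter than k yields none.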

In BLAST, the words are treated as k-tuples. For DNA, a word typically consists of 11 bases (nucleotides), whereas for proteins, a word typically consists of 3 amino acids. BLAST builds a hash table of neighborhood (i.e., nearly matching) words, where the threshold for “closeness” is set based on statistics. The search therefore starts from exact matches to these neighborhood words.
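A minimal sketch of a neighborhood-word table follows. It assumes a toy four-letter alphabet and a simple match/mismatch score (`match=2`, `mismatch=-1`) standing in for a real substitution matrix; the function names and the threshold values are likewise hypothetical, chosen only to show how a threshold decides which words count as "close".

```python
from itertools import product

ALPHABET = "ACGT"  # toy alphabet, assumed for illustration


def score(a, b, match=2, mismatch=-1):
    """Score two equal-length words position by position (toy scoring scheme)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def neighborhood(word, threshold):
    """All words of the same length that score at least `threshold` against `word`."""
    candidates = ("".join(chars) for chars in product(ALPHABET, repeat=len(word)))
    return [c for c in candidates if score(word, c) >= threshold]


def build_neighborhood_table(query, word_size, threshold):
    """Map each neighborhood word to the query offsets it is a neighbor of."""
    table = {}
    for i in range(len(query) - word_size + 1):
        for neighbor in neighborhood(query[i:i + word_size], threshold):
            table.setdefault(neighbor, []).append(i)
    return table


table = build_neighborhood_table("ACGT", word_size=3, threshold=3)
print(len(table), "neighborhood words indexed")
```

With this scoring, a 3-letter word scores 6 against itself and 3 against any word with one mismatch, so a threshold of 6 keeps only exact words while a threshold of 3 also admits single-mismatch neighbors. Enumerating every candidate word is only practical for short words, which is one reason this is a sketch rather than a realistic implementation.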

Because good alignments must contain several close matches, BLAST can use statistics to decide which matches are significant. By hashing, it can find matches in O(n) (linear) time. By extending matches in both directions, the approach finds high-quality alignments made up of high-scoring segment pairs (HSPs), including the maximal segment pair (MSP).
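The seed-and-extend idea can be sketched as follows. This is a simplified, ungapped illustration under stated assumptions: exact word matches serve as seeds, the scores (`match=1`, `mismatch=-2`) and the `x_drop` cutoff are arbitrary example values, and all function names are hypothetical.

```python
from collections import defaultdict


def index_words(query, k):
    """Hash every k-letter word of the query to the offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k):
    """One pass over the subject; each word lookup is a hash-table probe."""
    index = index_words(query, k)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, k, match=1, mismatch=-2, x_drop=3):
    """Extend a seed without gaps in both directions.

    Stops in each direction once the running score falls more than x_drop
    below the best score seen. Returns (score, query_start, subject_start, length).
    """
    best = score = k * match
    left = right = 0  # how far the best segment reaches beyond the seed

    step = 0  # extend to the right
    while i + k + step < len(query) and j + k + step < len(subject):
        score += match if query[i + k + step] == subject[j + k + step] else mismatch
        step += 1
        if score > best:
            best, right = score, step
        elif best - score > x_drop:
            break

    score, step = best, 0  # extend to the left, starting from the best score so far
    while i - step - 1 >= 0 and j - step - 1 >= 0:
        score += match if query[i - step - 1] == subject[j - step - 1] else mismatch
        step += 1
        if score > best:
            best, left = score, step
        elif best - score > x_drop:
            break

    return best, i - left, j - left, k + left + right


query, subject = "TTACGTACGGA", "CCACGTACGTT"  # hypothetical sequences
for i, j in find_seeds(query, subject, 4):
    print((i, j), extend_seed(query, subject, i, j, 4))
```

Building the index takes one pass over the query and finding seeds takes one pass over the subject, which is where the linear-time behavior comes from. The extension step is what turns a short word hit into a longer scored segment.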

There are several versions and extensions of the BLAST algorithm. For instance, MEGABLAST, Discontinuous MEGABLAST, and BLASTN can all be used to identify a nucleotide sequence. MEGABLAST is specifically designed to efficiently find long alignments between very similar sequences, and is therefore the best tool for finding an identical match to a query sequence.

One of the key parameters controlling the sensitivity of BLAST searches is the length of the initial words, or word size. The word size is adjustable in BLASTN and can be reduced from the default value to a minimum of 7 to increase search sensitivity. For this reason, BLASTN is better than MEGABLAST at finding alignments to related nucleotide sequences from other organisms.
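The effect of word size on sensitivity can be illustrated with a small experiment. The sketch below assumes a made-up 30-base query and a "related" subject that differs from it at three positions spaced nine bases apart; the helper names `mutate` and `shared_word_hits` are hypothetical.

```python
def mutate(seq, positions):
    """Substitute the bases at the given offsets (toy substitution rule)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] if i in positions else c for i, c in enumerate(seq))


def shared_word_hits(query, subject, k):
    """Count subject offsets whose k-letter word also occurs somewhere in the query."""
    query_words = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sum(
        subject[j:j + k] in query_words
        for j in range(len(subject) - k + 1)
    )


query = "ACGTTGCAATGCCGTAGGCTAACGTTAGCC"  # hypothetical 30-base query
subject = mutate(query, {8, 17, 26})      # related sequence with 3 substitutions

for k in (11, 7):
    print("word size", k, "->", shared_word_hits(query, subject, k), "word hits")
```

Because the longest unchanged stretch between substitutions is 8 bases, no 11-base word survives intact, so a word size of 11 yields no seeds at all, while a word size of 7 still finds several. Without a seed there is nothing to extend, which is why a smaller word size helps with more divergent sequences, at the cost of more spurious hits to evaluate.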

Standard protein-protein BLAST (BLASTP) is used both for identifying a query amino acid sequence and for finding similar sequences in protein databases. Position-Specific Iterated (PSI)-BLAST is designed for more sensitive protein-similarity searches. It is useful for finding very distantly related proteins.

Pattern-Hit Initiated (PHI)-BLAST performs a restricted protein pattern search. It is designed to search for proteins that contain a pattern specified by the user and are similar to the query sequence in the vicinity of the pattern.

Ginni

Updated on: 17-Feb-2022
