
Biopython - Advanced Sequence Operations


In this chapter, we shall discuss some of the advanced sequence features provided by Biopython.

Complement and Reverse Complement

A nucleotide sequence can be reverse complemented to obtain a new sequence, and reverse complementing the result again returns the original sequence. Biopython provides two methods for this: complement and reverse_complement. The code is given below −

>>> from Bio.Seq import Seq
>>> nucleotide = Seq('TCGAAGTCAGTC')
>>> nucleotide.complement()
Seq('AGCTTCAGTCAG')

Here, the complement() method returns the complement of a nucleotide sequence. The reverse_complement() method complements the sequence and then reverses the result. It is shown below −

>>> nucleotide.reverse_complement() 
Seq('GACTGACTTCGA')
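
To make the two operations concrete, the following is a minimal plain-Python sketch of what they compute for an unambiguous DNA string. The helper functions complement and reverse_complement defined here are illustrative stand-ins written for this example; they are not Biopython's implementation, which handles more cases −

```python
# Illustrative helpers only (an assumption for this sketch); Biopython's
# Seq methods cover more cases, such as ambiguous bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap each base for its pairing partner (A<->T, C<->G)."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence, then read it from right to left."""
    return complement(dna)[::-1]


seq = "TCGAAGTCAGTC"
print(complement(seq))          # AGCTTCAGTCAG
print(reverse_complement(seq))  # GACTGACTTCGA

# Reverse complementing twice returns the original sequence.
print(reverse_complement(reverse_complement(seq)) == seq)  # True
```

The printed values match the Biopython results shown above, and the last line illustrates that reverse complementing is its own inverse.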

Transcription

Transcription is the process of converting a DNA sequence into an RNA sequence. Biological transcription effectively performs a reverse complement (TCAG → CUGA) to produce the mRNA, treating the DNA as the template strand. However, in bioinformatics, and therefore in Biopython, we typically work directly with the coding strand, so the mRNA sequence is obtained simply by changing the letter T to U.

A simple example is as follows −

>>> from Bio.Seq import Seq 
>>> from Bio.Seq import transcribe 
>>> dna_seq = Seq("ATGCCGATCGTAT") 
>>> transcribe(dna_seq) 
Seq('AUGCCGAUCGUAU') 

To reverse the transcription, U is changed back to T, as shown in the code below −

>>> rna_seq = transcribe(dna_seq) 
>>> rna_seq.back_transcribe() 
Seq('ATGCCGATCGTAT')

To get the DNA template strand, reverse complement the back-transcribed RNA as given below −

>>> rna_seq.back_transcribe().reverse_complement() 
Seq('ATACGATCGGCAT')
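
The relationship between the coding strand, the mRNA, and the template strand can be sketched in plain Python. The functions below are hypothetical helpers written for this illustration (they only mimic the string-level effect of the Biopython calls above, for unambiguous uppercase sequences) −

```python
# Hypothetical helpers for illustration; not Biopython's implementation.
DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna):
    """Coding strand -> mRNA: replace every T with U."""
    return coding_dna.replace("T", "U")


def back_transcribe(rna):
    """mRNA -> coding strand: replace every U with T."""
    return rna.replace("U", "T")


def template_strand(coding_dna):
    """Reverse complement of the coding strand."""
    return coding_dna.translate(DNA_PAIRS)[::-1]


coding = "ATGCCGATCGTAT"
mrna = transcribe(coding)
template = template_strand(back_transcribe(mrna))

print(mrna)                   # AUGCCGAUCGUAU
print(back_transcribe(mrna))  # ATGCCGATCGTAT
print(template)               # ATACGATCGGCAT

# Reverse complementing the template and swapping T for U gives the
# same mRNA, which is why working from the coding strand is a valid shortcut.
print(transcribe(template_strand(template)) == mrna)  # True
```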

Translation

Translation is the process of converting an RNA sequence into a protein sequence. Consider an RNA sequence as shown below −

>>> rna_seq = Seq("AUGGCCAUUGUAAUG") 
>>> rna_seq 
Seq('AUGGCCAUUGUAAUG')

Now, apply the translate() method to the sequence above −

>>> rna_seq.translate() 
Seq('MAIVM')

The above RNA sequence is simple. Consider a longer RNA sequence, AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA, and apply translate() −

>>> rna = Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA') 
>>> rna.translate() 
Seq('MAIVMGR*KGAR')

Here, stop codons are indicated by an asterisk (*).

The translate() method can also stop at the first stop codon. To do this, pass to_stop=True to translate() as follows −

>>> rna.translate(to_stop = True) 
Seq('MAIVMGR')

Here, translation ends just before the first stop codon, so the asterisk is not included in the resulting sequence.
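
The codon-by-codon logic behind translation can be sketched without Biopython. The code below builds the standard genetic code as a dictionary and reads the sequence three letters at a time. The translate function and its to_stop flag are written for this illustration to mirror the behaviour described above; this is an assumption-laden sketch, not Biopython's implementation −

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, to_stop=False):
    """Translate an unambiguous DNA or RNA string, one codon at a time."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for start in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*" and to_stop:
            break  # stop before the terminator, so "*" is not emitted
        protein.append(amino_acid)
    return "".join(protein)


rna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA"
print(translate(rna))                # MAIVMGR*KGAR
print(translate(rna, to_stop=True))  # MAIVMGR
```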

Translation Table

The Genetic Codes page of the NCBI provides the full list of translation tables used by Biopython. Let us print the standard table to visualize the code −

>>> from Bio.Data import CodonTable 
>>> table = CodonTable.unambiguous_dna_by_name["Standard"] 
>>> print(table) 
Table 1 Standard, SGC0
   | T       | C       | A       | G       | 
 --+---------+---------+---------+---------+-- 
 T | TTT F   | TCT S   | TAT Y   | TGT C   | T
 T | TTC F   | TCC S   | TAC Y   | TGC C   | C
 T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
 T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G 
 --+---------+---------+---------+---------+--
 C | CTT L   | CCT P   | CAT H   | CGT R   | T
 C | CTC L   | CCC P   | CAC H   | CGC R   | C
 C | CTA L   | CCA P   | CAA Q   | CGA R   | A
 C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G 
 --+---------+---------+---------+---------+--
 A | ATT I   | ACT T   | AAT N   | AGT S   | T
 A | ATC I   | ACC T   | AAC N   | AGC S   | C
 A | ATA I   | ACA T   | AAA K   | AGA R   | A
 A | ATG M(s)| ACG T   | AAG K   | AGG R   | G 
 --+---------+---------+---------+---------+--
 G | GTT V   | GCT A   | GAT D   | GGT G   | T
 G | GTC V   | GCC A   | GAC D   | GGC G   | C
 G | GTA V   | GCA A   | GAA E   | GGA G   | A
 G | GTG V   | GCG A   | GAG E   | GGG G   | G 
 --+---------+---------+---------+---------+-- 

Biopython uses this table to translate DNA to protein as well as to identify stop codons.
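
Besides printing the table, its contents can be inspected programmatically, and a different table can be selected when translating. The sketch below assumes an installed Biopython and uses the start_codons, stop_codons, and forward_table attributes of the table object together with the table argument of translate(). The choice of the "Vertebrate Mitochondrial" table is only an illustration and is not part of the original example −

```python
from Bio.Data import CodonTable
from Bio.Seq import Seq

standard = CodonTable.unambiguous_dna_by_name["Standard"]

# Codons marked "(s)" in the printed table are possible start codons.
print(standard.start_codons)
print(standard.stop_codons)

# forward_table maps each coding codon to its one-letter amino acid.
print(standard.forward_table["ATG"])

rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA")

# The same sequence translated with two different genetic codes.
standard_protein = rna.translate()
mito_protein = rna.translate(table="Vertebrate Mitochondrial")
print(standard_protein)
print(mito_protein)
```

In the vertebrate mitochondrial code, UGA encodes tryptophan rather than a stop, so the two results differ at that position.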
