Biopython - Population Genetics



Population genetics plays an important role in evolutionary theory. It analyses genetic differences between species as well as between two or more individuals within the same species.

Biopython provides the Bio.PopGen module for population genetics. It mainly supports GenePop, a popular population genetics package developed by Michel Raymond and Francois Rousset.

A simple parser

Let us write a simple application to parse the GenePop format and understand the concept.

Download the GenePop file provided by the Biopython team from the link given below − https://raw.githubusercontent.com/biopython/biopython/master/Tests/PopGen/c3line.gen

Load the GenePop module using the below code snippet −

from Bio.PopGen import GenePop

Parse the file using the GenePop.read method as below −

with open("c3line.gen") as handle:
    record = GenePop.read(handle)

Show the loci and population information as given below −

>>> record.loci_list 
['136255903', '136257048', '136257636'] 
>>> record.pop_list 
['4', 'b3', '5'] 
>>> record.populations 
[[('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (3, 4), (2, 2)]), 
   ('3', [(3, 3), (4, 4), (2, 2)]), ('4', [(3, 3), (4, 3), (None, None)])], 
[('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]), 
   ('b3', [(None, None), (4, 4), (2, 2)])], 
[('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]), 
   ('3', [(3, 2), (1, 1), (2, 2)]), ('4', 
   [(None, None), (4, 4), (2, 2)]), ('5', [(3, 3), (4, 4), (2, 2)])]] 
>>>

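Here, the file contains three loci and three populations: the first population has 4 individuals, the second has 3 and the third has 5. The entries of record.pop_list ('4', 'b3', '5') match the name of the last individual in each population. record.populations lists every population as a list of (name, genotypes) pairs, one per individual; each genotype is a pair of alleles for the corresponding locus, and (None, None) marks missing data.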
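The nested structure can be easier to read once it is summarised. The following is a minimal sketch in plain Python that does not call Biopython: the data is copied from the record.populations output shown above, and the summarize helper is a hypothetical name introduced only for illustration. It reports the size of each population and how many individuals lack data at each locus.

```python
# Data copied from the record.loci_list and record.populations output above.
loci_list = ['136255903', '136257048', '136257636']
populations = [
    [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (3, 4), (2, 2)]),
     ('3', [(3, 3), (4, 4), (2, 2)]), ('4', [(3, 3), (4, 3), (None, None)])],
    [('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]),
     ('b3', [(None, None), (4, 4), (2, 2)])],
    [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]),
     ('3', [(3, 2), (1, 1), (2, 2)]), ('4', [(None, None), (4, 4), (2, 2)]),
     ('5', [(3, 3), (4, 4), (2, 2)])],
]


def summarize(populations, loci_list):
    """Return one (size, missing_per_locus) pair per population.

    Hypothetical helper, not part of Biopython.
    """
    summary = []
    for pop in populations:
        missing = {locus: 0 for locus in loci_list}
        for _name, genotypes in pop:
            # Genotypes are stored in the same order as loci_list.
            for locus, genotype in zip(loci_list, genotypes):
                if None in genotype:
                    missing[locus] += 1
        summary.append((len(pop), missing))
    return summary


for index, (size, missing) in enumerate(summarize(populations, loci_list)):
    print(index, size, missing)
```

With this data, the second population turns out to have no genotype at all for the first locus, which is worth knowing before running any per-locus analysis.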

Manipulate the GenePop file

Biopython provides methods to remove locus and population data from a parsed record. As the outputs below show, each call modifies the record in place.

Remove a population by position −

>>> record.remove_population(0) 
>>> record.populations 
[[('b1', [(None, None), (4, 4), (2, 2)]), 
   ('b2', [(None, None), (4, 4), (2, 2)]), 
   ('b3', [(None, None), (4, 4), (2, 2)])], 
   [('1', [(3, 3), (4, 4), (2, 2)]), 
   ('2', [(3, 3), (1, 4), (2, 2)]), 
   ('3', [(3, 2), (1, 1), (2, 2)]), 
   ('4', [(None, None), (4, 4), (2, 2)]), 
   ('5', [(3, 3), (4, 4), (2, 2)])]]
>>>

Remove a locus by position −

>>> record.remove_locus_by_position(0) 
>>> record.loci_list 
['136257048', '136257636'] 
>>> record.populations 
[[('b1', [(4, 4), (2, 2)]), ('b2', [(4, 4), (2, 2)]), ('b3', [(4, 4), (2, 2)])], 
   [('1', [(4, 4), (2, 2)]), ('2', [(1, 4), (2, 2)]), 
   ('3', [(1, 1), (2, 2)]), ('4', [(4, 4), (2, 2)]), ('5', [(4, 4), (2, 2)])]]
>>>

Remove a locus by name −

>>> record.remove_locus_by_name('136257636') 
>>> record.loci_list 
['136257048'] 
>>> record.populations 
[[('b1', [(4, 4)]), ('b2', [(4, 4)]), ('b3', [(4, 4)])], 
   [('1', [(4, 4)]), ('2', [(1, 4)]), 
   ('3', [(1, 1)]), ('4', [(4, 4)]), ('5', [(4, 4)])]]
>>>
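To make the effect of a locus removal on the nested data concrete, here is a rough plain-Python equivalent. It is only a sketch of the idea, not Biopython's implementation: drop_locus is a hypothetical helper, and unlike the record methods above it returns new lists instead of changing its inputs.

```python
def drop_locus(loci_list, populations, position):
    """Return copies of loci_list and populations without the locus at position.

    Hypothetical helper for illustration; the inputs are left untouched.
    """
    new_loci = loci_list[:position] + loci_list[position + 1:]
    new_pops = [
        [(name, genotypes[:position] + genotypes[position + 1:])
         for name, genotypes in pop]
        for pop in populations
    ]
    return new_loci, new_pops


# The 'b' population from the output shown earlier, before any locus removal.
loci_list = ['136255903', '136257048', '136257636']
populations = [
    [('b1', [(None, None), (4, 4), (2, 2)]),
     ('b2', [(None, None), (4, 4), (2, 2)]),
     ('b3', [(None, None), (4, 4), (2, 2)])],
]

new_loci, new_pops = drop_locus(loci_list, populations, 0)
print(new_loci)
print(new_pops)
```

Returning copies keeps the original data available for comparison; when working with a real record, parsing the file again gives the same safety.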

Interface with GenePop Software

Biopython provides interfaces to the GenePop software and thereby exposes much of its functionality. The Bio.PopGen.GenePop module is used for this purpose. One such easy-to-use interface is EasyController. Let us check how to parse a GenePop file and do some analysis using EasyController.

First, install the GenePop software and add its installation folder to the system path. To get basic information about a GenePop file, create an EasyController object and then call the get_basic_info method as specified below −

>>> from Bio.PopGen.GenePop.EasyController import EasyController 
>>> ec = EasyController('c3line.gen') 
>>> print(ec.get_basic_info()) 
(['4', 'b3', '5'], ['136255903', '136257048', '136257636'])
>>>

Here, the first item is the population list and the second item is the loci list.

To get the list of all alleles of a particular locus, call the get_alleles_all_pops method, passing the locus name as specified below −

>>> allele_list = ec.get_alleles_all_pops("136255903") 
>>> print(allele_list) 
[2, 3]
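For comparison, the same kind of list can be derived from the parsed record data without the GenePop software. The sketch below is plain Python and does not use EasyController; alleles_all_pops is a hypothetical helper, and the data is copied from the record.populations output shown earlier.

```python
def alleles_all_pops(populations, loci_list, locus_name):
    """Return the sorted alleles seen at locus_name across all populations.

    Hypothetical helper; missing alleles (None) are ignored.
    """
    locus_index = loci_list.index(locus_name)
    alleles = set()
    for pop in populations:
        for _name, genotypes in pop:
            alleles.update(a for a in genotypes[locus_index] if a is not None)
    return sorted(alleles)


# Data copied from the record.loci_list and record.populations output above.
loci_list = ['136255903', '136257048', '136257636']
populations = [
    [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (3, 4), (2, 2)]),
     ('3', [(3, 3), (4, 4), (2, 2)]), ('4', [(3, 3), (4, 3), (None, None)])],
    [('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]),
     ('b3', [(None, None), (4, 4), (2, 2)])],
    [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]),
     ('3', [(3, 2), (1, 1), (2, 2)]), ('4', [(None, None), (4, 4), (2, 2)]),
     ('5', [(3, 3), (4, 4), (2, 2)])],
]

print(alleles_all_pops(populations, loci_list, '136255903'))  # prints [2, 3]
```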

To get the allele list for a specific population and locus, call get_alleles, passing the population position and the locus name as given below −

>>> allele_list = ec.get_alleles(0, "136255903") 
>>> print(allele_list) 
[] 
>>> allele_list = ec.get_alleles(1, "136255903") 
>>> print(allele_list) 
[] 
>>> allele_list = ec.get_alleles(2, "136255903") 
>>> print(allele_list) 
[2, 3] 
>>>

Similarly, EasyController exposes many other analyses: allele frequency, genotype frequency, multilocus F statistics, Hardy-Weinberg equilibrium, linkage disequilibrium, etc.
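As an illustration of the first item in that list, allele frequencies can be counted by hand from the parsed data. This is a minimal sketch in plain Python, not the EasyController interface and not the estimator used by the GenePop software; allele_frequencies is a hypothetical helper, and the data is the third population from the record.populations output shown earlier.

```python
from collections import Counter


def allele_frequencies(individuals, locus_index):
    """Return {allele: frequency} for one locus, skipping missing alleles.

    Hypothetical helper; individuals is a list of (name, genotypes) pairs.
    """
    counts = Counter()
    for _name, genotypes in individuals:
        for allele in genotypes[locus_index]:
            if allele is not None:
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


# Third population, copied from the record.populations output shown earlier.
third_population = [
    ('1', [(3, 3), (4, 4), (2, 2)]),
    ('2', [(3, 3), (1, 4), (2, 2)]),
    ('3', [(3, 2), (1, 1), (2, 2)]),
    ('4', [(None, None), (4, 4), (2, 2)]),
    ('5', [(3, 3), (4, 4), (2, 2)]),
]

# Index 1 corresponds to locus '136257048' in the original loci list.
freqs = allele_frequencies(third_population, 1)
print(freqs)
```

At this locus the five individuals carry ten allele copies, seven of allele 4 and three of allele 1, so the frequencies are 0.7 and 0.3.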
