The Computational Phylogenetics Group (CPG) develops software to better understand the evolutionary relationships among organisms and to test macroevolutionary hypotheses. It also aims to develop better models for analysing sequence data and quantitative models for estimating macroevolutionary patterns and processes.
Introduction
Phylogenetic trees are becoming a cornerstone of many areas of evolutionary biology. However, constructing optimal trees from a data matrix is still far from being an easy task. An important issue is to better understand what conditions are necessary to build an accurate tree.
In particular, there is a need for more effective algorithms to search the tree space. In the CPG, the focus is on data-driven approaches that can be extremely efficient when dealing with large phylogenetic trees. Furthermore, the question of which model(s) of evolution to use is important when analysing multigene matrices. However, little is known about the optimal strategy for analysing such matrices, and addressing this question represents a large part of the group's research work.
Once a phylogenetic tree is reconstructed, the temporal dimension associated with this tree provides useful information for estimating the mode and tempo of organisms’ evolution. However, such investigations are affected by uncertainty in the phylogenetic tree’s reconstruction. Markov chain Monte Carlo (MCMC) methods are therefore used to sample the many plausible trees supported by the data before estimating and testing any model parameters associated with macroevolutionary processes. Alongside the extension of this MCMC framework, there is a need to develop quantitative models of speciation that allow more biologically realistic estimates of rates of evolution and that make it possible to test for rate differences between lineages or through time. The applications of this research extend into the field of molecular evolution, through the analysis of gene duplication and loss, and into the field of macroevolution, through the estimation of speciation rates and of the factors affecting species adaptation.
These methods use DNA sequences as data and can be applied to any relevant group of organisms. Any improvement or future development of the methods will provide biologists with refined and powerful tools to study the mode and tempo of micro- and macroevolution.
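The idea of estimating a rate across many plausible trees, rather than on a single tree, can be illustrated with a minimal Python sketch. It assumes the simplest possible speciation model, a pure-birth (Yule) process with no extinction, which is not necessarily the model used by the CPG. Each tree is represented only by the ages of its internal nodes, and the function names `yule_rate` and `summarise_over_trees` are hypothetical.

```python
from statistics import mean


def yule_rate(node_ages):
    """Maximum likelihood speciation rate under a pure-birth model.

    node_ages: ages (time before present) of all internal nodes of an
    ultrametric, fully bifurcating tree, root included.
    The estimate is conditioned on the crown age: (n_tips - 2) / total
    branch length.
    """
    ages = sorted(node_ages, reverse=True)
    n_tips = len(ages) + 1
    if n_tips < 3:
        raise ValueError("need at least three tips to estimate a rate")
    # Every internal node subtends two branches, so the summed branch
    # length of an ultrametric tree equals the sum of all node ages plus
    # the root age once more.
    total_length = sum(ages) + ages[0]
    if total_length <= 0:
        raise ValueError("node ages must be positive")
    return (n_tips - 2) / total_length


def summarise_over_trees(tree_sample):
    """Estimate the rate on each sampled tree and summarise the spread."""
    rates = sorted(yule_rate(ages) for ages in tree_sample)
    return {"mean": mean(rates), "min": rates[0], "max": rates[-1]}


if __name__ == "__main__":
    # Two made-up trees with four tips each, standing in for a sample of
    # trees drawn by an MCMC run (illustrative values only).
    sample = [[3.0, 2.0, 1.0], [4.0, 2.0, 1.0]]
    print(summarise_over_trees(sample))
```

Reporting the range of estimates across the sample, instead of a single value, is one simple way of carrying phylogenetic uncertainty through to the rate estimate.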
Projects and Services
Grass phylogenetic database
The CPG is implementing an online database storing aligned DNA sequences for the plant family Poaceae. The database is regularly updated by querying existing sequence databases such as EMBL/GenBank for newly added entries. Automated tools then filter the sequences in order to obtain the maximum number of DNA regions and species usable for phylogenetic study. Besides alignments, the database will store a phylogenetic tree for each DNA region considered, as well as a tree for the combined DNA regions. The combined tree will represent the largest existing grass phylogenetic tree and will provide an up-to-date snapshot of grass evolutionary history, available online. The database will clearly identify which genes are available for which grass species, and will provide a thorough examination of the phylogenetic information contained in each of these genes. It will help guide future sampling strategies aimed at building the grass “Tree of Life”.
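The question of which genes are available for which species can be sketched as a small coverage table. The Python example below is only an illustration under stated assumptions: the input is a list of (species, DNA region) pairs, the function names are hypothetical, and the filtering rule (a minimum number of species per region) is an assumption, since the criteria used by the CPG tools are not described here.

```python
from collections import defaultdict


def coverage_by_region(records):
    """Map each DNA region to the set of species sequenced for it."""
    by_region = defaultdict(set)
    for species, region in records:
        by_region[region].add(species)
    return dict(by_region)


def usable_regions(by_region, min_species):
    """Regions sampled for at least min_species species (assumed rule)."""
    return sorted(r for r, sp in by_region.items() if len(sp) >= min_species)


def shared_species(by_region, regions):
    """Species available for every one of the given regions."""
    sets = [by_region[r] for r in regions]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Illustrative records only, not real database content.
    records = [
        ("Oryza sativa", "rbcL"),
        ("Oryza sativa", "ndhF"),
        ("Zea mays", "rbcL"),
        ("Zea mays", "ndhF"),
        ("Triticum aestivum", "rbcL"),
    ]
    table = coverage_by_region(records)
    regions = usable_regions(table, min_species=2)
    print(regions, shared_species(table, regions))
```

The species shared across all retained regions are the ones that could enter a combined analysis without missing data; relaxing that requirement trades completeness for a larger tree.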
Phylogenetic software
We are currently developing three main software programs:
- The SuperTree software implements supertree reconstruction, which combines trees whose sets of terminal taxa overlap only partially to obtain comprehensive phylogenetic trees based on any kind of biological data. Several existing algorithms are implemented in a user-friendly interface.
- The MLtree software allows maximum likelihood estimation of character evolution on a given, fixed tree. Besides DNA models of evolution, it uses several models to analyse the evolution of morphological data.
- The Speciate software allows maximum likelihood estimation of the speciation and extinction rates of lineages while taking phylogenetic uncertainty into account through an MCMC approach. Several models of speciation are being implemented in order to test macroevolutionary hypotheses, such as testing for key innovations or detecting differences in diversification rate through time and among lineages.
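To make the supertree idea from the first item above more concrete, the following Python sketch encodes a set of source trees as a matrix representation, the first step of one widely used supertree method (matrix representation with parsimony, MRP). This is an assumption chosen for illustration: the algorithms actually implemented in SuperTree are not listed here. Trees are written as nested tuples of taxon names, and the function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def mrp_matrix(source_trees):
    """Encode source trees as one character column per informative clade.

    Coding per column: '1' = taxon inside the clade, '0' = taxon in the
    same source tree but outside the clade, '?' = taxon absent from that
    source tree.
    """
    columns = []
    all_taxa = set()
    for tree in source_trees:
        tree_taxa, found = clades(tree)
        all_taxa |= tree_taxa
        for clade in found:
            # Skip the root clade, which carries no grouping information.
            if 1 < len(clade) < len(tree_taxa):
                columns.append((clade, tree_taxa))
    return {
        taxon: "".join(
            "1" if taxon in clade else "0" if taxon in tree_taxa else "?"
            for clade, tree_taxa in columns
        )
        for taxon in sorted(all_taxa)
    }


if __name__ == "__main__":
    # Two small source trees sharing taxa B and C (made-up example).
    trees = [(("A", "B"), "C"), (("B", "C"), "D")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

The resulting matrix would then be analysed with a parsimony search to obtain the supertree; that search step is not shown here.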
Websites for Further Information
Computational Phylogenetics Group: http://www2.unil.ch/phylo/