The Computational Evolutionary Genomics Group (CEGG) carries out research in comparative genomics and molecular evolution, using multiple-genome analysis to elucidate and quantify the evolutionary processes that shape the repertoires of functional genomic elements. The group investigates the functions of these elements on the basis of sequence variability among species and within populations, and addresses questions about the robustness and evolvability of molecular systems.
Comparative genomics strives to understand “Nature’s experimentation”, the process that generates genomic sequence variability. By analysing rapidly accumulating sequencing data, it assesses the selection pressures acting on functional elements and generates and tests hypotheses about the origin and evolution of biological complexity. Within this framework, we use multiple-genome analyses to elucidate and quantify the evolutionary processes that shape the repertoires of protein-coding genes, non-coding RNAs (ncRNAs, e.g. microRNAs) and Conserved Non-Coding sequences (CNCs), and to investigate their putative functions. With a particular focus on medical and evolutionary questions, the group collaborates extensively with experimental and clinical laboratories and participates in international consortia such as the VectorBase project, which supports research on invertebrate vectors of human pathogens.
Projects and Services
Evolution of protein-coding and ncRNA gene repertoires
Orthology is the key concept in comparative genomics. Orthologs are genes in different species that descended from a single ancestral gene through speciation; identifying them makes it possible to trace the evolution of genes and, by extrapolating knowledge from model organisms, to form hypotheses about their functions. Inferring orthology is particularly challenging when many genomes are compared at once. We also study the evolution of larger orthologous genomic regions (synteny) to understand the dynamics of genome architecture and gene co-regulation.
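To make the orthology concept concrete, the sketch below implements reciprocal best hits (RBH), a widely used, simple heuristic for pairwise ortholog candidates: gene a in species A and gene b in species B are paired when each is the other's highest-scoring match. This is an illustrative technique, not a description of the CEGG's own orthology pipeline. The input format (a dictionary of similarity scores such as alignment bit scores) and the function names are hypothetical. RBH only reports one-to-one pairs and misses one-to-many relationships created by gene duplication, which is one reason orthology inference across many genomes is hard.

```python
from typing import Dict, List, Tuple

# Hypothetical input: similarity scores (e.g. alignment bit scores) between
# genes of species A and species B, keyed as {(gene_a, gene_b): score}.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, query_index: int) -> Dict[str, str]:
    """For each query gene, return the target gene with the highest score.

    query_index=0 uses species A genes as queries; 1 uses species B genes.
    Ties are resolved by keeping the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for pair, score in scores.items():
        query, target = pair[query_index], pair[1 - query_index]
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores, 0)
    b_to_a = best_hits(scores, 1)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Toy example: A2 and A3 both match B2 best, but B2 matches A3 best,
# so only A3-B2 (together with A1-B1) is reported as a reciprocal pair.
scores = {
    ("A1", "B1"): 250.0, ("A1", "B2"): 90.0,
    ("A2", "B2"): 180.0, ("A2", "B1"): 120.0,
    ("A3", "B2"): 200.0,
}
print(reciprocal_best_hits(scores))  # [('A1', 'B1'), ('A3', 'B2')]
```

In this toy case A2 is left unpaired, much as a paralog would be, illustrating why pairwise RBH alone cannot capture the full orthology picture.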
Elucidating the functions of Conserved Non-Coding sequences
Animal genomes harbour repertoires of non-protein-coding sequences that are conserved through selection and are as large as their repertoires of protein-coding genes. Our knowledge of these functional elements, which include non-coding RNA genes and a heterogeneous class of CNCs, remains very limited. Nevertheless, comparative genomics offers new ways to identify and characterise them at genome scale. Although some ncRNAs are known to contribute to major cellular processes, and some CNCs have been shown to act as enhancers of gene expression, the complete picture has yet to emerge.
Robustness and evolvability of molecular systems
The origin and evolution of biological complexity remain open questions. A protein complex or pathway has system-level properties that can support specific functions that none of its individual components could perform alone. However, it is not clear how such system-level functions emerge and are selected during evolution. What determines the robustness of a molecular system that must balance the pressure to maintain its original functions against the ability to accommodate change? Knowledge of protein complexes and pathways is growing rapidly, making it possible to use comparative genomics to trace the evolution of whole systems and of each of their components. The ultimate goal is to build quantitative models of such molecular systems that have predictive value.
The CEGG is developing:
- ImmunoDB: a database of insect immune-related gene families;
- miROrtho: microRNA gene identification approaches;
- miRMap: microRNA target prediction software;
- Newick Utilities: tools for high-throughput processing of phylogenetic trees;
- Large-scale genomic synteny identification resource;
- Metagenomics analysis workflows.
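Phylogenetic trees are commonly exchanged in the Newick format, where parentheses group subtrees, commas separate siblings, a colon introduces a branch length and a semicolon ends the tree. As a rough illustration of the kind of parsing that tree-processing tools build on, here is a minimal recursive-descent Newick parser that extracts leaf labels. It is a hypothetical sketch, not the Newick Utilities code or interface, and it assumes unquoted labels with no embedded whitespace or comments.

```python
from typing import List, Optional


class Node:
    """A tree node with an optional label, branch length and children."""

    def __init__(self, label: str = "", length: Optional[float] = None):
        self.label = label
        self.length = length
        self.children: List["Node"] = []

    def leaves(self) -> List[str]:
        """Return leaf labels in left-to-right order."""
        if not self.children:
            return [self.label]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


def parse_newick(text: str) -> Node:
    """Parse one Newick tree (assumes unquoted labels, no comments)."""
    s = "".join(text.split())  # drop all whitespace
    pos = 0

    def peek() -> str:
        return s[pos] if pos < len(s) else ""

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":  # internal node: parse comma-separated children
            pos += 1
            node.children.append(parse_node())
            while peek() == ",":
                pos += 1
                node.children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos  # optional label (leaf name or internal node name)
        while peek() not in ("", ",", "(", ")", ":", ";"):
            pos += 1
        node.label = s[start:pos]
        if peek() == ":":  # optional branch length
            pos += 1
            start = pos
            while peek() not in ("", ",", "(", ")", ";"):
                pos += 1
            node.length = float(s[start:pos])
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("Newick string must end with ';'")
    return root


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print(tree.leaves())  # ['A', 'B', 'C']
```

A recursive parser keeps the code close to the nested structure of the format; for very deep trees or high-throughput use, an iterative parser and support for quoted labels would be needed.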