The Computational Cancer Genomics (CCG) group develops software to extract biological knowledge from nucleic acid sequences and functional genomics data, and applies computational techniques to problems of particular interest in cancer research.
The expression of human genes is tightly controlled by detailed regulatory instructions encoded in the genome. Normal regulation results in the development and maintenance of a healthy human being. Abnormal regulation leads to various diseases, in particular cancer. Despite tremendous efforts, gene control mechanisms are still poorly understood. By developing and maintaining bioinformatics resources for studying gene regulatory DNA sequences, we hope to contribute to the understanding of gene regulation and gene regulatory diseases. This includes both the development of new algorithms to extract biological knowledge from nucleic acid sequences and functional genomics data, and the application of computational techniques to specific issues in cancer research.
Projects and Services
Methods for analyzing transcription factor binding sites and gene regulatory regions
In order to decode the regulatory regions of the human genome it would be extremely useful to have reliable tools to predict transcription factor binding sites from DNA sequence. Unfortunately, such tools do not yet exist. To fill this gap, we are collaborating with “wet-lab” biologists to develop and test new technologies such as SAGE/SELEX and protein-binding arrays to characterize the binding specificity of transcription factors with high accuracy. We are further developing new algorithms for analyzing whole genome chromatin-immunoprecipitation (ChIP) profiles and other mass genome annotation data. Increasingly, we are also taking into account sequence conservation patterns in whole genome alignments for the computational characterization of complex gene regulatory regions.
The Eukaryotic Promoter Database (EPD)
EPD is a collection of experimentally defined transcription initiation sites in higher life forms, which is extensively cross-referenced to other databases and journal articles. Created over 20 years ago, it has played a major role in the initial characterization of eukaryotic promoter elements and the development of promoter prediction algorithms for automatic genome annotation. Today, the promoters in EPD are defined by mass genome annotation data generated by high-throughput technologies such as CAGE.
The ChIP-Seq Analysis server
The ChIP-Seq web server provides access to a set of tools for common ChIP-Seq data analysis tasks, including positional correlation analysis, peak detection, and genome partitioning into signal-rich and signal-poor regions. It is an open system designed to allow interoperability with other resources, in particular the motif discovery programs from the Signal Search Analysis (SSA) server. Since the ChIP-Seq server uses speed-optimized algorithms and programs, response times are short. Input data can be uploaded in various formats, including BED and BAM. The server also provides access to a large collection of server-resident data from landmark papers and large-scale epigenomics initiatives such as ENCODE.
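As a rough illustration of what a positional correlation analysis computes, the following Python sketch reads two sets of BED intervals and counts how often a target feature lies at each offset from a reference feature. It is not the server's implementation: the function names, the use of interval midpoints, and the toy coordinates are all assumptions made for this example, and the nested loop is chosen for readability rather than speed.

```python
from collections import Counter


def parse_bed(lines):
    """Yield (chrom, start, end) tuples from BED-formatted lines.

    BED coordinates are 0-based with an exclusive end position.
    Header lines (track, browser, comments) and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def distance_histogram(reference, target, window):
    """Count target midpoints at each offset within +/- window of a reference midpoint."""
    hist = Counter()
    for ref_chrom, ref_start, ref_end in reference:
        ref_mid = (ref_start + ref_end) // 2
        for chrom, start, end in target:
            if chrom != ref_chrom:
                continue
            offset = (start + end) // 2 - ref_mid
            if abs(offset) <= window:
                hist[offset] += 1
    return hist


# Toy data (invented for illustration): two reference positions, three target intervals.
reference_bed = ["chr1\t100\t101", "chr1\t500\t501"]
target_bed = [
    "track name=toy",
    "chr1\t90\t110",
    "chr1\t520\t540",
    "chr2\t100\t120",
]

refs = list(parse_bed(reference_bed))
targets = list(parse_bed(target_bed))
hist = distance_histogram(refs, targets, window=50)
```

A pronounced peak in such a histogram would suggest that the two feature types tend to occur at a preferred distance from each other. For genome-scale data, sorted intervals and a sliding window would be needed in place of the nested loop.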
The HTPSELEX database
This database distributes raw data and processed results obtained with the recently developed SAGE/SELEX technology, including ready-to-use computational models for predicting transcription factor binding sites. It also provides wet laboratory and data analysis protocols for biologists interested in applying this new technology to other transcription factors.
The Signal Search Analysis (SSA) server
This server provides access to new and old programs for the discovery and characterization of sequence motifs. It also serves as a hub for interconnecting the other resources maintained by our group. For instance, the SSA server allows us to analyze the frequency of transcription factor binding sites around transcription start sites of coordinately regulated genes. In such an analysis, the binding site definitions would be taken from HTPSELEX, the transcription start sites from EPD, and the gene sets selected on the basis of expression data.
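The kind of analysis described above can be sketched in a few lines of Python. This is a simplified illustration, not the SSA server's method: it substitutes a regular-expression consensus for a real binding site model (which would typically be a weight matrix), and the function name, the toy promoter sequences, and the assumed TSS position are all hypothetical.

```python
import re
from collections import Counter


def motif_positions_around_tss(sequences, motif_regex, tss_index):
    """Count motif start positions relative to the TSS across a set of sequences.

    sequences   -- DNA strings, each with its transcription start site at tss_index
    motif_regex -- binding site consensus written as a regular expression (assumption)
    """
    # A lookahead lets the scan report overlapping occurrences as well.
    pattern = re.compile(f"(?=({motif_regex}))", re.IGNORECASE)
    counts = Counter()
    for seq in sequences:
        for match in pattern.finditer(seq):
            counts[match.start() - tss_index] += 1
    return counts


# Toy promoter sequences (invented); the TSS is assumed to sit at index 10 in each.
promoters = [
    "GGCTATAAAAGGCACGTCA",
    "CCTATAAAGGGGACGTTTT",
    "GGGGGGGGGGGACGTTATA",
]
tata_counts = motif_positions_around_tss(promoters, "TATAAA", tss_index=10)
```

Here `tata_counts` maps each relative position to the number of sequences in which the motif starts there. Comparing such a profile between a coordinately regulated gene set and a background set is one simple way to ask whether a site is enriched at a particular distance from the start site.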
Websites for Further Information
Computational Cancer Genomics Group: http://ccg.vital-it.ch/
Eukaryotic Promoter Database: http://epd.vital-it.ch/
ChIP-Seq Analysis server: http://ccg.vital-it.ch/chipseq/
Signal Search Analysis Server: http://ccg.vital-it.ch/ssa/