Summary
The Computational Cancer Genomics (CCG) group develops software to extract biological knowledge from nucleic acid sequences and functional genomics data, and applies computational techniques to problems of particular interest in cancer research.
Introduction
The expression of human genes is tightly controlled by detailed regulatory instructions encoded in the genome. Normal regulation results in the development and maintenance of a healthy human being. Abnormal regulation leads to various diseases, in particular cancer. Despite tremendous efforts, gene control mechanisms are still poorly understood. By developing and maintaining bioinformatics resources for studying gene regulatory DNA sequences we hope to contribute to the understanding of gene regulation and gene regulatory diseases. This includes both the development of new algorithms to extract biological knowledge from nucleic acid sequences and functional genomics data, and the application of computational techniques to specific issues in cancer research.
Projects and Services
Methods for analyzing transcription factor binding sites and gene regulatory regions
In order to decode the regulatory regions of the human genome it would be extremely useful to have reliable tools to predict transcription factor binding sites from DNA sequence. Unfortunately, such tools do not yet exist. To fill this gap, we are collaborating with “wet-lab” biologists to develop and test new technologies such as SAGE/SELEX and protein-binding arrays to characterize the binding specificity of transcription factors with high accuracy. We are further developing new algorithms for analyzing whole genome chromatin-immunoprecipitation (ChIP) profiles and other mass genome annotation data. Increasingly, we are also taking into account sequence conservation patterns in whole genome alignments for the computational characterization of complex gene regulatory regions.
The Eukaryotic Promoter Database (EPD)
EPD is a collection of experimentally defined transcription initiation sites in higher eukaryotes, extensively cross-referenced to other databases and to journal articles. Created over 20 years ago, it has played a major role in the initial characterization of eukaryotic promoter elements and in the development of promoter prediction algorithms for automatic genome annotation. Today, the promoters in EPD are defined by mass genome annotation data generated by high-throughput technologies such as CAGE.
The CleanEx gene expression database
CleanEx provides access to public gene expression data via officially approved gene symbols. It also represents expression profiles obtained with different technologies in a standardized format and numerical representation, so that data sets can be compared with one another. Today, most data in CleanEx are imported from the GEO data repository at NCBI. We are currently working on interfaces that enable users to identify, merge, and download gene expression profiles corresponding to their area of interest. CleanEx also provides high-quality feature annotations for major technical platforms such as Affymetrix GeneChip® or SAGE.
The HTPSELEX database
This database serves to distribute raw data and processed results obtained with the recently developed SAGE/SELEX technology including ready-to-use computational models for predicting transcription factor binding sites. It also provides wet laboratory and data analysis protocols for biologists interested in applying this new technology to other transcription factors.
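To illustrate what a computational model for predicting transcription factor binding sites can look like, here is a minimal sketch of a position weight matrix (PWM) scan. It is an assumption chosen for illustration, not necessarily the kind of model distributed by HTPSELEX; the count matrix, the uniform background, the pseudocount, and the function names `log_odds_matrix` and `scan` are all hypothetical.

```python
import math

def log_odds_matrix(count_matrix, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency for all four bases.
    """
    pwm = []
    for column in count_matrix:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Made-up base counts for a 4 bp site with consensus TGAC (illustration only)
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pwm = log_odds_matrix(counts)
hits = scan("AATGACCA", pwm, threshold=5.0)
for position, score in hits:
    print(position, round(score, 2))
```

In this toy setting the threshold is chosen so that only a perfect match to the consensus passes; with real binding data the threshold would have to be calibrated against experimental results.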
The Signal Search Analysis (SSA) server
This server provides access to new and established programs for the discovery and characterization of sequence motifs. It also serves as a hub interconnecting the other resources maintained by our group. For instance, the SSA server makes it possible to analyze the frequency of transcription factor binding sites around the transcription start sites of coordinately regulated genes. In such an analysis, the binding site definitions are taken from HTPSELEX, the transcription start sites from EPD, and the gene sets are selected on the basis of expression data provided by CleanEx.
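The kind of analysis described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SSA server's actual implementation: the binding site is reduced to an exact consensus string, the promoter sequences are made up, and the function name `motif_position_counts` and the parameter `tss_index` are hypothetical.

```python
from collections import Counter

def motif_position_counts(sequences, motif, tss_index):
    """Count motif occurrences by start position relative to the TSS.

    Each sequence is assumed to have its transcription start site
    at the same index (tss_index).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            counts[start - tss_index] += 1
            # Continue one base further so overlapping matches are found
            start = seq.find(motif, start + 1)
    return counts

# Toy promoter sequences (made up), each with the TSS at index 10
promoters = [
    "GGCTATAAAAGCAGTCCGTA",
    "CCTTATAAAGGCATTCGGAT",
    "ACGTACGTACGCAGTTTGCA",
]
counts = motif_position_counts(promoters, "TATAAA", tss_index=10)
for position in sorted(counts):
    # Fraction of promoters with a motif starting at this relative position
    print(position, counts[position] / len(promoters))
```

In a realistic setting the exact string match would be replaced by a binding site model, and the promoter set would be chosen from genes that share an expression pattern.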
Websites for Further Information
Computational Cancer Genomics Group http://www.isrec.isb-sib.ch/
Eukaryotic Promoter Database http://www.epd.isb-sib.ch/
CleanEx http://www.cleanex.isb-sib.ch/
HTPSELEX http://www.isrec.isb-sib.ch/htpselex/
Signal Search Analysis Server http://www.isrec.isb-sib.ch/ssa/






