
Finding and analysing eukaryotic promoters

Friday November 03, 2006

Practical for promoter analysis

 

The aim of this practical is to provide a superficial insight into some of the web-based tools currently available for promoter analysis. Because time is limited, we cannot cover any of these tools in appropriate depth and will only touch on certain aspects. As we treat some of the tools as "black boxes", it is important to keep track of your approach. I recommend noting any parameter settings that you change from the defaults and saving the results. You might need some of the results in subsequent steps of the practical!

‘Typical’ approach:

  • Genes with common expression pattern (DNA array experiment)
  • Collection of promoter sequences
  • Motif discovery
  • Comparison with ‘known’ motifs
  • Evolutionary conservation 

Task:

a) You have a list of Affymetrix probe IDs (AFFY_HG_U133Av2.IDlist) which, for example, display a common regulation. As a first step, you would like to identify the transcripts that most likely generated the hybridization signals. Please note that Affymetrix probes are initially designed to detect a single transcript species. After the DNA chip has been released to the market, new genome assembly releases sometimes ‘spoil’ the initial design: novel transcripts emerge as new potential targets, or sequence updates eliminate the initial transcripts as targets.

This exercise also lets you familiarize yourself with the ENSEMBL BioMart data mining system.
http://www.ensembl.org/

hints:

Data mining [BioMart] (in the left panel)

           - search the latest human genome
           - limit your search to an ‘ID list’ from ‘affy hg u133a 2’
           - as output, you would like a list of the corresponding ENSEMBL gene & transcript identifiers, HGNC gene symbols, chromosomes, start and
             end positions, and the AFFY HG U133Av2 Microarray Attributes
           - also try to extract promoter sequences directly, in the range -3kb to 100bp, by selecting ‘sequences’ in the ‘Attribute Page’

b) Summarize your experiences with automatic sequence extraction and, if possible, try to understand one or two of the problems you encountered.

c) Also try individual sequence retrieval using ‘Export data’, e.g. by extracting the 3kb upstream region of a gene.
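
When checking automatically extracted sequences, it helps to work out by hand which genomic coordinates a promoter window should cover. The following is a minimal Python sketch and not part of BioMart: the function name promoter_window and the convention that the TSS itself counts as the first downstream base are assumptions made for illustration. Ensembl reports start < end for every transcript regardless of strand, so for a reverse-strand transcript the TSS is the ‘end’ coordinate.

```python
def promoter_window(tss, strand, upstream=3000, downstream=100):
    """Return (start, end) of a promoter window in 1-based inclusive coordinates.

    tss:    genomic position of the transcription start site (1-based)
    strand: +1 for the forward strand, -1 for the reverse strand
    The window holds `upstream` bases before the TSS and `downstream`
    bases beginning with the TSS itself (assumed convention).
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if strand == 1:
        start, end = tss - upstream, tss + downstream - 1
    else:
        # On the reverse strand, 'upstream' means higher genomic coordinates.
        start, end = tss - downstream + 1, tss + upstream
    # Clip at the beginning of the chromosome.
    return max(start, 1), end


print(promoter_window(10000, 1))    # (7000, 10099)
print(promoter_window(10000, -1))   # (9901, 13000)
```

A sequence extracted for a reverse-strand gene must additionally be reverse-complemented; forgetting this is one possible source of unexpected results in task b).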

 

Motif discovery

Aim: get an impression of what current motif discovery programs can deliver

Example data set: a reasonably well characterized set of glucocorticoid receptor target genes:

  • regulation of gene expression (Rogatsky et al. 2003)
  • binding sites as determined by ChIP (Wang et al. 2004)
  • consensus GR-binding site (ACANNNTGTTNT) (Chen et al. 2003, or by SELEX: Nelson et al. 1999; Transfac: M00955/V$GR01)

Search for glucocorticoid receptor binding sites in a (small) set of promoter sequences of eight glucocorticoid-induced genes (sequences: prom_gc-ind.fasta; save the file using CTRL-click). Wang and colleagues confirmed glucocorticoid receptor binding in five of them.

MEME:

MEME is one of the more widely used motif discovery algorithms. As MEME requires a certain amount of computing time, we will look at a pre-computed motif discovery run:
results of MEME search: prom_gc-ind_meme.html

input parameters
sequence: prom_gc-ind.seq
Any number of motif repetitions
number of motifs: 20
Maximum width of motif: 24

Did MEME predict any motifs resembling the consensus GR binding site?

Motif discovery jobs can be submitted at http://meme.nbcr.net/meme/meme.html
(input is limited to 60’000 characters, and results are sent to an e-mail address; a job may take up to several hours, depending on the server load)
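
Before submitting your own sequences, you can check the input size against the limit mentioned above. This is a minimal Python sketch; the helper names read_fasta and total_length are hypothetical, and it is an assumption that the limit refers to sequence characters rather than to the whole file including header lines.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first word of the header as the name,
            # or a generated name if the header is empty.
            name = fields[0] if fields else "seq%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def total_length(records):
    """Total number of sequence characters over all records."""
    return sum(len(seq) for seq in records.values())


example = ">geneA promoter\nACGTAC\nGT\n>geneB\nGGCC\n"
records = read_fasta(example)
print(total_length(records))            # 12
print(total_length(records) <= 60000)   # True
```

To check a real file, pass its contents, e.g. read_fasta(open("prom_gc-ind.fasta").read()).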

TOUCAN2:

TOUCAN2 is a fairly recent workbench for regulatory sequence analysis, designed in particular for detecting significant transcription factor binding sites across species and for detecting cis-regulatory modules (combinations of binding sites) in sets of coexpressed/coregulated genes.

Documentation (http://homepage.univie.ac.at/herbert.mayer/MainGEN.html#TOUCAN) is still somewhat rudimentary, so I suggest using the default parameters unless indicated otherwise.

http://homes.esat.kuleuven.be/~saerts/software/toucan.php

a) -> Launch Now
         Accept the certificate
b) Upload the sequences of the above example:
         File-> Load Seq
         [Get_Seq-> from ENSEMBL : does not (yet?) seem to work properly]
c) Annotate sequences with matches to consensus GR-binding site:
         Motifs-> Consensus match

ACA[A,C,G,T]{3}TGTT[A,C,G,T]T
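
To make clear what a consensus match does, the same search can be sketched outside TOUCAN2 in Python. This is an illustration only and not TOUCAN2's implementation; the function names are hypothetical, and the IUPAC table is restricted to the symbols used in the consensus ACANNNTGTTNT. Both strands are scanned, and a lookahead is used so that overlapping matches are reported as well.

```python
import re

# Only the symbols needed for this consensus; N stands for any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus such as ACANNNTGTTNT into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_matches(seq, consensus="ACANNNTGTTNT"):
    """Return sorted (start, strand, site) tuples.

    start is 0-based and always refers to the forward strand;
    site is the matched text as read on the strand where it was found.
    """
    seq = seq.upper()
    width = len(consensus)
    # Zero-width lookahead so that overlapping sites are not skipped.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((len(seq) - m.start() - width, "-", m.group(1)))
    return sorted(hits)


print(find_matches("GGACAGGGTGTTCTGG"))   # [(2, '+', 'ACAGGGTGTTCT')]
```

A consensus match is all-or-nothing: a site with a single mismatch is not reported, which is one motivation for the matrix-based scanning in the next step.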

d) Annotate sequences with a matrix derived from a SELEX experiment, as collected in Transfac:

There is a choice of two motif mapping programs:
MotifScanner:     screens sequences with a known motif using a probabilistic sequence model.
MotifLocator:      screens sequences with a known motif based on a classical position-weight matrix scoring scheme.


         Motifs-> program of choice
         ‘Background model’: GET -> ‘human DBTSS promoters (0)’
         Browse PWD Database: M00955.mtrx

(Continuation motif discovery:)

e) MotifSampler is another very popular motif discovery program. It is based on the Gibbs sampling algorithm and thus has a random component.
          Motifs -> MotifSampler

By specifying ‘motif runs’, you can repeat a MotifSampler search a number of times with exactly identical parameters. What do you observe when comparing the motifs found in each of the searches?

Hint: clicking on a motif in the feature list while holding the Ctrl key opens a context menu.
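
To see where the random component comes from, here is a minimal Gibbs site sampler in Python. It is a textbook-style sketch under simplifying assumptions (exactly one site per sequence, uniform background, sequences containing only A, C, G, T) and not the MotifSampler algorithm, which is more elaborate; the function name and parameters are hypothetical.

```python
import random


def gibbs_motif_search(seqs, width, iterations=200, seed=None):
    """Sample one motif site per sequence; return (positions, sites).

    Assumes every sequence is upper-case ACGT and at least `width` long.
    """
    rng = random.Random(seed)
    # Random starting positions: the first source of run-to-run variation.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build a count profile (pseudocount 1) from all other sequences.
            profile = [{b: 1.0 for b in "ACGT"} for _ in range(width)]
            for k, other in enumerate(seqs):
                if k == i:
                    continue
                site = other[positions[k]:positions[k] + width]
                for j, base in enumerate(site):
                    profile[j][base] += 1
            total = len(seqs) - 1 + 4
            # Weight each candidate start by its likelihood ratio
            # (profile versus uniform background).
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j, base in enumerate(seq[start:start + width]):
                    w *= (profile[j][base] / total) / 0.25
                weights.append(w)
            # Draw the new position at random in proportion to the weights:
            # the second source of run-to-run variation.
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions, [s[p:p + width] for s, p in zip(seqs, positions)]


seqs = ["TTGACATTT", "GGGACAGGG", "CCCGACACC"]
for seed in (1, 2, 3):
    print(seed, gibbs_motif_search(seqs, 4, iterations=50, seed=seed))
```

Runs with the same seed are identical, whereas runs with different seeds may end at different sites; repeating a search several times and comparing the results is therefore a common sanity check.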

 

Comparison with known motifs

‘Professional’ version of TRANSFAC:

http://sib-pc17.unil.ch/biobase-cgi/biobase/transfac/10.3/bin/start.cgi

user: cigprom (only active on Nov-3-2006)

pwd: CIG06

i) Map all Transfac matrices (MATCH™) or Transfac sites (PATCH™) onto a sequence of your choice from the above sequence set.
ii) Try out different modes of cut-off selection (false negatives, false positives) and keep a record of the number of matches, which is reported at the end of the output.
iii) Do any of the motifs found by MEME or MotifSampler correspond to sites annotated by TRANSFAC? Would you consider a possible match significant?

JASPAR:

http://mordor.cgb.ki.se/cgi-bin/jaspar2005/jaspar_db.pl

Compare the motifs mapped by JASPAR to those mapped by TRANSFAC.

 

Evolutionary conservation using the UCSC browser

Check whether some of the motif instances are evolutionarily conserved.

 

 

Latest update 2006-11-03