Introduction to wEMBOSS
Friday May 11, 2007

Practicals: Answers
|
First run showdb to retrieve the list of available databases [output]. The Swiss-Prot database can be accessed using the name "swissprot".
To retrieve the sequence of the Swiss-Prot entry P57727 in fasta format, use the program seqret: type swissprot:P57727 as input for the program [input]. The default output format is fasta [output].
The difference between entret and seqret is that entret reads and writes the complete sequence entry, together with the heading annotation (documentation), without attempting to reformat or interpret the data in any way [output]. seqret, on the other hand, reads in the entry data, determines which part of it is the sequence, which is the description line and which is the feature table, and then writes the sequence, description and features out in the way prescribed by the requested output sequence format.
If you look at the feature (FT) lines of the file retrieved by entret, you will find five splice variants annotated (FT lines: VARSPLIC), which lead to isoforms B, T, D and E of the protein [output].
If you look at the cross-reference (DR) lines of the file retrieved by entret, you will find 9 cross-references to the EMBL database, which correspond to the mRNA sequence for this protein [output]. You can retrieve the DNA sequence corresponding to one of the EMBL entries, e.g. embl:AF201380, with seqret [input] [output]. |
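The way entret output can be inspected by line code may be sketched as follows. This is a minimal Python illustration, not part of EMBOSS: the helper names (`lines_with_code`, `count_features`, `cross_refs`) and the toy entry are hypothetical, and the toy entry is invented data in a simplified Swiss-Prot flat-file layout, not the real P57727 record.

```python
# Toy entry in a simplified Swiss-Prot flat-file layout (invented data,
# NOT the real P57727 record).
TOY_ENTRY = """\
ID   TOY_HUMAN   STANDARD;   PRT;   20 AA.
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   EMBL; X00002; CAA00002.1; -; mRNA.
DR   PROSITE; PS00000; TOY; 1.
FT   DOMAIN        1     10       Toy domain.
FT   VARSPLIC      1      5       Missing (in isoform B).
SQ   SEQUENCE   20 AA;
     MKVTAAHCLL MKVTAAHCLL
//
"""


def lines_with_code(entry, code):
    """Return the lines of a flat-file entry that start with a two-letter code."""
    return [line for line in entry.splitlines() if line.startswith(code + "   ")]


def count_features(entry, key):
    """Count FT lines whose feature key (columns 6 to 13) equals `key`."""
    return sum(1 for line in lines_with_code(entry, "FT")
               if line[5:13].strip() == key)


def cross_refs(entry, database):
    """Return the primary identifiers of the DR lines pointing to `database`."""
    identifiers = []
    for line in lines_with_code(entry, "DR"):
        fields = [field.strip() for field in line[5:].split(";")]
        if fields[0] == database and len(fields) > 1:
            identifiers.append(fields[1])
    return identifiers


print(count_features(TOY_ENTRY, "VARSPLIC"))  # 1
print(cross_refs(TOY_ENTRY, "EMBL"))          # ['X00001', 'X00002']
```

Applied to a file saved from entret, the same two calls would count the VARSPLIC features and list the EMBL cross-references, provided the file follows this layout.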
| |
The patmatmotifs program reports the occurrence of [input] [output]:
1) The catalytic active site of the serine proteases (from the
trypsin family);
2) The low-density lipoprotein (LDL) receptor domain;
3) The scavenger receptor cysteine-rich (SRCR) domain.
In particular, patmatmotifs
reports the positions of two (HIS and SER) of the three residues that
form the catalytic triad of the serine proteases of the trypsin family.
Please read the documentation for the matching patterns for details about the catalytic triad of the active site. The occurrence of the domains is also annotated in the FT lines of the Swiss-Prot entry (which you already retrieved with the entret program in the previous exercise) [output]:
1) LDL receptor domain, from amino acid 72 to amino acid 108 of the sequence.
2) SRCR domain, from amino acid 109 to amino acid 205 of the sequence.
3) The peptidase domain containing the active site, from amino acid 217 to amino acid 449 of the sequence |
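patmatmotifs scans a protein with motifs from the PROSITE database. How such a pattern locates residues in a sequence can be sketched in Python as below. This is a simplified illustration under stated assumptions: `prosite_to_regex` and `find_matches` are hypothetical helpers that handle only the basic PROSITE syntax elements, and both the example pattern and the toy sequence are illustrative, not taken from the P57727 results.

```python
import re


def prosite_to_regex(pattern):
    """Convert a pattern in basic PROSITE syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split "x(2,4)" into the core "x" and the repetition bounds.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return 1-based inclusive (start, end) positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]


# Illustrative pattern and toy sequence (assumptions, not real results).
print(prosite_to_regex("[LIVM]-[ST]-A-[STAG]-H-C"))            # [LIVM][ST]A[STAG]HC
print(find_matches("[LIVM]-[ST]-A-[STAG]-H-C", "MKVTAAHCLL"))  # [(3, 8)]
```

The reported positions use the same 1-based residue numbering as the FT lines, which makes it easy to compare a match against the annotated domain boundaries.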
| |
You can use needle to perform a global pairwise alignment and water
to perform a local pairwise alignment.
The local pairwise alignment algorithm (water)
tries to find the best local alignment(s) between the two sequences, which
in this case is the alignment of the second and third domains,
which the two proteins have in common [input] [output]. In fact, the first domain (the LDL receptor domain) is missing in the splice variant. The global pairwise alignment algorithm (needle),
on the other hand, aligns the entire sequences [input] [output].
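The difference between the two alignment modes can be illustrated with a small Python sketch. It is a deliberately simplified version of the underlying dynamic programming: it uses a flat match/mismatch score and a linear gap penalty (assumptions chosen for brevity), whereas needle and water use a substitution matrix and separate gap opening and extension penalties. The function names and the toy sequences are hypothetical.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against nothing costs i gaps
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, so an alignment can start anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy sequences: the second one lacks the leading "TTTT" segment,
# loosely mimicking a variant that is missing its first domain.
full, variant = "TTTTACGT", "ACGT"
print(global_score(full, variant))  # -4 (four matches, four gap positions)
print(local_score(full, variant))   # 4  (only the shared region is scored)
```

The global score is penalised for the segment that one sequence lacks, while the local score reflects only the shared region, which is why water reports just the common domains.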
|
| |
Use, for instance, the EMBL sequence AF201380
and run the program restrict [input] [output] or remap [input] [output].
Type '6' in the input option 'Minimum recognition site length'. Then, in the
advanced section, type '1' and '2' in the 'Minimum cuts per RE'
and 'Maximum cuts per RE' options, respectively. |
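The effect of these three options can be sketched in Python as a filter over recognition sites. This is only an illustration of the selection logic, under several assumptions: the small `SITES` table, the helper names and the toy sequence are hypothetical, only exact (unambiguous) sites on the given strand are counted, and restrict itself reads its enzyme data from REBASE.

```python
# Small illustrative table of recognition sites (restrict uses REBASE data).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "MboI": "GATC",
}


def count_sites(sequence, site):
    """Count occurrences of `site` in `sequence`, including overlapping ones."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def select_enzymes(sequence, sites, min_site_len=6, min_cuts=1, max_cuts=2):
    """Keep enzymes with a long enough site that cut between min and max times."""
    sequence = sequence.upper()
    selected = {}
    for name, site in sites.items():
        if len(site) < min_site_len:
            continue  # recognition site too short
        cuts = count_sites(sequence, site)
        if min_cuts <= cuts <= max_cuts:
            selected[name] = cuts
    return selected


# Toy sequence (invented): one EcoRI site and three BamHI sites.
TOY_SEQUENCE = "AAGAATTCTTGGATCCAAGGATCCTTGGATCCAA"
print(select_enzymes(TOY_SEQUENCE, SITES))  # {'EcoRI': 1}
```

In the toy run, MboI is dropped because its site is shorter than 6 bases, BamHI because it cuts three times, and HindIII because it does not cut at all.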
| |
|
entret reads and writes the complete sequence entry together with the heading
annotation. The coding sequence of the mRNA is reported in the FT lines
of the entry: from nucleotide 144 to nucleotide 1511 [output].
You can translate the coding sequence to the corresponding
protein product either with the program coderet
(extracts CDS automatically from the feature
tables) [input] [output] or with the program transeq
(by specifying in the input options the begin
and the end of the DNA sequence to be translated) [input] [output].
The application getorf
finds and extracts potential open reading frames (in all 6 frames). Since it
is a predictive algorithm, errors can occur, especially (as in this example)
in predicting the correct start of a reading frame (here the predicted frame starts at position 3 instead of position 144) [input] [output].
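What translating a region given by its begin and end positions involves can be sketched in Python as follows. The region from nucleotide 144 to nucleotide 1511 spans 1511 - 144 + 1 = 1368 nucleotides, that is 456 codons. The sketch assumes the standard genetic code and the forward strand only; `translate_region` and the toy sequence are hypothetical and are not the transeq implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_region(dna, begin, end):
    """Translate dna[begin..end] (1-based, inclusive) on the forward strand."""
    region = dna[begin - 1:end].upper().replace("U", "T")
    usable = len(region) - len(region) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        # Codons with ambiguous bases are reported as 'X'.
        protein.append(CODON_TABLE.get(region[i:i + 3], "X"))
    return "".join(protein)


# Toy sequence (invented): the coding region runs from position 3 to 11.
print(translate_region("CCATGGCTTAAGG", 3, 11))  # MA*
print((1511 - 144 + 1) // 3)                     # 456 codons in 144..1511
```

Starting the translation at the wrong position, as in the getorf prediction above, changes which codons are read and therefore where the protein appears to begin.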
|
| |
|
Use the program
eprimer3 to design the 6 best primers
for the EMBL sequence AF201380 (type
'6' in the advanced option 'Number of results to return').
If you check the advanced output option 'Explain flag', you will see that
393 primer pairs are considered OK by the program [input] [output].
To exclude the first and the last 12 base pairs of the sequence, specify
the sub-region 12,2404 in which to pick the primers in the advanced option
'Included region(s)' [input] [output].
To design an internal oligo to detect one of the sequence variants (for instance
isoform B, which starts at amino acid 127),
the 'Target region(s)' advanced option can be specified. If one or more
targets are specified, then a legal primer pair must flank at least one
of them [input] [output].
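A target region is given in nucleotide coordinates, so an amino acid position first has to be mapped onto the DNA sequence. The arithmetic can be sketched in Python as below. It assumes that the residue numbering refers to the protein encoded by an uninterrupted coding sequence starting at nucleotide 144 (the CDS start reported in the FT lines of AF201380); whether this holds for a given isoform should be checked against the entry. The function name is hypothetical.

```python
def residue_to_nucleotides(cds_start, residue):
    """Return the 1-based (first, last) nucleotide positions of a residue's codon.

    Assumes an uninterrupted coding sequence whose first codon starts at
    `cds_start` and residues numbered from 1.
    """
    if residue < 1:
        raise ValueError("residue numbering starts at 1")
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


print(residue_to_nucleotides(144, 1))    # (144, 146)
print(residue_to_nucleotides(144, 127))  # (522, 524)
```

Under these assumptions, amino acid 127 corresponds to nucleotides 522 to 524, which gives a starting point for choosing the 'Target region(s)' value.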
|
| |
Questions: L. Bordoli (Lorenza.Bordoli@unibas.ch) or L. Falquet (Laurent.Falquet@isb-sib.ch)