The Proteome Informatics Group (PIG) develops software and databases for the proteomics and glycomics communities. These resources are made available through the ExPASy server. The software tools support the analysis of experimental mass spectrometry data, mainly for the detection of posttranslational modifications. The databases store knowledge of carbohydrates attached to proteins as well as of protein-carbohydrate interactions.
A proteome is the protein complement expressed by a genome. A single genome gives rise to several proteomes reflecting responses to external and internal signals. Proteomics is the analysis of proteomes. It involves a range of experimental techniques, the most common being mass spectrometry (MS). Experimental data is processed in order to identify and quantify proteins, determine their cellular location, modifications, interactions and, ultimately, their function.
Posttranslational modifications (PTMs) are chemical modifications of amino acids following protein synthesis and form the repertoire of protein function modulators. Glycosylation, i.e., the addition of glycans/sugars to proteins and lipids, is not only the most abundant PTM but also the most structurally diverse. This diversity arises from the many ways in which monosaccharides can be linked together during their synthesis by a complex enzymatic machinery that forms higher-order structures. Glycosylation has crucial roles in most physiological processes and diseases.
Projects and Services
PTM-oriented proteomics tools
Estimates of PTM occurrence vary, but they all suggest a gap between what is currently known and what remains to be discovered. Tandem mass spectrometry (MS/MS) is a technology of choice for the identification of PTMs and is often used in a high-throughput set-up. The subsequent analysis of the data depends heavily on bioinformatics. The standard approach to the analysis of MS/MS spectra is to run Peptide Fragment Fingerprinting (PFF) software such as Sequest, Mascot, Phenyx or X!Tandem. These tools commonly identify 5-30% of the spectra in a dataset. A substantial share of the unidentified spectra is likely to match peptides carrying unexpected modifications. However, typical PFF tools share a limitation: the user must define a list of potential modifications before the search. To circumvent this restriction and allow open modification search, we have proposed combining PFF search algorithms with spectral library search. This has resulted in the QuickMod platform. QuickMod is implemented in Java and is based on mzJava, an open-source Java library also developed at PIG. Several ongoing research projects rely on the use and improvement of QuickMod, notably for the precise positioning of PTMs in peptides.
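The idea behind an open modification search can be illustrated with a minimal sketch. It is not QuickMod's algorithm and does not use the mzJava API; the `Spectrum` class, the `open_modification_score` function, the tolerance and the peak values are all hypothetical and chosen for illustration only. The sketch assumes singly charged fragments and compares a query spectrum with a library spectrum of the unmodified peptide: a library fragment counts as matched if the query contains it either at its original m/z or shifted by the precursor mass difference, which is taken as the mass of the unknown modification.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    precursor_mass: float  # neutral peptide mass in Da
    peaks: list            # fragment m/z values (singly charged, assumed)


def open_modification_score(query, library_entry, tolerance=0.02):
    """Return (mass_delta, matched_peak_count) for one library candidate.

    A library fragment is matched if the query holds a peak at the same m/z
    (fragment without the modification) or at m/z + delta (fragment that
    carries the modification).
    """
    delta = query.precursor_mass - library_entry.precursor_mass
    matched = 0
    for ref in library_entry.peaks:
        if any(
            abs(q - ref) <= tolerance or abs(q - (ref + delta)) <= tolerance
            for q in query.peaks
        ):
            matched += 1
    return delta, matched


# Hypothetical library spectrum of an unmodified peptide.
library = Spectrum(1000.500, [175.119, 304.161, 417.245, 530.329])
# Hypothetical query: same peptide, two fragments shifted by about +79.966 Da.
query = Spectrum(1080.466, [175.119, 304.161, 497.211, 610.295])

delta, matched = open_modification_score(query, library)
print(f"mass delta: {delta:.3f} Da, matched fragments: {matched}")
```

In this toy case a search restricted to unshifted peaks would match only two of the four library fragments, whereas allowing the shift recovers all four. The position at which fragments start to carry the shift also hints at where the modification sits in the peptide.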
A heterogeneous range of bioinformatics databases and tools is now available on the web for glycomics studies. However, as is often pointed out, the landscape looks like “disconnected islands”. Connecting these islands is a task that can in turn support the discovery process in glycomics.
Previous experience in proteomics can benefit glycomics studies. In much the same way as the study of protein structure and function has been facilitated by the integration of proteomics knowledge into UniProtKB, the different “islands” need to be collected in a single virtual space and linked together to build knowledge of the structure and function of protein glycosylation. PIG takes part in an international consortium driven by glycobiologists with the prospect of creating an equivalent of the proteomics resource for the integration of glycomics knowledge. The consortium released UniCarbKB, which collects annotated glycan structural data combined with protein and tissue information. The development of UniCarbKB is tightly bound to mass spectrometric data and to structural assignment based on fragmentation data stored in UniCarb-DB, a repository also created by the consortium. Glycan structure assignment from mass spectrometry data is supported by a toolbox that relies substantially on the mzJava library and on other tools developed at PIG, such as GlycoDigest, which simulates the exoglycosidase digestion of glycans.
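What simulating an exoglycosidase digestion involves can be shown with a toy sketch. It does not reproduce GlycoDigest, whose implementation is not described here; the `Residue` class and the `chain`, `digest` and `composition` helpers are hypothetical. The sketch assumes a glycan modelled as a tree rooted at the reducing end and an enzyme that removes every exposed terminal residue of one monosaccharide type. It ignores the linkage and anomer specificity of real exoglycosidases.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str
    children: list = field(default_factory=list)


def chain(*names):
    """Build a linear chain; the first name is closest to the reducing end."""
    node = None
    for name in reversed(names):
        node = Residue(name, [node] if node else [])
    return node


def digest(glycan, target):
    """Return a copy of the glycan without exposed terminal `target` residues.

    Trimming is applied bottom-up, so a target residue that becomes terminal
    after its own target children are removed is removed as well. The
    reducing-end residue (the root) is always kept.
    """
    def trim(residue):
        kept = [c for c in map(trim, residue.children) if c is not None]
        if not kept and residue.name == target:
            return None
        return Residue(residue.name, kept)

    kept = [c for c in map(trim, glycan.children) if c is not None]
    return Residue(glycan.name, kept)


def composition(glycan):
    """Count the monosaccharides in the tree."""
    counts = Counter([glycan.name])
    for child in glycan.children:
        counts += composition(child)
    return counts


# Simplified biantennary N-glycan with one sialylated arm (illustrative only).
arm_a = chain("Man", "GlcNAc", "Gal", "Neu5Ac")
arm_b = chain("Man", "GlcNAc", "Gal")
glycan = Residue("GlcNAc", [Residue("GlcNAc", [Residue("Man", [arm_a, arm_b])])])

after_sialidase = digest(glycan, "Neu5Ac")
after_both = digest(after_sialidase, "Gal")
print(dict(composition(after_both)))
```

Applying the galactose-removing step directly to the intact glycan leaves the galactose on the sialylated arm in place, because it is capped by Neu5Ac. This is why the order of enzymes in a sequential digestion carries structural information.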
Information on cellular interactions associated with changes in glycosylation structures is essential for understanding the functional role of sugars, and it needs to be collected in order to exploit the knowledge of these posttranslational additions to proteins. This task is undertaken through the expansion of SugarBindDB, which collates the known binding sites of pathogens to sugars. These structural glycan epitopes occur on both proteins and lipids; they add to the knowledge base of glycosylation function and refine the structural data contained in UniCarbKB.