New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures

Published on Nov 29, 2012 in Nucleic Acids Research
DOI: 10.1093/NAR/GKS1211
Ian Sillitoe (EMBL-EBI: European Bioinformatics Institute),
Alison L. Cuff (EMBL-EBI: European Bioinformatics Institute),
+ 10 authors,
Christine A. Orengo (EMBL-EBI: European Bioinformatics Institute)
Abstract
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.