DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Protein structure determination using metagenome sequence data

Journal Article · Science
 [1];  [2];  [3];  [2];  [3];  [4];  [5];  [6];  [7]
  1. Department of Biochemistry, University of Washington, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA; Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA.
  2. Department of Biochemistry, University of Washington, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA.
  3. Joint Genome Institute, Walnut Creek, CA 94598, USA.
  4. Department of Biochemistry, University of Washington, Seattle, WA 98105, USA; Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98105, USA.
  5. Facebook Inc., Seattle, WA 98109, USA.
  6. Joint Genome Institute, Walnut Creek, CA 94598, USA; Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia.
  7. Department of Biochemistry, University of Washington, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA; Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98105, USA.

Filling in the protein fold picture

Fewer than a third of the 14,849 known protein families have at least one member with an experimentally determined structure. This leaves more than 5000 protein families with no structural information. Protein modeling using residue-residue contacts inferred from evolutionary data has been successful in modeling unknown structures, but it requires large numbers of aligned sequences. Ovchinnikov et al. augmented such sequence alignments with metagenome sequence data (see the Perspective by Söding). They determined the number of sequences required to allow modeling, developed criteria for model quality, and, where possible, improved modeling by matching predicted contacts to known structures. Their method predicted quality structural models for 614 protein families, of which about 140 represent newly discovered protein folds. Science, this issue p. 294; see also p. 248.
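The summary notes that contact-based modeling needs large numbers of aligned sequences. In the coevolution literature this requirement is usually expressed as an effective sequence count, which down-weights near-duplicate sequences so that redundant homologs are not counted many times over. The Python sketch below shows one common form of that reweighting. The 80% identity cutoff, the normalization by the square root of alignment length, and the function names `effective_sequence_count` and `normalized_neff` are illustrative assumptions; this is not presented as the exact procedure or threshold used by Ovchinnikov et al.

```python
import numpy as np


def effective_sequence_count(msa, identity_threshold=0.8):
    """Effective number of sequences in an MSA given as equal-length strings.

    Assumed weighting scheme: each sequence gets weight 1 / (number of
    sequences, itself included, whose fractional identity to it is
    >= identity_threshold). Gap characters are compared like residues
    in this simplified sketch.
    """
    lengths = {len(s) for s in msa}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    seqs = np.array([list(s) for s in msa])  # shape (N, L), one character per cell
    # Pairwise fractional identity across all aligned columns, shape (N, N)
    identity = (seqs[:, None, :] == seqs[None, :, :]).sum(axis=2) / length
    # Number of close neighbours per sequence (the diagonal guarantees at least 1)
    neighbours = (identity >= identity_threshold).sum(axis=1)
    return float((1.0 / neighbours).sum())


def normalized_neff(msa, identity_threshold=0.8):
    """Effective count divided by sqrt(alignment length) (assumed normalization)."""
    return effective_sequence_count(msa, identity_threshold) / np.sqrt(len(msa[0]))


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDE", "ACDF", "WWWW"]
    # The two identical rows share one unit of weight; the other two count fully.
    print(effective_sequence_count(toy_msa))  # 3.0
    print(normalized_neff(toy_msa))           # 1.5 = 3.0 / sqrt(4)
```

The dense pairwise comparison costs O(N²L) memory, so alignments of the size produced by adding metagenome data would need a chunked or clustering-based variant; the dense form is kept here only for readability.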

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
National Institutes of Health (NIH); USDOE; USDOE Office of Science (SC), Biological and Environmental Research (BER) (SC-23)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1436529
Journal Information:
Science, Vol. 355, Issue 6322; ISSN 0036-8075
Publisher:
American Association for the Advancement of Science (AAAS)
Country of Publication:
United States
Language:
English
