DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Deep clustering of protein folding simulations

Abstract

Background: We investigate the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. Results: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin head piece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. Conclusions: Together, we demonstrate that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
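
The abstract reports that the model predicts nearly 89% of contacts correctly but does not say how that figure is computed. The following is a minimal sketch of one plausible reading: the fraction of residue-pair entries whose thresholded reconstruction matches the reference contact map. The function name `contact_accuracy`, the 0.5 threshold, and the toy inputs are assumptions for illustration and are not taken from the record.

```python
def contact_accuracy(true_map, predicted_probs, threshold=0.5):
    """Fraction of map entries where the thresholded prediction matches the reference.

    true_map: square list of lists with 0/1 entries (reference contact map).
    predicted_probs: same shape, reconstructed values in [0, 1].
    threshold: assumed cutoff for calling a predicted contact (hypothetical choice).
    """
    total = 0
    correct = 0
    for true_row, prob_row in zip(true_map, predicted_probs):
        for is_contact, prob in zip(true_row, prob_row):
            total += 1
            # A prediction counts as correct when both sides agree on contact/no contact.
            correct += int((prob >= threshold) == bool(is_contact))
    return correct / total


# Toy 2x2 example: three of the four entries agree after thresholding.
reference = [[1, 0], [0, 1]]
reconstructed = [[0.9, 0.2], [0.6, 0.8]]
accuracy = contact_accuracy(reference, reconstructed)
```

Averaging this quantity over all frames of a trajectory would give a per-trajectory figure comparable in spirit to the one quoted above, under the stated assumptions.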

Authors:
Bhowmik, Debsindhu [1]; Gao, Shang [1]; Young, Michael T. [1]; Ramanathan, Arvind [2]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
Publication Date:
2018-12-21
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1513368
Grant/Contract Number:  
AC02-06-CH11357; AC52-07NA27344; AC52-06NA25396; AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 19; Journal Issue: S18; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; Deep learning; Variational autoencoder; Protein folding; Conformational substates

Citation Formats

Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., and Ramanathan, Arvind. Deep clustering of protein folding simulations. United States: N. p., 2018. Web. doi:10.1186/s12859-018-2507-5.
Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., & Ramanathan, Arvind. Deep clustering of protein folding simulations. United States. https://doi.org/10.1186/s12859-018-2507-5
Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., and Ramanathan, Arvind. 2018. "Deep clustering of protein folding simulations". United States. https://doi.org/10.1186/s12859-018-2507-5. https://www.osti.gov/servlets/purl/1513368.
@article{osti_1513368,
title = {Deep clustering of protein folding simulations},
author = {Bhowmik, Debsindhu and Gao, Shang and Young, Michael T. and Ramanathan, Arvind},
abstractNote = {Background: We investigate the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. Results: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin head piece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. Conclusions: Together, we demonstrate that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.},
doi = {10.1186/s12859-018-2507-5},
journal = {BMC Bioinformatics},
number = {S18},
volume = 19,
place = {United States},
year = {2018},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 49 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: Convolutional variational autoencoder architecture. The deep learning network processes MD simulation data into contact maps (2D images) that are then successively fed into 4 convolutional layers. The output from the final convolutional layer is then fed into a fully connected (dense) layer. This is then used to build the latent space in three dimensions, the output of which is the learned VAE embedding. In order to reconstruct the contact maps, we then use 4 successive de-convolutional layers, symmetric to the 4 input convolutional layers.
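
The pipeline in the Figure 1 caption can be sketched in two small steps: turning one simulation frame into a binary contact map, and tracing tensor shapes through the four convolutional layers, the three-dimensional latent space, and the mirrored decoder. This is a shape-level illustration only, not the authors' implementation. The 8 Å cutoff, the stride pattern, the filter count, the 'same' padding, and both function names are assumptions; only the layer counts and the latent dimension of three come from the caption.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map for one frame.

    coords: one (x, y, z) tuple per residue, in angstroms.
    cutoff: assumed contact distance (hypothetical; not stated in this record).
    """
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)] for i in range(n)]


def encoder_decoder_shapes(n_residues, strides=(1, 2, 1, 1), filters=64, latent_dim=3):
    """Trace (height, width, channels) through 4 conv layers and their mirror image.

    strides and filters are hypothetical hyperparameters; 'same' padding is assumed,
    so each layer divides the spatial size by its stride.
    """
    if len(strides) != 4:
        raise ValueError("the caption describes exactly 4 convolutional layers")
    shapes = [(n_residues, n_residues, 1)]
    size = n_residues
    for stride in strides:
        if size % stride:
            raise ValueError("map size must be divisible by each stride for a symmetric decoder")
        size //= stride
        shapes.append((size, size, filters))
    encoder = shapes + [(latent_dim,)]         # conv stack, then dense layer into the latent space
    decoder = [(latent_dim,)] + shapes[::-1]   # de-convolutions retrace the encoder in reverse
    return encoder, decoder


# Three residues on a line: the first two are within 8 angstroms, the third is isolated.
frame = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(frame)
encoder, decoder = encoder_decoder_shapes(28)
```

Because the decoder list is the encoder list reversed, its final shape equals the input contact map shape, which is what lets the reconstruction be compared entry by entry with the original map.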

Works referenced in this record:

Principal Component Analysis for Protein Folding Dynamics
journal, January 2009

  • Maisuradze, Gia G.; Liwo, Adam; Scheraga, Harold A.
  • Journal of Molecular Biology, Vol. 385, Issue 1
  • DOI: 10.1016/j.jmb.2008.10.018

Discovering Conformational Sub-States Relevant to Protein Function
journal, January 2011


Protein Conformational Populations and Functionally Relevant Substates
journal, August 2013

  • Ramanathan, Arvind; Savol, Andrej; Burger, Virginia
  • Accounts of Chemical Research, Vol. 47, Issue 1
  • DOI: 10.1021/ar400084s

Evaluation of Dimensionality-Reduction Methods from Peptide Folding–Unfolding Simulations
journal, April 2013

  • Duan, Mojie; Fan, Jue; Li, Minghai
  • Journal of Chemical Theory and Computation, Vol. 9, Issue 5
  • DOI: 10.1021/ct400052y

Enhanced Dynamics of Hydrated tRNA on Nanodiamond Surfaces: A Combined Neutron Scattering and MD Simulation Study
journal, September 2016

  • Dhindsa, Gurpreet K.; Bhowmik, Debsindhu; Goswami, Monojoy
  • The Journal of Physical Chemistry B, Vol. 120, Issue 38
  • DOI: 10.1021/acs.jpcb.6b07511

Quantifying the Sources of Kinetic Frustration in Folding Simulations of Small Proteins
journal, June 2014

  • Savol, Andrej J.; Chennubhotla, Chakra S.
  • Journal of Chemical Theory and Computation, Vol. 10, Issue 8
  • DOI: 10.1021/ct500361w

Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction
journal, June 2006

  • Das, P.; Moll, M.; Stamati, H.
  • Proceedings of the National Academy of Sciences, Vol. 103, Issue 26
  • DOI: 10.1073/pnas.0603553103

Discovery Through the Computational Microscope
journal, October 2009


How Fast-Folding Proteins Fold
journal, October 2011


Molecular dynamics simulations of protein folding from the transition state
journal, April 2002

  • Gsponer, J.; Caflisch, A.
  • Proceedings of the National Academy of Sciences, Vol. 99, Issue 10
  • DOI: 10.1073/pnas.092686399

Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins
journal, May 2006

  • Adcock, Stewart A.; McCammon, J. Andrew
  • Chemical Reviews, Vol. 106, Issue 5
  • DOI: 10.1021/cr040426m

Efficient Global Optimization of Expensive Black-Box Functions
journal, January 1998

  • Jones, Donald R.; Schonlau, Matthias; Welch, William J.
  • Journal of Global Optimization, Vol. 13, Issue 4, p. 455-492
  • DOI: 10.1023/A:1008306431147

Systematic Validation of Protein Force Fields against Experimental Data
journal, February 2012


An automated analysis workflow for optimization of force-field parameters using neutron scattering data
journal, July 2017

  • Lynch, Vickie E.; Borreguero, Jose M.; Bhowmik, Debsindhu
  • Journal of Computational Physics, Vol. 340
  • DOI: 10.1016/j.jcp.2017.03.045

Event detection and sub-state discovery from biomolecular simulations using higher-order statistics: Application to enzyme adenylate kinase
journal, August 2012

  • Ramanathan, Arvind; Savol, Andrej J.; Agarwal, Pratul K.
  • Proteins: Structure, Function, and Bioinformatics, Vol. 80, Issue 11
  • DOI: 10.1002/prot.24135

MDAnalysis: A toolkit for the analysis of molecular dynamics simulations
journal, April 2011

  • Michaud-Agrawal, Naveen; Denning, Elizabeth J.; Woolf, Thomas B.
  • Journal of Computational Chemistry, Vol. 32, Issue 10
  • DOI: 10.1002/jcc.21787

Systematic characterization of protein folding pathways using diffusion maps: Application to Trp-cage miniprotein
journal, February 2015

  • Kim, Sang Beom; Dsilva, Carmeline J.; Kevrekidis, Ioannis G.
  • The Journal of Chemical Physics, Vol. 142, Issue 8
  • DOI: 10.1063/1.4913322

MSMBuilder2: Modeling Conformational Dynamics on the Picosecond to Millisecond Scale
journal, August 2011

  • Beauchamp, Kyle A.; Bowman, Gregory R.; Lane, Thomas J.
  • Journal of Chemical Theory and Computation, Vol. 7, Issue 10
  • DOI: 10.1021/ct200463m

Biomolecular Simulation: A Computational Microscope for Molecular Biology
journal, June 2012


Sub-microsecond Protein Folding
journal, June 2006

  • Kubelka, Jan; Chiu, Thang K.; Davies, David R.
  • Journal of Molecular Biology, Vol. 359, Issue 3
  • DOI: 10.1016/j.jmb.2006.03.034

Progress and challenges in the automated construction of Markov state models for full protein systems
journal, September 2009

  • Bowman, Gregory R.; Beauchamp, Kyle A.; Boxer, George
  • The Journal of Chemical Physics, Vol. 131, Issue 12
  • DOI: 10.1063/1.3216567

Deep Learning in Drug Discovery
journal, December 2015

  • Gawehn, Erik; Hiss, Jan A.; Schneider, Gisbert
  • Molecular Informatics, Vol. 35, Issue 1
  • DOI: 10.1002/minf.201501008

The ββα fold: explorations in sequence space
journal, April 2001

  • Sarisky, Catherine A.; Mayo, Stephen L.
  • Journal of Molecular Biology, Vol. 307, Issue 5
  • DOI: 10.1006/jmbi.2000.4345

Computational ‘microscopy’ of cellular membranes
journal, January 2016

  • Ingólfsson, Helgi I.; Arnarez, Clément; Periole, Xavier
  • Journal of Cell Science, Vol. 129, Issue 2
  • DOI: 10.1242/jcs.176040

Recovery of protein structure from contact maps
journal, October 1997


Anton, a special-purpose machine for molecular dynamics simulation
journal, July 2008

  • Shaw, David E.; Chao, Jack C.; Eastwood, Michael P.
  • Communications of the ACM, Vol. 51, Issue 7
  • DOI: 10.1145/1364782.1364802

On-the-Fly Identification of Conformational Substates from Molecular Dynamics Simulations
journal, February 2011

  • Ramanathan, Arvind; Yoo, Ji Oh; Langmead, Christopher J.
  • Journal of Chemical Theory and Computation, Vol. 7, Issue 3
  • DOI: 10.1021/ct100531j

MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations
conference, January 2016


Improved side-chain torsion potentials for the Amber ff99SB protein force field
journal, January 2010

  • Lindorff-Larsen, Kresten; Piana, Stefano; Palmo, Kim
  • Proteins: Structure, Function, and Bioinformatics
  • DOI: 10.1002/prot.22711

Protein folding in contact map space
journal, December 2000


Bat coronaviruses related to SARS-CoV-2 and infectious for human cells
journal, February 2022


Tutorial on Variational Autoencoders
preprint, January 2016


Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
preprint, January 2017


Dimensionality reduction methods for molecular simulations
preprint, January 2017


Recovery of Protein Structure from Contact Maps
preprint, January 1997


Works referencing / citing this record:

Mechanism of glucocerebrosidase activation and dysfunction in Gaucher disease unraveled by molecular dynamics and deep learning
journal, February 2019

  • Romero, Raquel; Ramanathan, Arvind; Yuen, Tony
  • Proceedings of the National Academy of Sciences, Vol. 116, Issue 11
  • DOI: 10.1073/pnas.1818411116

IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads
conference, October 2021

  • Saadi, Aymen Al; Alfe, Dario; Babuji, Yadu
  • ICPP 2021: 50th International Conference on Parallel Processing
  • DOI: 10.1145/3472456.3473524
