Deep clustering of protein folding simulations
Abstract
Background: We investigate the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low-dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes.
Results: We use a convolutional variational autoencoder (CVAE) to learn low-dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin headpiece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the learned CVAE latent features correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features.
Conclusions: Together, we demonstrate that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
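The contact-level accuracy quoted in the abstract can be illustrated with a small sketch. The abstract does not say how contacts are defined or scored, so everything here is an assumption made for illustration: the 8 Å distance cutoff, the function names `contact_map` and `contact_accuracy`, and the toy coordinates. It is not the authors' implementation and it contains no autoencoder; it only shows one plausible way to compare a reference contact map with a reconstructed one.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Binary contact map for a list of (x, y, z) residue positions.

    The cutoff (in angstroms) is an assumed value, not taken from the record.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]

def contact_accuracy(true_map, predicted_map):
    """Fraction of residue pairs (i < j) whose contact state matches."""
    n = len(true_map)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    matches = sum(true_map[i][j] == predicted_map[i][j] for i, j in pairs)
    return matches / len(pairs)

# Toy example: four residues on a line (hypothetical coordinates).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
true_map = contact_map(coords)

# Pretend a model reconstructed the map but missed the (0, 2) contact.
predicted_map = [row[:] for row in true_map]
predicted_map[0][2] = 0
predicted_map[2][0] = 0

accuracy = contact_accuracy(true_map, predicted_map)
print(f"fraction of pairs reconstructed correctly: {accuracy:.3f}")
```

In this toy case five of the six residue pairs agree. A real analysis would build one contact map per trajectory frame and compare it with the decoder output for that frame.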
- Authors:
- Bhowmik, Debsindhu; Gao, Shang; Young, Michael T.; Ramanathan, Arvind
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Argonne National Lab. (ANL), Lemont, IL (United States)
- Publication Date:
- 2018-12-21
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1513368
- Grant/Contract Number:
- AC02-06CH11357; AC52-07NA27344; AC52-06NA25396; AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- BMC Bioinformatics
- Additional Journal Information:
- Journal Volume: 19; Journal Issue: S18; Journal ID: ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 59 BASIC BIOLOGICAL SCIENCES; Deep learning; Variational autoencoder; Protein folding; Conformational substates
Citation Formats
Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., and Ramanathan, Arvind. Deep clustering of protein folding simulations. United States: N. p., 2018. Web. doi:10.1186/s12859-018-2507-5.
Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., & Ramanathan, Arvind. Deep clustering of protein folding simulations. United States. https://doi.org/10.1186/s12859-018-2507-5
Bhowmik, Debsindhu, Gao, Shang, Young, Michael T., and Ramanathan, Arvind. 2018. "Deep clustering of protein folding simulations". United States. https://doi.org/10.1186/s12859-018-2507-5. https://www.osti.gov/servlets/purl/1513368.
@article{osti_1513368,
title = {Deep clustering of protein folding simulations},
author = {Bhowmik, Debsindhu and Gao, Shang and Young, Michael T. and Ramanathan, Arvind},
abstractNote = {Background: We investigate the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes. Results: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin head piece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features. Conclusions: Together, we demonstrate that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.},
doi = {10.1186/s12859-018-2507-5},
journal = {BMC Bioinformatics},
number = {S18},
volume = 19,
place = {United States},
year = {2018},
month = {12}
}
Works referencing / citing this record:
Mechanism of glucocerebrosidase activation and dysfunction in Gaucher disease unraveled by molecular dynamics and deep learning
journal, February 2019
- Romero, Raquel; Ramanathan, Arvind; Yuen, Tony
- Proceedings of the National Academy of Sciences, Vol. 116, Issue 11
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads
conference, October 2021
- Saadi, Aymen Al; Alfe, Dario; Babuji, Yadu
- ICPP 2021: 50th International Conference on Parallel Processing