
Title: Multiresolution persistent homology for excessively large biomolecular datasets

Abstract

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

Authors:
Xia, Kelin; Zhao, Zhixiong [1]; Wei, Guo-Wei [1]
  1. Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States)
Publication Date:
OSTI Identifier:
22489667
Resource Type:
Journal Article
Journal Name:
Journal of Chemical Physics
Additional Journal Information:
Journal Volume: 143; Journal Issue: 13; Other Information: (c) 2015 AIP Publishing LLC; Country of input: International Atomic Energy Agency (IAEA); Journal ID: ISSN 0021-9606
Country of Publication:
United States
Language:
English
Subject:
37 INORGANIC, ORGANIC, PHYSICAL AND ANALYTICAL CHEMISTRY; ATOMS; DATASETS; DENSITY; DNA; MOLECULES; PROTEINS

Citation Formats

Xia, Kelin, Zhao, Zhixiong, Wei, Guo-Wei, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824. Multiresolution persistent homology for excessively large biomolecular datasets. United States: N. p., 2015. Web. doi:10.1063/1.4931733.
Xia, Kelin, Zhao, Zhixiong, Wei, Guo-Wei, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, & Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824. Multiresolution persistent homology for excessively large biomolecular datasets. United States. https://doi.org/10.1063/1.4931733
Xia, Kelin, Zhao, Zhixiong, Wei, Guo-Wei, Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824. Wed . "Multiresolution persistent homology for excessively large biomolecular datasets". United States. https://doi.org/10.1063/1.4931733.
@article{osti_22489667,
title = {Multiresolution persistent homology for excessively large biomolecular datasets},
author = {Xia, Kelin and Zhao, Zhixiong and Wei, Guo-Wei},
abstractNote = {Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.},
doi = {10.1063/1.4931733},
url = {https://www.osti.gov/biblio/22489667},
journal = {Journal of Chemical Physics},
issn = {0021-9606},
number = 13,
volume = 143,
place = {United States},
year = {2015},
month = {10}
}