Data Intensive Analysis of Biomolecular Simulations

Straatsma, TP; Soares, Thereza A

doi:10.1063/1.2836009

Conference · December 1, 2007

DOI: https://doi.org/10.1063/1.2836009 · OSTI ID: 962048

Advances in biomolecular modeling and simulation, made possible by the availability of increasingly powerful high-performance computing resources, are extending molecular simulations to biologically more relevant system sizes and time scales. At the same time, advances in simulation methodologies are allowing more complex processes to be described more accurately. These developments make a systems approach to computational structural biology feasible, but this will require a focused emphasis on the comparative analysis of the increasing number of molecular simulations that are being carried out for biomolecular systems with more realistic models, in multi-component environments, and for longer simulation times. Just as in the analysis of the large data sources created by the new high-throughput experimental technologies, biomolecular computer simulations contribute to progress in biology through comparative analysis. The continuing increase in available protein structures allows the comparative analysis of the role of structure and conformational flexibility in protein function, and is the foundation of the discipline of structural bioinformatics. This creates the opportunity to derive general findings from the comparative analysis of molecular dynamics simulations of a wide range of proteins, protein-protein complexes, and other complex biological systems. Because of the importance of protein conformational dynamics for protein function, it is essential that the analysis of molecular trajectories be carried out using a novel, more integrative and systematic approach. We are developing a much-needed, rigorous, computer-science-based framework for the efficient analysis of the increasingly large data sets resulting from molecular simulations. Such a suite of capabilities will also provide the required tools for access to and analysis of a distributed library of generated trajectories.
Our research focuses on the following areas: (1) the development of an efficient analysis framework for very large scale trajectories on massively parallel architectures, (2) the development of novel methodologies that allow automated detection of events in these very large data sets, and (3) the efficient comparative analysis of multiple trajectories. The goal of the presented work is the development of new algorithms that will allow biomolecular simulation studies to become an integral tool for addressing the challenges of post-genomic biological research. Our strategy for delivering data-intensive computing applications that can deal effectively with the volume of simulation data that will become available is to take advantage of large, globally addressable memory architectures. The first requirement is the design of a flexible underlying data structure for single large trajectories that will form an adaptable framework for a wide range of analysis capabilities. The typical approach to trajectory analysis is to process trajectories sequentially, time frame by time frame. This is the implementation found in molecular simulation codes such as NWChem; it was designed this way so that it can run on workstations and other architectures whose aggregate memory is too small to hold entire trajectories in core. The consequence of this approach is an I/O-dominated solution that scales very poorly on parallel machines. We are currently developing tools specifically intended for large-scale machines with sufficient main memory to hold entire trajectories in core. This greatly reduces the cost of I/O, as trajectories are read only once during the analysis.
In our current Data Intensive Analysis (DIANA) implementation, each processor determines its entry point within the trajectory, which will typically be spread over multiple files, skips to that entry, and reads the appropriate frames independently of all other processors.
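
The independent-read scheme described above can be illustrated with a minimal Python sketch. It is not DIANA's actual code: the function names `block_range` and `read_plan` are hypothetical, and the sketch assumes that the number of frames in each trajectory file is known in advance and that frames are divided among processors in contiguous, nearly equal blocks. Under those assumptions each processor can compute, without any communication, which files it must open and how many frames to skip in each.

```python
from bisect import bisect_right
from itertools import accumulate


def block_range(n_frames, n_procs, rank):
    """Return the half-open global frame range [start, stop) owned by `rank`.

    Assumption: frames are split into contiguous blocks whose sizes
    differ by at most one frame.
    """
    base, extra = divmod(n_frames, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def read_plan(frames_per_file, n_procs, rank):
    """List (file_index, first_frame_in_file, frame_count) segments for `rank`.

    `frames_per_file` is a hypothetical input: the frame count of each
    trajectory file, in trajectory order.
    """
    # ends[i] is the global index one past the last frame of file i.
    ends = list(accumulate(frames_per_file))
    total = ends[-1] if ends else 0
    start, stop = block_range(total, n_procs, rank)
    plan = []
    while start < stop:
        f = bisect_right(ends, start)            # file holding global frame `start`
        file_start = ends[f] - frames_per_file[f]
        count = min(stop, ends[f]) - start       # frames to take from this file
        plan.append((f, start - file_start, count))
        start += count
    return plan


if __name__ == "__main__":
    # Three trajectory files holding 100, 100, and 50 frames; four processors.
    for rank in range(4):
        print(rank, read_plan([100, 100, 50], 4, rank))
```

Each tuple in the plan tells a processor which file to open, how many frames to skip, and how many to read. Turning the frame offset into a byte offset for a direct seek would additionally require fixed-size frame records or a per-file index, which this sketch does not model.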

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 962048

Report Number(s): PNNL-SA-55419; KJ0101030; TRN: US200919%%361

Resource Relation: Conference: COMPUTATION IN MODERN SCIENCE AND ENGINEERING: Proceedings of the International Conference on Computational Methods in Science and Engineering (ICCMSE 2007). AIP Conference Proceedings, 963:1379-1382

Country of Publication: United States

Language: English

Similar Records

Data Intensive Analysis of Biomolecular Simulations

Conference · March 1, 2008 · OSTI ID: 962048

Straatsma, TP

Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond

Technical Report · June 1, 2007 · OSTI ID: 962048

Oehmen, Christopher S; Sofia, Heidi J; Baxter, Douglas; +5 more

Breaking the High-Throughput Bottleneck: New tools help biologists integrate complex datasets

Journal Article · March 1, 2006 · Scientific Computing, 23(4):22-26 · OSTI ID: 962048

Waters, Katrina M; Singhal, Mudita; Webb-Robertson, Bobbie-Jo M; +2 more

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
ALGORITHMS
AVAILABILITY
BIOLOGY
COMPUTERIZED SIMULATION
COMPUTERS
DESIGN
DETECTION
FLEXIBILITY
FOCUSING
IMPLEMENTATION
PERFORMANCE
PROTEIN STRUCTURE
PROTEINS
SIMULATION
TRAJECTORIES
