OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: The smallest cells pose the biggest problem: high performance computing and the analysis of metagenome sequence data

Conference · J. Phys.: Conf. Ser.
Author affiliation:
  [1] Mathematics and Computer Science

New high-throughput DNA sequencing technologies have revolutionized how scientists study the organisms around us. In particular, microbiology (the study of the smallest, unseen organisms that pervade our lives) has embraced these techniques to characterize and analyze cellular constituents and to use this information to develop novel tools, techniques, and therapeutics. So-called next-generation DNA sequencing platforms have greatly increased the amount of raw data that can be generated in a short time. Argonne National Laboratory developed the premier platform for analyzing these new data, MG-RAST, which is used by microbiologists worldwide. This paper uses the accounting records from the computational analysis of more than 10,000,000,000 bp (10 Gbp) of DNA sequence data, describes an analysis of the resulting advanced computational requirements, and suggests the level of analysis that will be essential as microbiologists work to understand how these tiny organisms affect our everyday lives. The results indicate that the computational cost of data analysis scales linearly with the amount of data, but that most analyses are delayed while waiting in job queues. With sufficient resources, the computations for a typical dataset could be completed in a few hours. These data also suggest execution times that delimit the timely completion of an analysis and provide bounds for identifying problematic processes.
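The abstract's quantitative findings (cost that grows with dataset size, turnaround dominated by queue wait, and execution-time bounds that flag problematic processes) can be made concrete with a small model over job accounting records. The sketch below reads "linear" as execution time proportional to the number of base pairs. It is hypothetical throughout: the `JobRecord` fields, the median per-bp rate fit, and the 3x `tolerance` bound are illustrative assumptions, not MG-RAST's accounting schema or the paper's method.

```python
"""Hypothetical sketch: a linear cost model for metagenome analysis jobs.

Assumptions (not taken from the paper): execution time is proportional to the
number of base pairs processed, and each job's turnaround splits into time
waiting in a scheduler queue plus time actually executing.
"""
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class JobRecord:
    """One accounting record; all field names are hypothetical."""
    job_id: str
    base_pairs: int       # dataset size in bp
    queue_seconds: float  # time spent waiting in the queue
    run_seconds: float    # time spent executing


def fit_rate(jobs: List[JobRecord]) -> float:
    """Median seconds per bp; the median resists a few slow outliers."""
    rates = [j.run_seconds / j.base_pairs for j in jobs if j.base_pairs > 0]
    if not rates:
        raise ValueError("need at least one job with a nonzero dataset size")
    return median(rates)


def predict_run_seconds(base_pairs: int, rate: float) -> float:
    """Linear model: execution time grows in proportion to dataset size."""
    return rate * base_pairs


def queue_fraction(jobs: List[JobRecord]) -> float:
    """Share of total turnaround time spent waiting rather than executing."""
    wait = sum(j.queue_seconds for j in jobs)
    total = wait + sum(j.run_seconds for j in jobs)
    return wait / total if total > 0 else 0.0


def flag_problematic(jobs: List[JobRecord], rate: float,
                     tolerance: float = 3.0) -> List[str]:
    """IDs of jobs running longer than tolerance x the linear prediction.

    The default tolerance is an arbitrary illustrative bound.
    """
    return [j.job_id for j in jobs
            if j.run_seconds > tolerance * predict_run_seconds(j.base_pairs, rate)]


if __name__ == "__main__":
    # Made-up records for illustration only; not data from the paper.
    jobs = [
        JobRecord("a", 1_000_000, 3600.0, 100.0),
        JobRecord("b", 2_000_000, 7200.0, 200.0),
        JobRecord("c", 1_000_000, 1800.0, 900.0),
    ]
    rate = fit_rate(jobs)
    print(f"rate = {rate:.1e} s/bp")
    print(f"queue share = {queue_fraction(jobs):.2f}")
    print("flagged:", flag_problematic(jobs, rate))
```

Run as a script on the made-up records, it prints `rate = 1.0e-04 s/bp`, `queue share = 0.91`, and `flagged: ['c']`: job "c" runs nine times longer than its size predicts. A median of per-job rates is used instead of a least-squares fit so that the slow jobs being hunted for do not inflate the baseline rate they are compared against.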

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC); National Institutes of Health (NIH)
DOE Contract Number:
DE-AC02-06CH11357
OSTI ID:
1009781
Report Number(s):
ANL/MCS/CP-62069; TRN: US201106%%999
Journal Information:
J. Phys.: Conf. Ser., Vol. 125 (2008); Conference: Scientific Discovery through Advanced Computing 2008 (SciDAC 08); Jul. 13-17, 2008; Seattle, WA
Country of Publication:
United States
Language:
English