The smallest cells pose the biggest problem: high performance computing and the analysis of metagenome sequence data.
- Mathematics and Computer Science
New high-throughput DNA sequencing technologies have revolutionized how scientists study the organisms around us. In particular, microbiology, the study of the smallest, unseen organisms that pervade our lives, has embraced these new techniques to characterize and analyze cellular constituents and to use this information to develop novel tools, techniques, and therapeutics. So-called next-generation DNA sequencing platforms have brought huge increases in the amount of raw data that can be generated rapidly. Argonne National Laboratory developed the premier platform for the analysis of these new data (MG-RAST), which is used by microbiologists worldwide. This paper uses the accounting from the computational analysis of more than 10,000,000,000 bp of DNA sequence data, describes an analysis of the advanced computational requirements, and suggests the level of analysis that will be essential as microbiologists move to understand how these tiny organisms affect our everyday lives. The results indicate that data analysis is a linear problem, but that most analyses are held up in queues. With sufficient resources, computations could be completed in a few hours for a typical dataset. These data also suggest execution times that delimit timely completion of computational analyses, and provide bounds for problematic processes.
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC); National Institutes of Health (NIH)
- DOE Contract Number:
- DE-AC02-06CH11357
- OSTI ID:
- 1009781
- Report Number(s):
- ANL/MCS/CP-62069; TRN: US201106%%999
- Journal Information:
- J. Phys.: Conf. Ser., Vol. 125, Issue 2008; Conference: Scientific Discovery through Advanced Computing 2008 (SciDAC 08); Jul. 13, 2008 - Jul. 17, 2008; Seattle, WA
- Country of Publication:
- United States
- Language:
- ENGLISH