Title: Breaking the High-Throughput Bottleneck: New tools help biologists integrate complex datasets

Journal Article · Scientific Computing, 23(4):22-26
OSTI ID:878669

It’s not your high school biology lab. Today’s burgeoning field of systems biology takes researchers away from traditional one-gene-at-a-time bench experiments in favor of combining technologies from fields such as genomics, mass spectrometry, imaging, and informatics to advance their understanding of biological systems. The advent of high-throughput (HTP) technologies, such as transcriptomics (microarrays) and proteomics, has been fueling a revolution in biology by enabling systems-level analysis. These HTP approaches are especially promising for characterizing biomolecules at a global scale; however, the large, heterogeneous data sources make interpretation especially challenging. Microarrays, a method for measuring the expression levels of thousands of genes simultaneously, produce data with very high dimensionality and considerable variability from experiment to experiment. Proteomics, the study of protein expression patterns in organisms, is a vast, complex field that requires powerful separation methods, mass spectrometers, and advanced algorithms to automate data processing.

The lack of computational capabilities to analyze bulky datasets from HTP techniques is a bottleneck for this new era of biology research. As a consequence, many experiments that generate massive amounts of data are conducted with limited forethought about the analysis methods, so the full potential of the studies is never realized. Combining HTP measurements in a given experiment may provide more information, but it only exacerbates the analysis problem. A global challenge for biology is integrating these data into computational models that predict cell response.

Consider, for example, a change in protein expression. There are multiple levels at which protein production can be controlled, not to mention post-translational modifications that often dictate protein function. To understand how the abundance and activity of a given protein are regulated, a biologist has to be able to analyze changes in biomolecular expression over various time scales.

But data analysis doesn’t end with the numbers themselves. Effective visualizations help scientists explore and interpret data, and scientists in many domains have come to trust and rely on graphical representations of large datasets. Humans can perceive the patterns that statistical and visualization methods reveal, and such patterns suggest relationships in the data that only a biology domain expert may be able to explain. Visualization technologies simplify the way we analyze massive amounts of complex data from multiple sources.

In response to systems biology needs, the U.S. Department of Energy’s Pacific Northwest National Laboratory (PNNL) is finding solutions to problems associated with biology’s compute- and data-intensive nature. PNNL is developing bioinformatics and data-management tools to archive, manage, and analyze biological data; most of these tools will be publicly available at no charge. PNNL is also a global leader in the field of information analytics, housing the National Visualization and Analytics Center (NVAC), which provides national strategic leadership for visual analytics technology. The Laboratory’s combined expertise in computational and biological sciences creates an ideal environment to realize the promise of systems biology with state-of-the-art technology and software tools.
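The article does not name specific algorithms or software, so the following is only a minimal illustrative sketch, not one of PNNL's tools. It assumes a synthetic gene-expression matrix (200 genes by 8 arrays) standing in for real microarray intensities, log-transforms and median-centers it to tame experiment-to-experiment variability, clusters the genes hierarchically with SciPy, and renders the reordered matrix as a heatmap so co-expressed groups become visible at a glance.

```python
# Illustrative sketch only: cluster a small synthetic gene-expression matrix
# and draw a heatmap so expression patterns across samples stand out visually.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)
# Synthetic stand-in for microarray intensities: 200 genes x 8 arrays.
expression = rng.lognormal(mean=2.0, sigma=1.0, size=(200, 8))

# Log-transform to tame the wide dynamic range typical of microarray data,
# then median-center each array to reduce experiment-to-experiment variability.
log_expr = np.log2(expression + 1.0)
log_expr -= np.median(log_expr, axis=0)

# Hierarchically cluster genes so co-expressed groups sit together in the plot.
order = leaves_list(linkage(log_expr, method="average"))

plt.imshow(log_expr[order], aspect="auto", cmap="RdBu_r")
plt.xlabel("array (sample)")
plt.ylabel("gene (clustered)")
plt.colorbar(label="log2 expression (median-centered)")
plt.title("Clustered expression heatmap")
plt.savefig("expression_heatmap.png", dpi=150)
```

In practice, the synthetic matrix would be replaced by measured intensities, and the normalization and clustering choices would depend on the platform and experimental design; this sketch only shows how reordering a high-dimensional matrix for display can make patterns apparent to a domain expert.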

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
878669
Report Number(s):
PNNL-SA-48217; TRN: US200611428
Journal Information:
Scientific Computing, Vol. 23, Issue 4, pages 22-26
Country of Publication:
United States
Language:
English