OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets

Abstract

Much of the previous work in the large data visualization area has solely focused on handling the scale of the data. This task is clearly a great challenge and necessary, but it is not sufficient. Applying standard visualization techniques to large scale data sets often creates complicated pictures where meaningful trends are lost. A second challenge, then, is to also provide algorithms that simplify what an analyst must understand, using either visual or quantitative means. This challenge can be summarized as improving the legibility or reducing the complexity of massive data sets. Fully meeting both of these challenges is the work of many, many PhD dissertations. In this dissertation, we describe some new techniques to address both the scale and legibility challenges, in hope of contributing to the larger solution. In addition to our assumption of simultaneously addressing both scale and legibility, we add an additional requirement that the solutions considered fit well within an interoperable framework for diverse algorithms, because a large suite of algorithms is often necessary to fully understand complex data sets. For scale, we present a general architecture for handling large data, as well as details of a contract-based system for integrating advanced optimizations into a data flow network design. We also describe techniques for volume rendering and performing comparisons at the extreme scale. For legibility, we present several techniques. Most noteworthy are equivalence class functions, a technique to drive visualizations using statistical methods, and line-scan based techniques for characterizing shape.
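
The contract-based system described above can be illustrated with a minimal sketch. All class and method names below are hypothetical, chosen for exposition rather than taken from the dissertation; the key idea is a two-phase update in which contracts flow from sink to source so each filter can declare its needs, after which data flows from source to sink.

    # Minimal sketch of a contract-based data flow network (hypothetical names).
    from dataclasses import dataclass, field

    @dataclass
    class Contract:
        variables: set = field(default_factory=set)  # fields the source must read
        ghost_zones: bool = False                    # halo data for stencil operations

    class Filter:
        def modify_contract(self, contract: Contract) -> Contract:
            return contract          # default: pass the request through unchanged
        def execute(self, data):
            return data

    class ContourFilter(Filter):
        def __init__(self, variable):
            self.variable = variable
        def modify_contract(self, contract):
            contract.variables.add(self.variable)
            contract.ghost_zones = True  # contouring needs neighboring cells
            return contract

    class Pipeline:
        def __init__(self, source, filters):
            self.source, self.filters = source, filters  # ordered source -> sink
        def update(self):
            contract = Contract()
            for f in reversed(self.filters):   # phase 1: contracts, sink -> source
                contract = f.modify_contract(contract)
            data = self.source.read(contract)  # read only what was requested
            for f in self.filters:             # phase 2: data, source -> sink
                data = f.execute(data)
            return data

Because the source sees the merged contract before any data is read, optimizations such as skipping unneeded variables or domains fall out naturally: each filter records its requirements, and the source honors the union of all requests.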

Authors:
Childs, Hank R. [1]
  1. Univ. of California, Davis, CA (United States)
Publication Date: 2006
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
900438
Report Number(s):
UCRL-TH-226455
TRN: US200711%%131
DOE Contract Number:
W-7405-ENG-48
Resource Type:
Thesis/Dissertation
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; ALGORITHMS; ARCHITECTURE; DESIGN; SHAPE

Citation Formats

Childs, Hank R. An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets. United States: N. p., 2006. Web. doi:10.2172/900438.
Childs, Hank R. An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets. United States. doi:10.2172/900438.
Childs, Hank R. 2006. "An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets". United States. doi:10.2172/900438. https://www.osti.gov/servlets/purl/900438.
@phdthesis{osti_900438,
title = {An Analysis Framework Addressing the Scale and Legibility of Large Scientific Data Sets},
author = {Childs, Hank R.},
abstractNote = {Much of the previous work in the large data visualization area has solely focused on handling the scale of the data. This task is clearly a great challenge and necessary, but it is not sufficient. Applying standard visualization techniques to large scale data sets often creates complicated pictures where meaningful trends are lost. A second challenge, then, is to also provide algorithms that simplify what an analyst must understand, using either visual or quantitative means. This challenge can be summarized as improving the legibility or reducing the complexity of massive data sets. Fully meeting both of these challenges is the work of many, many PhD dissertations. In this dissertation, we describe some new techniques to address both the scale and legibility challenges, in hope of contributing to the larger solution. In addition to our assumption of simultaneously addressing both scale and legibility, we add an additional requirement that the solutions considered fit well within an interoperable framework for diverse algorithms, because a large suite of algorithms is often necessary to fully understand complex data sets. For scale, we present a general architecture for handling large data, as well as details of a contract-based system for integrating advanced optimizations into a data flow network design. We also describe techniques for volume rendering and performing comparisons at the extreme scale. For legibility, we present several techniques. Most noteworthy are equivalence class functions, a technique to drive visualizations using statistical methods, and line-scan based techniques for characterizing shape.},
doi = {10.2172/900438},
school = {Univ. of California, Davis, CA (United States)},
place = {United States},
year = {2006}
}

Thesis/Dissertation:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this thesis or dissertation.

Similar Records:
  • This thesis centers on the use of spectral modeling techniques on data from the Sloan Digital Sky Survey (SDSS) to gain new insights into current questions in galaxy evolution. The SDSS provides a large, uniform, high quality data set which can be exploited in a number of ways. One avenue pursued here is to use the large sample size to measure precisely the mean properties of galaxies of increasingly narrow parameter ranges. The other route taken is to look for rare objects which open up for exploration new areas in galaxy parameter space. The crux of this thesis is revisiting the classical Kennicutt method for inferring the stellar initial mass function (IMF) from the integrated light properties of galaxies. A large data set (~ 10^5 galaxies) from the SDSS DR4 is combined with more in-depth modeling and quantitative statistical analysis to search for systematic IMF variations as a function of galaxy luminosity. Galaxy Hα equivalent widths are compared to a broadband color index to constrain the IMF. It is found that for the sample as a whole the best fitting IMF power law slope above 0.5 M☉ is Γ = 1.5 ± 0.1, with the error dominated by systematics. Galaxies brighter than around M_{r,0.1} = -20 (including galaxies like the Milky Way, which has M_{r,0.1} ~ -21) are well fit by a universal Γ ~ 1.4 IMF, similar to the classical Salpeter slope, and smooth, exponential star formation histories (SFH). Fainter galaxies prefer steeper IMFs, and the quality of the fits reveals that for these galaxies a universal IMF with smooth SFHs is actually a poor assumption. (The IMF slope convention assumed here is restated after this list.) Related projects are also pursued. A targeted photometric search is conducted for strongly lensed Lyman break galaxies (LBG) similar to MS1512-cB58. The evolution of the photometric selection technique is described, as are the results of spectroscopic follow-up of the best targets. The serendipitous discovery of two interesting blue compact dwarf galaxies is reported. These galaxies were identified by their extremely weak (< 150) [N II] λ6584 to Hα emission line ratios. Abundance analysis from emission line fluxes reveals that these galaxies have gas phase oxygen abundances 12 + log(O/H) ~ 7.7 to 7.9, not remarkably low, and near infrared imaging detects an old stellar population. However, the measured nitrogen to oxygen ratios log(N/O) < -1.7 are anomalously low for blue compact dwarf galaxies. These objects may be useful for understanding the chemical evolution of nitrogen.
  • Designers charged with creating tools for processes foreign to their own experience need a reliable source of application knowledge. This dissertation presents an empirical study of the scientific data analysis process in order to inform the design of tools for this important aspect of scientific computing. Interaction analysis and contextual inquiry methods were adapted to observe scientists analyzing their own data and to characterize the scientific data analysis process. The characterization exposed elements of the process outside the conventional scientific visualization model that defines data analysis in terms of image generation. Scientists queried for quantitative information, made a variety of comparisons, applied mathematics, managed data, and kept records. Many such elements were only indirectly supported by computer. A detailed description of the scientific data analysis process was developed to provide a broad-based foundation of understanding which is rooted in empirical fact, reasonably comprehensive, and applicable to a range of scientific environments. The characterization of scientific data analysis led to design recommendations for improving the support of this process. The application of the results was demonstrated with the design, development, and study of a prototype tool for an inadequately supported scientific data analysis element. Data culling is the identification and extraction of areas of interest in large scientific data sets. Modern workstation-based analysis tools require manageable subsets of data, but data culling is not well supported. A prototype tool was designed and developed to explore a quantitative rather than image-based approach to identifying such subsets. Physicist end-users participated throughout the design, development, and evaluation process. The results of evaluations in the field established conditions under which a number-based approach to data selection effectively supplements an image-based approach.
  • I present time-varying Reeb graphs as a topological framework to support the analysis of continuous time-varying data. Such data is captured in many studies, including computational fluid dynamics, oceanography, medical imaging, and climate modeling, by measuring physical processes over time, or by modeling and simulating them on a computer. Analysis tools are applied to these data sets by scientists and engineers who seek to understand the underlying physical processes. A popular tool for analyzing scientific datasets is level sets, which are the points in space with a fixed data value s. Displaying level sets allows the user to study their geometry, their topological features such as connected components, handles, and voids, and to study the evolution of these features for varying s. For static data, the Reeb graph encodes the evolution of topological features and compactly represents topological information of all level sets. The Reeb graph essentially contracts each level set component to a point. It can be computed efficiently, and it has several uses: as a succinct summary of the data, as an interface to select meaningful level sets, as a data structure to accelerate level set extraction, and as a guide to remove noise. (A union-find sketch of the closely related join tree appears after this list.) I extend these uses of Reeb graphs to time-varying data. I characterize the changes to Reeb graphs over time, and develop an algorithm that can maintain a Reeb graph data structure by tracking these changes over time. I store this sequence of Reeb graphs compactly, and call it a time-varying Reeb graph. I augment the time-varying Reeb graph with information that records the topology of level sets of all level values at all times, that maintains the correspondence of level set components over time, and that accelerates the extraction of level sets for a chosen level value and time. Scientific data sampled in space-time must be extended everywhere in this domain using an interpolant. A poor choice of interpolant can create degeneracies that are difficult to resolve, making construction of time-varying Reeb graphs impractical. I investigate piecewise-linear, piecewise-trilinear, and piecewise-prismatic interpolants, and conclude that piecewise-prismatic is the best choice for computing time-varying Reeb graphs. Large Reeb graphs must be simplified for an effective presentation in a visualization system. I extend an algorithm for simplifying static Reeb graphs to compute simplifications of time-varying Reeb graphs as a first step towards building a visualization system to support the analysis of time-varying data.
  • An attempt to apply modern large-scale system stability theory to a complex nuclear power plant is presented. The effectiveness of both the vector and the scalar Lyapunov methods for stability analysis was examined in this application. A dynamic model was derived comprising the components of the nuclear power plant, namely, the reactor, the pressurizer, the steam generator, the turbines, the feedwater heater, and the electric generator. This particular decomposition into subsystems involves a compromise between a reasonable physical description and mathematical tractability. A Lyapunov function was developed for each isolated subsystem, and an exponential stability condition was obtained by setting a bound on the overall Lyapunov function in terms of interconnection functions between the elements or subsystems of the plant. (The standard comparison condition behind the vector Lyapunov method is restated after this list.) It was found that, when applying both the vector and scalar Lyapunov methods to the particular physical decomposition of the plant made here, unequivocal results on the stability of the nuclear power plant could not be obtained. It is believed that this is due to the inability of the theory to treat strongly coupled subsystems such as occur in the nuclear power plant. Both the vector and scalar Lyapunov methods could, on the other hand, be applied successfully in the case of the finer decomposition in terms of state variables.
  • Problems of computation and development of VLSI structures are considered in relation to each other. In particular, two issues are addressed: (a) development of components and algorithms for standard operations, suitable for VLSI implementation; (b) large-scale computation, in this case the iterative solution of large least-squares problems in a limited-size VLSI architecture. On standard operations, improved and new adders are presented that can be implemented in VLSI. The adders so designed are shown to be superior when compared to other existing ones. Moreover, an iterative multiplier that uses carry save adders is also presented. On large-scale computation, analysis of iterative techniques for least-squares problems is first addressed. New convergence results are obtained, and explicit expressions for the optimal parameters, as well as for their corresponding optimal asymptotic rates of convergence, are derived for the family of iterative schemes known as Accelerated Overrelaxation (AOR). (An illustrative AOR iteration sketch follows this list.) Moreover, partitioning of the iterative algorithm and time-space expansion are used so that a parallel implementation of the iterative scheme is obtained, in a way that computation can be performed in a fixed-size VLSI architecture, independent of the size of the problem.
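
A note on notation in the first record above: the abstract does not define the IMF slope Γ, but under the standard convention (assumed here) the initial mass function is written in logarithmic mass units as

    \[
    \xi(\log m) \equiv \frac{dN}{d\log m} \propto m^{-\Gamma},
    \]

so the classical Salpeter slope corresponds to Γ ≈ 1.35. This matches the abstract's statement that Γ ~ 1.4 is similar to the Salpeter slope, with the quoted best fit above 0.5 M☉ being Γ = 1.5 ± 0.1.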
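
The Reeb graph work in the third record rests on the idea that level-set components appear, merge, and vanish as the level value varies. A full Reeb graph algorithm is beyond a short sketch, but the closely related join tree can be computed with a standard union-find sweep; the sketch below is illustrative and is not the thesis's algorithm.

    # Sketch: join tree of a scalar field on a mesh via union-find.
    # Sweeping from high to low values, each vertex starts a superlevel-set
    # component; an arc is recorded whenever two components merge.

    def find(parent, v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    def join_tree(values, edges):
        """values: dict vertex -> scalar; edges: iterable of (u, v) pairs."""
        adjacency = {v: [] for v in values}
        for u, v in edges:
            adjacency[u].append(v)
            adjacency[v].append(u)
        parent, seen, arcs = {}, set(), []
        for v in sorted(values, key=values.get, reverse=True):
            parent[v] = v
            seen.add(v)
            for w in adjacency[v]:
                if w in seen:
                    rw = find(parent, w)
                    if rw != v:
                        arcs.append((rw, v))  # component rooted at rw merges at v
                        parent[rw] = v
        return arcs

    # Example: two local maxima (vertices 1 and 3) merging at a saddle (vertex 2).
    print(join_tree({0: 1.0, 1: 5.0, 2: 2.0, 3: 4.0}, [(0, 1), (1, 2), (2, 3)]))
    # [(1, 2), (3, 2), (2, 0)]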
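
The vector Lyapunov approach in the fourth record typically rests on a comparison condition of the following standard form (restated from the general theory, not quoted from the thesis). With a Lyapunov function V_i for each isolated subsystem, one bounds its derivative along trajectories of the interconnected system as

    \[
    \dot V_i \le -\alpha_i V_i + \sum_{j \ne i} \beta_{ij} V_j, \qquad i = 1, \dots, N,
    \]

where α_i measures the decay of the isolated subsystem and β_{ij} the interconnection strength. Exponential stability of the whole plant follows when the test matrix with diagonal entries -α_i and off-diagonal entries β_{ij} satisfies an M-matrix (diagonal dominance) condition. Strong coupling, i.e. large β_{ij}, defeats that condition, which is consistent with the abstract's conclusion that no unequivocal result could be obtained for the strongly coupled plant decomposition.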
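
Finally, the Accelerated Overrelaxation (AOR) family in the last record has a standard two-parameter splitting form. Writing A = D - L - U with D diagonal and L, U strictly lower and upper triangular, one AOR step for A x = b solves (D - rL) x_{k+1} = [(1 - w)D + (w - r)L + wU] x_k + w b; the choice r = w recovers SOR, and r = 0, w = 1 recovers Jacobi. Below is a dense-matrix sketch for illustration only; the thesis's contribution is a fixed-size parallel VLSI realization, which this does not attempt to model.

    # Illustrative Accelerated Overrelaxation (AOR) iteration for A x = b.
    import numpy as np

    def aor(A, b, r, w, tol=1e-10, max_iter=10_000):
        D = np.diag(np.diag(A))
        L = -np.tril(A, -1)  # so that A = D - L - U
        U = -np.triu(A, 1)
        M = D - r * L        # lower triangular system solved each step
        N = (1 - w) * D + (w - r) * L + w * U
        x = np.zeros_like(b, dtype=float)
        for _ in range(max_iter):
            x_new = np.linalg.solve(M, N @ x + w * b)
            if np.linalg.norm(x_new - x, ord=np.inf) < tol:
                return x_new
            x = x_new
        return x

    # Example on a small diagonally dominant system (r = w makes this SOR).
    A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
    b = np.array([3.0, 2.0, 3.0])
    print(aor(A, b, r=1.1, w=1.1))  # converges to [1, 1, 1]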