DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multivariate Pointwise Information-Driven Data Sampling and Visualization

Abstract

With the increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. Therefore, data summarization techniques are required that analyze multivariable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, we propose a data sub-sampling algorithm for statistical data summarization that leverages pointwise information-theoretic measures to quantify the statistical association of data points across multiple variables, and that generates sub-sampled data preserving the statistical association among the variables. Using such reduced sampled data, we show that multivariate feature queries and analysis can be performed effectively. The efficacy of the proposed multivariate association-driven sampling algorithm is demonstrated by applying it to several scientific data sets.
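The abstract does not spell out the sampling rule, so the following is only a minimal sketch, assuming plug-in probabilities from equal-width histograms and a hypothetical importance rule. For every data point it computes the specific correlation (pointwise total correlation), log2 p(x_1, ..., x_n) / prod_i p(x_i), which reduces to pointwise mutual information (PMI) when there are two variables, and then draws a fixed-size sample without replacement with probability proportional to the absolute value of that score. The function names `specific_correlation` and `association_driven_sample`, the `bins`, `fraction`, and `seed` parameters, and the |score|-weighted selection are illustrative assumptions, not the authors' published algorithm.

```python
import numpy as np


def specific_correlation(variables, bins=32):
    """Per-point specific correlation log2[p(x1..xn) / prod_i p(xi)].

    Probabilities are plug-in estimates from equal-width histograms with
    `bins` bins per variable (an assumption; other estimators are possible).
    For two variables this equals the pointwise mutual information (PMI).
    """
    n_points = variables[0].size
    # Quantize each variable into bin indices 0..bins-1.
    codes = []
    for v in variables:
        edges = np.linspace(v.min(), v.max(), bins + 1)
        codes.append(np.digitize(v, edges[1:-1]))
    codes = np.stack(codes, axis=1)  # shape (n_points, n_vars)

    # Joint probability of the bin tuple that each point falls into.
    _, inverse, counts = np.unique(
        codes, axis=0, return_inverse=True, return_counts=True
    )
    p_joint = counts[inverse.ravel()] / n_points

    # Product of the marginal probabilities of each point's bins.
    p_indep = np.ones(n_points)
    for k in range(codes.shape[1]):
        marginal = np.bincount(codes[:, k], minlength=bins) / n_points
        p_indep *= marginal[codes[:, k]]

    return np.log2(p_joint / p_indep)


def association_driven_sample(variables, fraction=0.05, bins=32, seed=0):
    """Hypothetical rule: draw round(fraction * N) distinct point indices,
    each chosen with probability proportional to |specific correlation|."""
    # Small offset keeps every probability strictly positive.
    weights = np.abs(specific_correlation(variables, bins)) + 1e-12
    n_samples = int(round(fraction * weights.size))
    rng = np.random.default_rng(seed)
    picked = rng.choice(
        weights.size, size=n_samples, replace=False, p=weights / weights.sum()
    )
    return np.sort(picked)


# Usage on synthetic data: two strongly associated variables.
rng = np.random.default_rng(1)
a = rng.normal(size=10_000)
b = a + 0.3 * rng.normal(size=10_000)
idx = association_driven_sample([a, b], fraction=0.05, bins=16)
print(idx.size)  # 500
```

Averaging the per-point scores recovers the plug-in total correlation (mutual information for two variables), which makes a convenient sanity check. Under this particular rule, points whose joint values co-occur much more or much less often than independence would predict receive large scores and are favored in the sample.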

Authors:
Dutta, Soumya; Biswas, Ayan; Ahrens, James
Publication Date:
July 2019
Research Org.:
Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1543016
Alternate Identifier(s):
OSTI ID: 1544707
Report Number(s):
LA-UR-19-24243
Journal ID: ISSN 1099-4300; ENTRFG; PII: e21070699
Grant/Contract Number:  
ECP Alpine project; 89233218CNA000001
Resource Type:
Published Article
Journal Name:
Entropy
Additional Journal Information:
Journal Name: Entropy; Journal Volume: 21; Journal Issue: 7; Journal ID: ISSN 1099-4300
Publisher:
MDPI AG
Country of Publication:
Switzerland
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Multivariate sampling; information theory; pointwise mutual information (PMI); total correlation; specific correlation; statistical distributions; data reduction; query-driven visualization
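For reference, the standard definitions of the measures named in the subject terms are given below for discrete (for example, histogram-binned) variables X_1, ..., X_n. The pointwise quantities are the per-value terms whose expectations yield mutual information and total correlation; with n = 2, specific correlation equals PMI and total correlation equals mutual information. The label SC is notation chosen here, and this is background, not a reproduction of the paper's own equations; the logarithm base only sets the unit (base 2 gives bits).

```latex
\begin{align}
\mathrm{PMI}(x; y) &= \log \frac{p(x, y)}{p(x)\,p(y)} \\
I(X; Y) &= \sum_{x, y} p(x, y)\, \mathrm{PMI}(x; y) \\
\mathrm{SC}(x_1, \ldots, x_n) &= \log \frac{p(x_1, \ldots, x_n)}{\prod_{i=1}^{n} p(x_i)} \\
C(X_1, \ldots, X_n) &= \sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n)\, \mathrm{SC}(x_1, \ldots, x_n)
  = \sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n)
\end{align}
```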

Citation Formats

Dutta, Soumya, Biswas, Ayan, and Ahrens, James. Multivariate Pointwise Information-Driven Data Sampling and Visualization. Switzerland: N. p., 2019. Web. doi:10.3390/e21070699.
Dutta, Soumya, Biswas, Ayan, & Ahrens, James. Multivariate Pointwise Information-Driven Data Sampling and Visualization. Switzerland. doi:10.3390/e21070699.
Dutta, Soumya, Biswas, Ayan, and Ahrens, James. 2019. "Multivariate Pointwise Information-Driven Data Sampling and Visualization". Switzerland. doi:10.3390/e21070699.
@article{osti_1543016,
title = {Multivariate Pointwise Information-Driven Data Sampling and Visualization},
author = {Dutta, Soumya and Biswas, Ayan and Ahrens, James},
abstractNote = {With the increasing computing capabilities of modern supercomputers, the size of the data generated by scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties, so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to better understand the characteristics of the data features. Therefore, data summarization techniques are required that analyze multivariable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, we propose a data sub-sampling algorithm for statistical data summarization that leverages pointwise information-theoretic measures to quantify the statistical association of data points across multiple variables, and that generates sub-sampled data preserving the statistical association among the variables. Using such reduced sampled data, we show that multivariate feature queries and analysis can be performed effectively. The efficacy of the proposed multivariate association-driven sampling algorithm is demonstrated by applying it to several scientific data sets.},
doi = {10.3390/e21070699},
journal = {Entropy},
number = 7,
volume = 21,
place = {Switzerland},
year = {2019},
month = {7}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.3390/e21070699
