DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Uniform-in-phase-space data selection with iterative normalizing flows

Journal Article · Data-Centric Engineering
DOI: https://doi.org/10.1017/dce.2023.4 · OSTI ID:1983896
[1]; [1]; [2]; [1]
  1. National Renewable Energy Laboratory (NREL), Golden, CO (United States)
  2. National Renewable Energy Laboratory (NREL), Golden, CO (United States); Princeton Univ., NJ (United States)

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that are routinely generated. In applications that are constrained by memory and computational intensity, excessively large datasets may hinder scientific discovery, making data reduction a critical component of data-driven methods. Datasets are growing in two directions: the number of data points and their dimensionality. Whereas dimension reduction typically aims at describing each data sample in a lower-dimensional space, the focus here is on reducing the number of data points. A strategy is proposed to select data points such that they uniformly span the phase space of the data. The proposed algorithm relies on estimating the probability map of the data and using it to construct an acceptance probability. An iterative method is used to accurately estimate the probability of the rare data points when only a small subset of the dataset is used to construct the probability map. Instead of binning the phase space to estimate the probability map, its functional form is approximated with a normalizing flow. The method therefore extends naturally to high-dimensional datasets. The proposed framework is demonstrated as a viable pathway to enable data-efficient machine learning when abundant data are available.
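
The selection step described in the abstract (estimate the probability map, then accept each point with a probability that decreases with its estimated density) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional data, replaces the normalizing flow with a Gaussian kernel density estimate fitted on a small random subset, and omits the iterative refinement. The names `kde_density`, `acceptance_probabilities`, and `select_uniform`, as well as the parameters `n_fit` and `bandwidth`, are hypothetical and chosen only for illustration.

```python
import math
import random


def kde_density(x, support, bandwidth):
    """Gaussian kernel density estimate at x (a stand-in for a normalizing flow)."""
    norm = 1.0 / (len(support) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in support)


def acceptance_probabilities(density, n_target):
    """Return min(1, c / p_i), with c chosen so the probabilities sum to about n_target."""
    inv = [1.0 / max(p, 1e-300) for p in density]  # guard against zero density
    lo, hi = 0.0, 1.0 / min(inv)  # at c = hi every point would be accepted
    for _ in range(60):  # bisection: the expected sample count grows with c
        mid = 0.5 * (lo + hi)
        if sum(min(1.0, mid * w) for w in inv) < n_target:
            lo = mid
        else:
            hi = mid
    return [min(1.0, hi * w) for w in inv]


def select_uniform(data, n_target, n_fit=200, bandwidth=0.3, seed=0):
    """Keep roughly n_target points of data, favoring low-density regions."""
    rng = random.Random(seed)
    # Assumption: the density is fitted on a small random subset of the data.
    support = rng.sample(data, min(n_fit, len(data)))
    density = [kde_density(x, support, bandwidth) for x in data]
    accept = acceptance_probabilities(density, n_target)
    # Accept each point independently with its own probability.
    return [x for x, a in zip(data, accept) if rng.random() < a]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(3000)]
    picked = select_uniform(data, n_target=300)
    print(len(picked), "points kept out of", len(data))
```

Because the acceptance probability scales with the inverse of the estimated density, points in the tails of the Gaussian sample are kept far more often than points near the mode, so the reduced set spans the sampled range more evenly than a uniform random subsample would. The density fitted on a small subset is least reliable for exactly those rare points, which is the issue the iterative method in the abstract is designed to address.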

Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Transportation Office. Vehicle Technologies Office
Grant/Contract Number:
AC36-08GO28308
OSTI ID:
1983896
Report Number(s):
NREL/JA-2C00-86434; MainId:87207; UUID:8f09d1e9-ed4a-484b-9f21-0c73dd0bd755; MainAdminID:69667
Journal Information:
Journal Name: Data-Centric Engineering; Vol. 4; ISSN 2632-6736
Publisher:
Cambridge University Press
Country of Publication:
United States
Language:
English
