OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine Learning Algorithms for Matching Theories, Simulations, and Observations in Cosmology (Final Project)

Technical Report · DOI: https://doi.org/10.2172/1572709 · OSTI ID: 1572709
 [1]
  1. Carnegie Mellon Univ., Pittsburgh, PA (United States)

The search for simple and elegant equations to explain and predict observable natural phenomena has been a driving force in scientific theory for several centuries. The great successes of Newton's laws, Maxwell's equations, the Schrödinger equation, and the theory of General Relativity have encouraged this perspective, and perhaps a beautiful equation comprising a “theory of everything” remains to be discovered. But scientists are now grappling with systems of ever-increasing complexity, fraught with nonlinearity, in which the quantities of interest are often observed only indirectly. Nature may resist a simple description, and the most important discoveries of the next century may be complex theories with countless variables and parameters. The era of big data opens up a promising new approach to scientific discovery in this setting. The predictions of modern theories, even complex or nonlinear ones, can be examined through detailed, computationally intensive simulations that take days or weeks to run and fill petabytes of disk space. The upcoming generation of scientific instruments will provide petabytes of observations at unprecedented resolution and depth. The challenge is to compare data with simulations so as to test theories and identify those that best match the observations. The data are numerous and rich but subject to noise and various systematic biases. The simulations are large and detailed but so costly that the effective sample size is small. Moreover, the mapping from data to simulation to the parameters of interest is typically complicated and ill-conditioned.

Although these challenges, and the solutions we propose, apply across the sciences, we ground our efforts by focusing on cosmology. Here, we are truly entering the era of big data in several senses. Ever larger experiments survey ever larger cosmological volumes, producing enormous amounts of pixel data from which the information of cosmological interest must be extracted. These include large imaging surveys, which collect all the light in broad wavelength ranges over significant fractions of the sky, and spectroscopic surveys, which measure the light as a function of wavelength to map out 3D distances. These experiments are designed to shed light on the big unanswered questions in cosmology. Distilling petabytes of data into observed quantities for comparison with theory is a daunting task. But these large experiments do not only produce large amounts of data to analyze: to compare the data with theories of physics, we also need increasingly extensive predictions for the observed quantities, which require ever larger cosmological simulations that take substantial (and at times prohibitive) computational resources to produce.

Our goal is to develop statistical and machine learning methods for using observed and simulated data, advancing machine learning with applications to cosmology. In particular, we focus our research efforts on the following tasks:

  1. The challenge for automated science is that it is computationally impossible and statistically dangerous to consider every possible model in order to find the best one. We will develop active-learning methods based on Bayesian optimization that accelerate both the execution of the simulations and the search for best-fitting parameters. The key idea is to make the simulations adaptive across resolutions, time, and parameters, using the data to steer the search while the simulation runs.
  2. Most ML methods operate on simple, finite-dimensional feature vectors. However, many applications in cosmology and other sciences require ML methods that can take more complex objects, such as functions, distributions, sets, and point clouds, as inputs or outputs. Our goal is to develop efficient ML methods for this problem and demonstrate their applicability in cosmology, astrophysics, and other scientific problems.
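The Bayesian-optimization idea in the first task can be illustrated with a minimal sketch, which is not the report's implementation: a Gaussian-process surrogate is fit to the misfits of the simulations run so far, and the expected-improvement criterion picks the next parameter value to simulate. It assumes a single parameter on [0, 1], a fixed RBF length scale, and a cheap hypothetical stand-in (`toy_simulation_misfit`) for an expensive simulation compared against observations. All function names are illustrative, and the adaptivity across resolution and time described above is not modeled.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential (RBF) kernel with unit prior variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Zero-mean GP posterior mean and std (targets assumed standardized)."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for minimization: expected amount by which a point beats `best`."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def toy_simulation_misfit(theta):
    """Hypothetical stand-in for 'run simulation, compare to data'."""
    return (theta - 0.3) ** 2


def bayesian_optimize(objective, n_iter=10, grid_size=201):
    grid = np.linspace(0.0, 1.0, grid_size)   # candidate parameter values
    x = np.array([0.0, 0.5, 1.0])             # small initial design
    y = np.array([objective(t) for t in x])
    for _ in range(n_iter):
        # Standardize misfits so the unit-variance GP prior is sensible.
        y_std = (y - y.mean()) / (y.std() + 1e-12)
        mean, std = gp_posterior(x, y_std, grid)
        ei = expected_improvement(mean, std, y_std.min())
        theta_next = grid[np.argmax(ei)]      # most promising next run
        x = np.append(x, theta_next)
        y = np.append(y, objective(theta_next))
    best = np.argmin(y)
    return x[best], y[best], len(x)


theta_best, misfit_best, n_runs = bayesian_optimize(toy_simulation_misfit)
print(f"best theta = {theta_best:.3f} after {n_runs} simulator calls")
```

The point of the surrogate is that each call to the real simulator is expensive, so cheap GP predictions decide where the next call is most worthwhile instead of scanning the whole parameter grid.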
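For the second task, one common way (not necessarily the one used in this project) to let a learner take a set of points as input is to summarize the set by an empirical kernel mean embedding, approximated here with random Fourier features, and then regress on that fixed-length vector. The sketch assumes a toy problem in which each set is drawn from a Gaussian N(mu, 1) and the target is mu. The feature count, length scale, ridge penalty, and all function names are illustrative assumptions.

```python
import numpy as np


def make_random_features(dim, n_features=50, length_scale=1.0, seed=0):
    """Random Fourier features approximating a Gaussian (RBF) kernel."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / length_scale, size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b


def mean_embedding(points, W, b):
    """Map a set of points (n x dim) to one fixed-length vector.

    Averaging the per-point features makes the result independent of the
    order of the points and usable for sets of different sizes."""
    feats = np.sqrt(2.0 / W.shape[1]) * np.cos(points @ W + b)
    return feats.mean(axis=0)


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with an appended bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict_ridge(coef, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ coef


def make_toy_sets(n_sets, rng):
    """Hypothetical toy data: each set is drawn from N(mu, 1); target is mu."""
    mus = rng.uniform(-2.0, 2.0, size=n_sets)
    sets = [rng.normal(mu, 1.0, size=(rng.integers(50, 150), 1)) for mu in mus]
    return sets, mus


rng = np.random.default_rng(1)
W, b = make_random_features(dim=1)
train_sets, train_mu = make_toy_sets(300, rng)
test_sets, test_mu = make_toy_sets(50, rng)

X_train = np.stack([mean_embedding(s, W, b) for s in train_sets])
X_test = np.stack([mean_embedding(s, W, b) for s in test_sets])
coef = fit_ridge(X_train, train_mu)
rmse = np.sqrt(np.mean((predict_ridge(coef, X_test) - test_mu) ** 2))
print(f"test RMSE for mu: {rmse:.3f}")
```

Because every set, whatever its size, becomes a vector of the same length, any standard regressor can be applied afterwards; ridge regression is used here only because it is short.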

Research Organization:
Carnegie Mellon Univ., Pittsburgh, PA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
SC0011114
OSTI ID:
1572709
Report Number(s):
DOE-CMU-1114
Country of Publication:
United States
Language:
English

Similar Records

Calibration and Commissioning of the LSST
Technical Report · August 1, 2022

Galaxy morphological classification in deep-wide surveys via unsupervised machine learning
Journal Article · October 26, 2019 · Monthly Notices of the Royal Astronomical Society

Machine Learning in the Big Data Era: Are We There Yet?
Conference · January 1, 2014