OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Machine learning etudes in astrophysics: selection functions for mock cluster catalogs

Abstract

Making mock simulated catalogs is an important component of astrophysical data analysis. Selection criteria for observed astronomical objects are often too complicated to be derived from first principles. However, when an observed group of objects already exists, reproducing its selection is a well-suited problem for machine learning classification. In this paper we use one-class classifiers to learn the properties of an observed catalog of clusters of galaxies from ROSAT and to pick clusters from mock simulations that resemble the observed ROSAT catalog. We show how this method can be used to study the cross-correlations of thermal Sunyaev-Zel'dovich signals with number density maps of X-ray selected cluster catalogs. The method reduces the bias due to hand-tuning the selection function and is readily scalable to large catalogs with a high-dimensional space of astrophysical features.
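
The idea of a one-class selection function can be illustrated with a minimal sketch. It is not the classifier used in the paper: as an assumption made here for simplicity, it swaps in a nearest-neighbour distance threshold, which accepts a mock object when it lies about as close to the observed sample as observed objects lie to each other. The two features (log flux and redshift), the class name `NearestNeighborOneClass`, and all numbers are hypothetical, and the data are synthetic.

```python
import math
import random
import statistics


def standardize(rows, means, stdevs):
    """Scale each feature with the given per-feature mean and standard deviation."""
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


class NearestNeighborOneClass:
    """Toy one-class classifier: accept points that lie close to the training set."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        # Guard against a constant feature, whose standard deviation is zero.
        self.stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
        self.train = standardize(rows, self.means, self.stdevs)
        # Leave-one-out nearest-neighbour distance for every training point.
        loo = sorted(
            min(math.dist(p, q) for j, q in enumerate(self.train) if j != i)
            for i, p in enumerate(self.train)
        )
        # Threshold taken at the requested quantile of those distances.
        self.threshold = loo[min(len(loo) - 1, int(self.quantile * len(loo)))]
        return self

    def predict(self, rows):
        scaled = standardize(rows, self.means, self.stdevs)
        return [
            min(math.dist(p, q) for q in self.train) <= self.threshold
            for p in scaled
        ]


# Synthetic stand-ins: an "observed" catalog and a broader "mock" catalog.
# Columns are hypothetical features: [log10 flux, redshift].
rng = random.Random(0)
observed = [[rng.gauss(-11.5, 0.3), rng.gauss(0.1, 0.03)] for _ in range(200)]
mock = [[rng.uniform(-14.0, -10.5), rng.uniform(0.0, 0.5)] for _ in range(1000)]

clf = NearestNeighborOneClass(quantile=0.95).fit(observed)
selected = [row for row, keep in zip(mock, clf.predict(mock)) if keep]
print(f"{len(selected)} of {len(mock)} mock objects resemble the observed sample")
```

The classifier only ever sees the observed sample during training, which is what makes it one-class: no hand-written flux or redshift cut appears anywhere, and further features can be added as extra columns without changing the code.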

Authors:
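Hajian, Amir (ahajian@cita.utoronto.ca); Alvarez, Marcelo A. (malvarez@cita.utoronto.ca); Bond, J. Richard (bond@cita.utoronto.ca) [1]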
  1. Canadian Institute for Theoretical Astrophysics, University of Toronto, Toronto, ON M5S 3H8 (Canada)
Publication Date:
OSTI Identifier:
22382011
Resource Type:
Journal Article
Resource Relation:
Journal Name: Journal of Cosmology and Astroparticle Physics; Journal Volume: 2015; Journal Issue: 01; Other Information: Country of input: International Atomic Energy Agency (IAEA)
Country of Publication:
United States
Language:
English
Subject:
79 ASTROPHYSICS, COSMOLOGY AND ASTRONOMY; ASTROPHYSICS; CATALOGS; CLASSIFICATION; CORRELATIONS; DATA ANALYSIS; DENSITY; EV RANGE; FUNCTIONS; GALAXY CLUSTERS; LEARNING; MAPS; SIGNALS; SIMULATION; SPACE; TUNING; X RADIATION

Citation Formats

Hajian, Amir, Alvarez, Marcelo A., and Bond, J. Richard. Machine learning etudes in astrophysics: selection functions for mock cluster catalogs. United States: N. p., 2015. Web. doi:10.1088/1475-7516/2015/01/038.
Hajian, Amir, Alvarez, Marcelo A., & Bond, J. Richard. Machine learning etudes in astrophysics: selection functions for mock cluster catalogs. United States. doi:10.1088/1475-7516/2015/01/038.
Hajian, Amir, Alvarez, Marcelo A., and Bond, J. Richard. 2015. "Machine learning etudes in astrophysics: selection functions for mock cluster catalogs". United States. doi:10.1088/1475-7516/2015/01/038.
@article{osti_22382011,
title = {Machine learning etudes in astrophysics: selection functions for mock cluster catalogs},
author = {Hajian, Amir and Alvarez, Marcelo A. and Bond, J. Richard},
abstractNote = {Making mock simulated catalogs is an important component of astrophysical data analysis. Selection criteria for observed astronomical objects are often too complicated to be derived from first principles. However, when an observed group of objects already exists, reproducing its selection is a well-suited problem for machine learning classification. In this paper we use one-class classifiers to learn the properties of an observed catalog of clusters of galaxies from ROSAT and to pick clusters from mock simulations that resemble the observed ROSAT catalog. We show how this method can be used to study the cross-correlations of thermal Sunyaev-Zel'dovich signals with number density maps of X-ray selected cluster catalogs. The method reduces the bias due to hand-tuning the selection function and is readily scalable to large catalogs with a high-dimensional space of astrophysical features.},
doi = {10.1088/1475-7516/2015/01/038},
journal = {Journal of Cosmology and Astroparticle Physics},
number = 01,
volume = 2015,
place = {United States},
year = {2015},
month = {Jan}
}
  • We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches depend only on parent halo properties (like HOD methods), which is advantageous over subhalo-based approaches because identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, σ_v, v_max, half-mass radius, and spin. For Millennium, our predicted N_gal values have a mean-squared error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.
  • We present a new quasi-stellar object (QSO) selection algorithm using a Support Vector Machine, a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars, and microlensing events using 58 known QSOs, 1629 variable stars, and 4288 non-variables in the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false-positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) data set, which consists of 40 million light curves, and found 1620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false-positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
  • Our research objective in this paper is to reconstruct an initial linear density field, which follows the multivariate Gaussian distribution with variances given by the linear power spectrum of the current cold dark matter model and evolves through gravitational instabilities to the present-day density field in the local universe. For this purpose, we develop a Hamiltonian Markov Chain Monte Carlo method to obtain the linear density field from a posterior probability function that consists of two components: a prior of a Gaussian density field with a given linear spectrum and a likelihood term that is given by the current density field. The present-day density field can be reconstructed from galaxy groups using the method developed in Wang et al. Using a realistic mock Sloan Digital Sky Survey DR7, obtained by populating dark matter halos in the Millennium simulation (MS) with galaxies, we show that our method can effectively and accurately recover both the amplitudes and phases of the initial, linear density field. To examine the accuracy of our method, we use N-body simulations to evolve these reconstructed initial conditions to the present day. The resimulated density field thus obtained accurately matches the original density field of the MS in the density range 0.3 ≲ ρ/ρ̄ ≲ 20 without any significant bias. In particular, the Fourier phases of the resimulated density fields are tightly correlated with those of the original simulation down to a scale corresponding to a wavenumber of ~1 h Mpc^-1, much smaller than the translinear scale, which corresponds to a wavenumber of ~0.15 h Mpc^-1.
  • We develop empirical methods for modeling the galaxy population and populating cosmological N-body simulations with mock galaxies according to the observed properties of galaxies in survey data. We use these techniques to produce a new set of mock catalogs for the DEEP2 Galaxy Redshift Survey based on the output of the high-resolution Bolshoi simulation, as well as two other simulations with different cosmological parameters, all of which we release for public use. The mock-catalog creation technique uses subhalo abundance matching to assign galaxy luminosities to simulated dark-matter halos. It then adds color information to the resulting mock galaxies in a manner that depends on the local galaxy density, in order to reproduce the measured color-environment relation in the data. In the course of constructing the catalogs, we test various models for including scatter in the relation between halo mass and galaxy luminosity, within the abundance-matching framework. We find that there is no constant-scatter model that can simultaneously reproduce both the luminosity function and the autocorrelation function of DEEP2. This result has implications for galaxy-formation theory, and it restricts the range of contexts in which the mock catalogs can be usefully applied. Nevertheless, careful comparisons show that our new mock catalogs accurately reproduce a wide range of the other properties of the DEEP2 catalog, suggesting that they can be used to gain a detailed understanding of various selection effects in DEEP2.
  • We introduce the Theoretical Astrophysical Observatory (TAO), an online virtual laboratory that houses mock observations of galaxy survey data. Such mocks have become an integral part of the modern analysis pipeline. However, building them requires expert knowledge of galaxy modeling and simulation techniques, significant investment in software development, and access to high performance computing. These requirements make it difficult for a small research team or individual to quickly build a mock catalog suited to their needs. To address this, TAO offers access to multiple cosmological simulations and semi-analytic galaxy formation models from an intuitive and clean web interface. Results can be funnelled through science modules and sent to a dedicated supercomputer for further processing and manipulation. These modules include the ability to (1) construct custom observer light cones from the simulation data cubes; (2) generate the stellar emission from star formation histories, apply dust extinction, and compute absolute and/or apparent magnitudes; and (3) produce mock images of the sky. All of TAO's features can be accessed without any programming requirements. The modular nature of TAO opens it up for further expansion in the future.