Title: A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES

Abstract

We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, σ_v, v_max, half-mass radius, and spin. For Millennium, our predicted N_gal values have a mean-squared error (MSE) of ≈0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ≈5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.
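As a rough illustration of the kNN regression and MSE scoring mentioned in the abstract, the sketch below predicts a count for a query point by averaging the targets of its k nearest training points, then scores the predictions with the mean-squared error. It is not the authors' implementation: the function names, the two-feature toy points, and the target counts are all made up for illustration, and no feature scaling or real Millennium data is involved.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean over the k nearest training points
    (Euclidean distance in feature space)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: each row stands in for a pair of halo properties,
# and each target stands in for a galaxy count. None of it is real.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [1, 1, 1, 4, 5, 6]

test_X = [(0.2, 0.2), (5.3, 5.3)]
test_y = [1, 4]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
mse = mean_squared_error(test_y, preds)
print(preds, mse)
```

In practice the six halo properties span very different numeric ranges, so a distance-based method like this would normally be run on rescaled features; that step is left out here to keep the sketch short.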

Authors:
Xu, Xiaoying; Ho, Shirley; Trac, Hy; Schneider, Jeff; Ntampaka, Michelle [1]; Poczos, Barnabas [2]
  1. McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States)
  2. School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 (United States)
Publication Date:
OSTI Identifier:
22121770
Resource Type:
Journal Article
Journal Name:
Astrophysical Journal
Additional Journal Information:
Journal Volume: 772; Journal Issue: 2; Other Information: Country of input: International Atomic Energy Agency (IAEA); Journal ID: ISSN 0004-637X
Country of Publication:
United States
Language:
English
Subject:
79 ASTROPHYSICS, COSMOLOGY AND ASTRONOMY; ALGORITHMS; ARTIFICIAL INTELLIGENCE; CATALOGS; CORRELATION FUNCTIONS; ERRORS; GALAXIES; LEARNING; PARTICLES

Citation Formats

Xu, Xiaoying, Ho, Shirley, Trac, Hy, Schneider, Jeff, Ntampaka, Michelle, and Poczos, Barnabas. A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES. United States: N. p., 2013. Web. doi:10.1088/0004-637X/772/2/147.
Xu, Xiaoying, Ho, Shirley, Trac, Hy, Schneider, Jeff, Ntampaka, Michelle, & Poczos, Barnabas. A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES. United States. doi:10.1088/0004-637X/772/2/147.
Xu, Xiaoying, Ho, Shirley, Trac, Hy, Schneider, Jeff, Ntampaka, Michelle, and Poczos, Barnabas. Thu . "A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES". United States. doi:10.1088/0004-637X/772/2/147.
@article{osti_22121770,
title = {A FIRST LOOK AT CREATING MOCK CATALOGS WITH MACHINE LEARNING TECHNIQUES},
author = {Xu, Xiaoying and Ho, Shirley and Trac, Hy and Schneider, Jeff and Ntampaka, Michelle and Poczos, Barnabas},
abstractNote = {We investigate machine learning (ML) techniques for predicting the number of galaxies ($N_{\mathrm{gal}}$) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and $N_{\mathrm{gal}}$. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict $N_{\mathrm{gal}}$ by training our algorithms on the following six halo properties: number of particles, $M_{200}$, $\sigma_{v}$, $v_{\mathrm{max}}$, half-mass radius, and spin. For Millennium, our predicted $N_{\mathrm{gal}}$ values have a mean-squared error (MSE) of $\approx$0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to $\approx$5\%--10\%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and $N_{\mathrm{gal}}$. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high $M_{\mathrm{star}}$, low $M_{\mathrm{star}}$). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.},
doi = {10.1088/0004-637X/772/2/147},
journal = {Astrophysical Journal},
issn = {0004-637X},
number = 2,
volume = 772,
place = {United States},
year = {2013},
month = {8}
}