DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Constraining Galaxy-Halo connection using machine learning

Journal Article · Astronomy and Computing
[1]; [2]
  1. Kansas State Univ., Manhattan, KS (United States)
  2. Kansas State Univ., Manhattan, KS (United States); Georgian National Astrophysical Observatory (Georgia); Ilia State Univ. (Georgia)

We investigate the potential of machine learning (ML) methods to model small-scale galaxy clustering for constraining Halo Occupation Distribution (HOD) parameters. Our analysis reveals that, while many ML algorithms report good statistical fits, they often yield likelihood contours whose means and variances are significantly biased relative to the true model parameters. This highlights the importance of careful data processing and algorithm selection in ML applications to galaxy clustering: even seemingly robust methods can produce biased results if they are not applied correctly. Provided their robustness is established, ML tools offer a promising way to explore the HOD parameter space at a significantly lower computational cost than traditional brute-force methods. Using our pipeline based on artificial neural networks (ANNs), we successfully reproduce some standard results from the recent literature. Properly restricting the HOD parameter space, transforming the training data, and carefully selecting the ML algorithm are essential for achieving unbiased and robust predictions. Among the methods tested, ANNs outperform random forests (RF) and ridge regression in predicting clustering statistics when the HOD prior space is appropriately restricted. We demonstrate these findings using the projected two-point correlation function w_p(r_p), the angular multipoles ξ_ℓ(r) of the correlation function, and the void probability function (VPF) of Luminous Red Galaxy mocks for the Dark Energy Spectroscopic Instrument. Our results show that combining w_p(r_p) with the VPF improves the parameter constraints, whereas adding the multipoles ξ_0, ξ_2, and ξ_4 to w_p(r_p) does not improve them significantly.
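In an HOD model, a halo of mass M hosts on average ⟨N_cen(M)⟩ central and ⟨N_sat(M)⟩ satellite galaxies, and the parameters of these two functions are what the clustering statistics constrain. The exact parametrization used in the paper is not stated here, so the sketch below assumes the common five-parameter form of Zheng, Coil & Zehavi (2007): ⟨N_cen⟩ = ½[1 + erf((log M − log M_min)/σ_logM)] and ⟨N_sat⟩ = ⟨N_cen⟩((M − M_0)/M_1)^α for M > M_0, zero otherwise. Function names, dictionary keys, and parameter values are hypothetical.

```python
import math


def mean_ncen(log_m, log_mmin, sigma_logm):
    """Mean central occupation <N_cen(M)> for halo mass M = 10**log_m (Msun/h)."""
    # Smoothed step in log10 mass: ~0 well below M_min, exactly 1/2 at M_min, ~1 above.
    return 0.5 * (1.0 + math.erf((log_m - log_mmin) / sigma_logm))


def mean_nsat(log_m, log_mmin, sigma_logm, log_m0, log_m1, alpha):
    """Mean satellite occupation <N_sat(M)> in the assumed Zheng et al. (2007) form."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    if m <= m0:
        return 0.0  # no satellites at or below the cutoff mass M_0
    return mean_ncen(log_m, log_mmin, sigma_logm) * ((m - m0) / m1) ** alpha


def mean_ngal(log_m, params):
    """Total mean occupation <N_cen> + <N_sat> for a dict of (hypothetical) HOD parameters."""
    ncen = mean_ncen(log_m, params["log_mmin"], params["sigma_logm"])
    nsat = mean_nsat(log_m, params["log_mmin"], params["sigma_logm"],
                     params["log_m0"], params["log_m1"], params["alpha"])
    return ncen + nsat


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = {"log_mmin": 12.8, "sigma_logm": 0.3, "log_m0": 12.5,
              "log_m1": 13.9, "alpha": 1.0}
    for log_m in (12.0, 13.0, 14.0, 15.0):
        print(f"log10 M = {log_m:.1f}: <N_gal> = {mean_ngal(log_m, params):.3f}")
```

In an emulator-based analysis of this kind, a vector of such parameters is typically the ML model's input and a clustering statistic such as w_p(r_p) is its output, so restricting the HOD prior amounts to narrowing the ranges from which these parameters are sampled when the training set is built.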

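The void probability function P_0(R) is the probability that a sphere of radius R placed at random in the sample volume contains no galaxies. For an unclustered (Poisson) sample of number density n̄ it equals exp(−n̄·4πR³/3), and because it depends on correlation functions of all orders it carries information beyond w_p(r_p). Below is a minimal brute-force estimator for a periodic simulation cube, assuming positions in the same units as the box side; the function name, sphere count, seeds, and toy values are illustrative and not taken from the paper, whose VPF measurement details are not given here.

```python
import math
import random


def _min_image(d, box_size):
    """Shortest separation along one axis of a periodic box, for |d| < box_size."""
    d = abs(d)
    return min(d, box_size - d)


def void_probability(galaxies, box_size, radius, n_spheres=10_000, seed=0):
    """Estimate P_0(R): the fraction of random spheres of radius R with no galaxy inside.

    galaxies: sequence of (x, y, z) positions inside the periodic cube [0, box_size)^3.
    Brute force, O(n_spheres * len(galaxies)); fine for a sketch, too slow for full mocks.
    """
    rng = random.Random(seed)
    r2 = radius * radius
    n_empty = 0
    for _ in range(n_spheres):
        cx, cy, cz = (rng.uniform(0.0, box_size) for _ in range(3))
        empty = True
        for gx, gy, gz in galaxies:
            dx = _min_image(gx - cx, box_size)
            dy = _min_image(gy - cy, box_size)
            dz = _min_image(gz - cz, box_size)
            if dx * dx + dy * dy + dz * dz < r2:
                empty = False
                break  # a single galaxy inside makes the sphere non-empty
        if empty:
            n_empty += 1
    return n_empty / n_spheres


if __name__ == "__main__":
    # Toy check against the Poisson expectation (hypothetical values).
    rng = random.Random(1)
    box, n_gal, radius = 10.0, 500, 1.0
    pts = [tuple(rng.uniform(0.0, box) for _ in range(3)) for _ in range(n_gal)]
    nbar = n_gal / box ** 3
    est = void_probability(pts, box, radius, n_spheres=2000)
    poisson = math.exp(-nbar * 4.0 / 3.0 * math.pi * radius ** 3)
    print(f"estimated P0 = {est:.3f}, Poisson expectation = {poisson:.3f}")
```

For a clustered sample, P_0(R) at fixed n̄ exceeds the Poisson value, which is what makes it sensitive to the HOD. Production codes usually replace the inner loop with a spatial index such as a k-d tree.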
Research Organization:
Kansas State Univ., Manhattan, KS (United States); Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE; USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
AC02-05CH11231; SC0011840; SC0021165
OSTI ID:
3010050
Journal Information:
Astronomy and Computing, Vol. 49; ISSN 2213-1337
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
