U.S. Department of Energy
Office of Scientific and Technical Information

Explaining and predicting human behavior and social dynamics in simulated virtual worlds: reproducibility, generalizability, and robustness of causal discovery methods

Journal Article · Computational and Mathematical Organization Theory
The Ground Truth program was designed to evaluate social science modeling approaches using simulation test beds with ground truth intentionally and systematically embedded, in order to understand and model complex Human Domain systems and their dynamics (Lazer et al., Science 369:1060–1062, 2020). Our multidisciplinary team of data scientists, statisticians, and experts in Artificial Intelligence (AI) and visual analytics had a unique role on the program: to investigate the accuracy, reproducibility, generalizability, and robustness of state-of-the-art (SOTA) causal structure learning approaches applied to fully observed and sampled simulated data across virtual worlds. In addition, we analyzed the feasibility of using machine learning models to predict future social behavior with and without causal knowledge explicitly embedded. In this paper, we first present our causal modeling approach to discover the causal structure of four virtual worlds produced by the simulation teams: Urban Life, Financial Governance, Disaster, and Geopolitical Conflict. Our approach adapts state-of-the-art causal discovery (including ensemble models), machine learning, data analytics, and visualization techniques to allow a human-machine team to reverse-engineer the true causal relations from sampled and fully observed data. We next present our reproducibility analysis of two research methods teams’ performance using a range of causal discovery models applied to both sampled and fully observed data, and analyze their effectiveness and limitations. We further investigate the generalizability and robustness to sampling of the SOTA causal discovery approaches on additional simulated datasets with known ground truth. Our results reveal the limitations of existing causal modeling approaches when applied to large-scale, noisy, high-dimensional data with unobserved variables and unknown relationships between them. We show that the SOTA causal models explored in our experiments are not designed to take advantage of vast amounts of data and have difficulty recovering ground truth when latent confounders are present; they do not generalize well across simulation scenarios and are not robust to sampling; they are sensitive to data and modeling assumptions, and therefore their results are hard to reproduce. Finally, we outline lessons learned and provide recommendations to improve models for causal discovery and prediction of human social behavior from observational data: we highlight the importance of learning data-to-knowledge representations or transformations to improve causal discovery, and describe the benefit of causal feature selection for predictive and prescriptive modeling.
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
Defense Advanced Research Projects Agency (DARPA); USDOE Office of Science (SC)
Grant/Contract Number:
AC05-76RL01830
OSTI ID:
1969000
Report Number(s):
PNNL-SA-156946
Journal Information:
Computational and Mathematical Organization Theory, Vol. 29, Issue 1; ISSN 1381-298X
Publisher:
Springer
Country of Publication:
United States
Language:
English

References (24)

Data-driven agent-based modeling, with application to rooftop solar adoption · journal · January 2016
Limitations of individual causal models, causal graphs, and ignorability assumptions, as illustrated by random confounding and design unfaithfulness · journal · February 2015
What is a complex system? · journal · June 2012
Causality · book · January 2009
Building machines that learn and think like people · journal · November 2016
The generalizability crisis · journal · December 2020
Inferring causation from time series in Earth system sciences · journal · June 2019
A manifesto for reproducible science · journal · January 2017
Predictability limit of partially observed systems · journal · November 2020
Measuring the predictability of life outcomes with a scientific mass collaboration · journal · March 2020
Simulations evaluating resampling methods for causal discovery: ensemble performance and calibration · conference · November 2019
Toward Causal Representation Learning · journal · May 2021
Enhancing reproducibility for computational methods · journal · December 2016
Prediction and explanation in social systems · journal · February 2017
Computational social science: Obstacles and opportunities · journal · August 2020
Which interventions work best in a pandemic? · journal · June 2020
Machine Learning and Causal Inference for Policy Evaluation · conference · August 2015
Exploring Limits to Prediction in Complex Social Systems · conference · April 2016
The seven tools of causal inference, with reflections on machine learning · journal · February 2019
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond · journal · October 2022
To Explain or to Predict? · journal · August 2010
Using Simpson’s Paradox to Discover Interesting Patterns in Behavioral Data · journal · June 2018
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries · journal · July 2019
Review of Causal Discovery Methods Based on Graphical Models · journal · June 2019

Similar Records

The Ground Truth Program: Simulations as Test Beds for Social Science Research Methods.
Journal Article · Sun Apr 17 20:00:00 EDT 2022 · Computational and Mathematical Organization Theory · OSTI ID:1894598

What can simulation test beds teach us about social science? Results of the ground truth program
Journal Article · Fri Apr 29 20:00:00 EDT 2022 · Computational and Mathematical Organization Theory · OSTI ID:1870475

Evaluation and Validation Approaches for Simulation of Social Behavior: Challenges and Opportunities
Book · Tue Apr 09 00:00:00 EDT 2019 · OSTI ID:1524248