The role of information structures in game-theoretic multi-agent learning

Li, Tao; Zhao, Yuhan; Zhu, Quanyan

doi:10.1016/j.arcontrol.2022.03.003


Journal Article · March 26, 2022 · Annual Reviews in Control

DOI: https://doi.org/10.1016/j.arcontrol.2022.03.003 · OSTI ID: 1976878

Li, Tao ^[1]; Zhao, Yuhan ^[1]; Zhu, Quanyan ^[1]

New York University (NYU), NY (United States)

Multi-agent learning (MAL) studies how agents learn to behave optimally and adaptively from their experience when interacting with other agents in dynamic environments. The outcome of a MAL process is jointly determined by all agents’ decision-making. Hence, each agent needs to think strategically about others’ sequential moves when planning future actions. These strategic interactions among agents make MAL more than a direct extension of single-agent learning to multiple agents. Through strategic thinking, each agent aims to build a subjective model of others’ decision-making using its observations. Such modeling is directly influenced by what the agent perceives during the learning process, which is called the information structure of the agent’s learning. Because it determines the input to MAL processes, the information structure plays a significant role in the agents’ learning mechanisms. This review creates a taxonomy of MAL and establishes a unified and systematic way to understand MAL from the perspective of information structures. We define three fundamental components of MAL: the information structure (i.e., what the agent can observe), the belief generation (i.e., how the agent forms a belief about others based on its observations), and the policy generation (i.e., how the agent generates its policy based on its belief). In addition, this taxonomy enables the classification of a wide range of state-of-the-art algorithms into four categories based on the belief-generation mechanisms of the opponents: stationary, conjectured, calibrated, and sophisticated opponents. We introduce Value of Information (VoI) as a metric to quantify the impact of different information structures on MAL. Finally, we discuss the strengths and limitations of algorithms from different categories and point to promising avenues of future research.
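The three components named in the abstract can be illustrated with a small sketch. The example below uses fictitious play, a classical scheme that treats the opponent as stationary; it is chosen here purely for illustration and is not taken from the article. The payoff matrix, the function names (`observe`, `update_belief`, `best_response`, `run`), and the uniform pseudo-count prior are all assumptions made for this sketch.

```python
import numpy as np

# Hypothetical 2x2 coordination game (an assumption for illustration):
# PAYOFF[a, b] is the learner's payoff when it plays a and the opponent plays b.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])


def observe(opponent_action):
    """Information structure: the learner sees the opponent's last action."""
    return opponent_action


def update_belief(counts, observation):
    """Belief generation: empirical frequency of observed opponent actions."""
    counts = counts.copy()
    counts[observation] += 1
    return counts, counts / counts.sum()


def best_response(belief):
    """Policy generation: play the action with the highest expected payoff."""
    expected = PAYOFF @ belief
    return int(np.argmax(expected))


def run(opponent_probs, steps=500, seed=0):
    """Learn against an opponent that samples actions from a fixed distribution."""
    rng = np.random.default_rng(seed)
    counts = np.ones(2)  # uniform pseudo-counts as a prior (assumption)
    belief = counts / counts.sum()
    action = best_response(belief)
    for _ in range(steps):
        opp = rng.choice(2, p=opponent_probs)
        counts, belief = update_belief(counts, observe(opp))
        action = best_response(belief)
    return belief, action
```

Calling `run([0.2, 0.8])` returns the learner's final belief over the opponent's two actions and its best response to that belief. Changing what `observe` returns (for example, a noisy or delayed signal) is one way to experiment with a different information structure while leaving the other two components untouched.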


Research Organization: The Ohio State University, Columbus, OH (United States)

Sponsoring Organization: USDOE Office of Nuclear Energy (NE); National Science Foundation (NSF); Army Research Office (ARO)

Grant/Contract Number: NE0008986; SES-1541164; ECCS-1847056; CNS-2027884; BCS-2122060; 20-19829; W911NF-19-1-0041

OSTI ID: 1976878

Journal Information: Annual Reviews in Control, Vol. 53, Issue C; ISSN 1367-5788

Publisher: International Federation of Automatic Control - Elsevier

Country of Publication: United States

Language: English



Related Subjects

97 MATHEMATICS AND COMPUTING
multi-agent learning
information structures
reinforcement learning
belief generation
game theory
value of information
