U.S. Department of Energy
Office of Scientific and Technical Information

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

Conference
In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. Policies for time-dependent behaviors, such as those induced by multiple timescales, are in general non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define and learn periodic multi-agent policies. Specifically, we show theoretically that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to learn multi-timescale policies effectively is validated on a gridworld environment and a building energy management environment.
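The idea of a periodic policy parameterized by a phase-functioned network can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single linear softmax actor, cyclic linear interpolation between control weights (a simplification of a spline-based phase function), and a period taken as the least common multiple of two hypothetical agent timescales. All names and sizes here (`PhaseFunctionedPolicy`, `n_control`, the timescales 2 and 3) are made up for illustration.

```python
import numpy as np


def phase_from_step(t, period):
    """Map an integer time step to a phase in [0, 1)."""
    return (t % period) / period


def blend_weights(control_points, phase):
    """Cyclic linear interpolation between K control arrays.

    Assumption: linear blending is used here for brevity; a smoother
    cyclic spline could be substituted without changing the interface.
    """
    k = control_points.shape[0]
    pos = phase * k
    i0 = int(np.floor(pos)) % k
    i1 = (i0 + 1) % k          # wraps around, so the blend is periodic
    frac = pos - np.floor(pos)
    return (1.0 - frac) * control_points[i0] + frac * control_points[i1]


class PhaseFunctionedPolicy:
    """Linear softmax actor whose weights are a function of the phase."""

    def __init__(self, obs_dim, n_actions, period, n_control=4, seed=0):
        rng = np.random.default_rng(seed)
        self.period = period
        # One weight matrix and bias vector per control point (hypothetical sizes).
        self.W = rng.normal(scale=0.1, size=(n_control, n_actions, obs_dim))
        self.b = np.zeros((n_control, n_actions))

    def action_probs(self, obs, t):
        phase = phase_from_step(t, self.period)
        W = blend_weights(self.W, phase)
        b = blend_weights(self.b, phase)
        logits = W @ obs + b
        logits = logits - logits.max()  # numerical stability
        p = np.exp(logits)
        return p / p.sum()


# Hypothetical timescales: one agent acts every 2 steps, another every 3.
timescales = [2, 3]
period = int(np.lcm.reduce(timescales))

policy = PhaseFunctionedPolicy(obs_dim=3, n_actions=2, period=period)
obs = np.array([1.0, -0.5, 0.2])
probs_t = policy.action_probs(obs, t=1)
probs_t_plus_period = policy.action_probs(obs, t=1 + period)
```

For the same observation, the policy returns the same action distribution at steps `t` and `t + period`, while it is free to differ at other phases. That is the sense in which the parameterization builds in a bias toward periodicity.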
Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE National Renewable Energy Laboratory (NREL), Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Energy Efficiency and Renewable Energy (EERE)
DOE Contract Number:
AC36-08GO28308
OSTI ID:
2319191
Report Number(s):
NREL/CP-2C00-83437; MainId:84210; UUID:7df43834-a844-4067-9d35-bef0f5435ee8; MainAdminId:71345
Country of Publication:
United States
Language:
English
