Single Episode Policy Transfer
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Reinforcement learning (RL) aims to learn optimal strategies for control problems with complex, stochastic dynamics. Standard RL formulations assume that transition dynamics are the same across episodes. However, this assumption fails in many real-world environments, e.g., a disease process in which each patient is unique. Transfer learning in RL aims to solve this problem; however, existing methods assume that multiple trials on a test episode are available. We consider the "single episode transfer" setting, in which a policy is evaluated on one and only one test episode; thus, any adaptation must occur during that episode. Single episode policy transfer (SEPT) is a framework for finding optimal control policies in this setting. In SEPT, a "probe policy" first probes the environment to gain information about how that episode is unique. The information gained from the probe then becomes an additional input to the "universal policy" that controls the remainder of the episode. SEPT is general and can be used with any existing RL algorithm, including in batch learning mode.
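The two-phase control flow described above (probe first, then act with a policy conditioned on what the probe revealed) can be illustrated with a minimal Python sketch. This is not the SEPT implementation: the toy `DriftEnv` environment, the hand-written `probe_policy` and `universal_policy`, and the simple averaging in `estimate_drift` are all hypothetical stand-ins chosen for illustration, whereas in SEPT these components would be learned.

```python
import random


class DriftEnv:
    """Hypothetical 1-D environment (an assumption for illustration).

    Each step moves the state by `action + drift + noise`, where `drift`
    is a hidden parameter that differs from episode to episode.
    """

    def __init__(self, drift, noise=0.01, seed=0):
        self.drift = drift
        self.noise = noise
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        self.state += action + self.drift + self.rng.gauss(0.0, self.noise)
        return self.state


def probe_policy(t):
    """Hand-written probe (assumption): alternate small actions."""
    return 0.1 if t % 2 == 0 else -0.1


def estimate_drift(transitions):
    """Estimate the hidden drift as the mean unexplained state change."""
    residuals = [s_next - s - a for s, a, s_next in transitions]
    return sum(residuals) / len(residuals)


def universal_policy(state, z, target=1.0):
    """Policy that takes the probe's estimate `z` as an additional input.

    It steers toward `target` while cancelling the estimated drift,
    with actions clipped to [-0.5, 0.5].
    """
    action = (target - state) - z
    return max(-0.5, min(0.5, action))


def run_single_episode(env, probe_steps=5, total_steps=30):
    """Run one test episode: probe phase, then control phase."""
    transitions = []
    state = env.state
    for t in range(probe_steps):
        action = probe_policy(t)
        next_state = env.step(action)
        transitions.append((state, action, next_state))
        state = next_state

    z = estimate_drift(transitions)  # information gained from the probe

    for _ in range(total_steps - probe_steps):
        state = env.step(universal_policy(state, z))
    return z, state


if __name__ == "__main__":
    z, final_state = run_single_episode(DriftEnv(drift=0.2, seed=1))
    print(f"estimated drift: {z:.3f}, final state: {final_state:.3f}")
```

All adaptation happens inside the single call to `run_single_episode`: the environment is never reset, and the estimate `z` is computed only from transitions observed during that same episode.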
- Short Name / Acronym: SEPT
- Site Accession Number: 1009910
- Software Type: Scientific
- License(s): BSD 3-clause "New" or "Revised" License
- Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)
- Primary Award/Contract Number: AC52-07NA27344
- DOE Contract Number: AC52-07NA27344
- Code ID: 34258
- OSTI ID: code-34258
- Country of Origin: United States
Similar Records
- Single Episode Policy Transfer in Reinforcement Learning (Conference · Wed Mar 11 00:00:00 EDT 2020 · OSTI ID: 1776666)
- Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy (Conference · Tue Nov 12 23:00:00 EST 2019 · OSTI ID: 1576205)
- A Transfer Learning Strategy for Improving the Data Efficiency of Deep Reinforcement Learning Control in Smart Buildings (Conference · Wed Jan 31 23:00:00 EST 2024 · OSTI ID: 2324038)