Single Episode Policy Transfer

Petersen, Brenden K.; Faissol, Daniel M

doi:10.11578/dc.20200212.1

Software · Sun Dec 01 00:00:00 EST 2019

DOI: https://doi.org/10.11578/dc.20200212.1 · OSTI ID: code-34258 · Code ID: 34258

Petersen, Brenden K. [1]; Faissol, Daniel M [1]

[1] Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Reinforcement learning (RL) aims to learn optimal strategies for control problems with complex, stochastic dynamics. Standard RL formulations assume that transition dynamics are the same across episodes. However, this is not the case for many real-world environments, e.g. a disease process in which each patient is unique. Transfer learning in RL aims to solve this problem; however, existing methods allow for multiple trials on a test episode. We consider the "single episode transfer" setting in which a policy is evaluated on one and only one test episode; thus, any adaptation must occur during that episode. Single episode policy transfer (SEPT) is a framework for finding optimal control policies in this setting. In SEPT, a "probe policy" initially probes the environment to gain information about how that episode is unique. Then, the information gained from the probe becomes an additional input to the "universal policy" that controls the remainder of the episode. SEPT is a general algorithm and can be used with any existing RL algorithm, including in batch learning mode.
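The two-phase structure described above (probe first, then a universal policy conditioned on what the probe learned) can be illustrated with a minimal Python sketch. This is not the released SEPT code. The toy environment `ToyEnv`, its hidden `gain` parameter, the fixed probe action, and the hand-written `summarize_probe` and `universal_policy` functions are all hypothetical stand-ins chosen for illustration; in a real use both policies would be learned with an RL algorithm.

```python
class ToyEnv:
    """Hypothetical 1-D environment: next_state = state + gain * action.

    The hidden, nonzero `gain` differs from episode to episode, which is
    what makes each episode "unique" in this toy setting.
    """

    def __init__(self, gain, start=5.0):
        self.gain = gain
        self.state = start

    def step(self, action):
        self.state += self.gain * action
        return self.state


def probe_policy(state, t):
    # Assumption: a fixed unit action is informative enough here.
    # A learned probe policy would replace this.
    return 1.0


def summarize_probe(transitions):
    # Stand-in for a learned encoder: estimate the hidden gain
    # as the mean of (next_state - state) / action over the probe steps.
    estimates = [(s_next - s) / a for s, a, s_next in transitions]
    return sum(estimates) / len(estimates)


def universal_policy(state, z):
    # Policy that takes the probe summary z as an additional input.
    # In this toy problem it picks the action that moves the state to 0.
    return -state / z


def run_single_episode(env, probe_steps=2, total_steps=10):
    """Run one test episode: probe phase, then control phase."""
    transitions = []
    state = env.state

    # Phase 1: probe the environment and record what happens.
    for t in range(probe_steps):
        action = probe_policy(state, t)
        next_state = env.step(action)
        transitions.append((state, action, next_state))
        state = next_state

    # Turn the probe data into an input for the universal policy.
    z = summarize_probe(transitions)

    # Phase 2: the universal policy controls the rest of the episode.
    for t in range(probe_steps, total_steps):
        action = universal_policy(state, z)
        state = env.step(action)

    return z, state
```

Calling `run_single_episode(ToyEnv(gain=-2.0))` runs the whole procedure inside a single episode, with no second trial on the same environment. The steps spent probing are not spent controlling, so the choice of `probe_steps` trades information for control time.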

Short Name / Acronym: SEPT

Site Accession Number: 1009910

Software Type: Scientific

License(s): BSD 3-clause "New" or "Revised" License

Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Primary Award/Contract Number: AC52-07NA27344

DOE Contract Number: AC52-07NA27344

Code ID: 34258

OSTI ID: code-34258

Country of Origin: United States

Similar Records

Single Episode Policy Transfer in Reinforcement Learning

Conference · Wed Mar 11 00:00:00 EDT 2020 · OSTI ID:1776666

Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy

Conference · Tue Nov 12 23:00:00 EST 2019 · OSTI ID:1576205

A Transfer Learning Strategy for Improving the Data Efficiency of Deep Reinforcement Learning Control in Smart Buildings

Conference · Wed Jan 31 23:00:00 EST 2024 · OSTI ID:2324038
