U.S. Department of Energy
Office of Scientific and Technical Information

Single Episode Policy Transfer

Software · DOI: https://doi.org/10.11578/dc.20200212.1 · OSTI ID: code-34258 · Code ID: 34258
 [1];  [1]
  1. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Reinforcement learning (RL) aims to learn optimal strategies for control problems with complex, stochastic dynamics. Standard RL formulations assume that transition dynamics are the same across episodes. However, this is not the case for many real-world environments, for example a disease process in which each patient is unique. Transfer learning in RL aims to solve this problem; however, existing methods allow for multiple trials on a test episode. We consider the "single episode transfer" setting, in which a policy is evaluated on one and only one test episode; thus, any adaptation must occur during that episode. Single episode policy transfer (SEPT) is a framework for finding optimal control policies in this setting. In SEPT, a "probe policy" first probes the environment to gain information about how that episode is unique. The information gained from the probe then becomes an additional input to the "universal policy" that controls the remainder of the episode. SEPT is a general framework and can be used with any existing RL algorithm, including in batch learning mode.
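The probe-then-control structure described above can be illustrated with a small sketch. This is not the SEPT code itself: the toy environment `ToyEpisode`, the fixed `probe_policy`, the averaging estimator `estimate_gain`, and the hand-written `universal_policy` are all hypothetical stand-ins chosen for illustration, and this record does not specify how SEPT actually encodes the probe's information. The sketch assumes a one-dimensional state whose hidden per-episode gain is +1 or -1, and shows only the control flow: a few probe steps, an estimate of what makes the episode unique, then a policy that takes that estimate as an extra input.

```python
from dataclasses import dataclass


@dataclass
class ToyEpisode:
    """Hypothetical 1-D environment; the hidden gain differs per episode."""
    gain: float          # hidden, episode-specific dynamics parameter (+1 or -1 here)
    state: float = 5.0

    def step(self, action: float) -> float:
        # The same action moves the state differently depending on the hidden gain.
        self.state += self.gain * action
        return self.state


def probe_policy(step_index: int) -> float:
    """Fixed exploratory action (assumption); a learned probe policy would go here."""
    return 1.0


def estimate_gain(transitions):
    """Stand-in estimator: average observed state change per unit action."""
    ratios = [(s_next - s) / a for s, a, s_next in transitions]
    return sum(ratios) / len(ratios)


def universal_policy(state: float, gain_estimate: float) -> float:
    """Policy conditioned on the state and on the probe's estimate."""
    if state == 0.0:
        return 0.0                               # already at the target
    direction = -1.0 if state > 0 else 1.0       # desired movement toward zero
    # Flip the action when the estimated gain is negative.
    return direction if gain_estimate > 0 else -direction


def run_single_episode(env, probe_steps=2, total_steps=20):
    """Run one episode: probe first, then control with the universal policy."""
    transitions = []
    for t in range(probe_steps):
        s = env.state
        a = probe_policy(t)
        transitions.append((s, a, env.step(a)))
    gain_estimate = estimate_gain(transitions)
    for _ in range(total_steps - probe_steps):
        env.step(universal_policy(env.state, gain_estimate))
    return gain_estimate, env.state
```

Calling `run_single_episode(ToyEpisode(gain=-1.0))` runs a single test episode with no prior trials on it. In this toy setting the probe recovers the sign of the gain within two steps, and the universal policy then drives the state to zero for either value of the gain.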
Short Name / Acronym:
SEPT
Site Accession Number:
1009910
Software Type:
Scientific
License(s):
BSD 3-clause "New" or "Revised" License
Research Organization:
Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)

Primary Award/Contract Number:
AC52-07NA27344
DOE Contract Number:
AC52-07NA27344
Code ID:
34258
OSTI ID:
code-34258
Country of Origin:
United States
