TY  - COMP
TI  - Single Episode Policy Transfer
AB  - Reinforcement learning (RL) aims to learn optimal strategies for control problems with complex, stochastic dynamics. Standard RL formulations assume that transition dynamics are the same across episodes. However, this is not the case for many real-world environments, e.g. a disease process in which each patient is unique. Transfer learning in RL aims to solve this problem; however, existing methods allow for multiple trials on a test episode. We consider the "single episode transfer" setting in which a policy is evaluated on one and only one test episode; thus, any adaptation must occur during that episode. Single episode policy transfer (SEPT) is a framework for finding optimal control policies in this setting. In SEPT, a "probe policy" initially probes the environment to gain information about how that episode is unique. Then, the information gained from the probe becomes an additional input to the "universal policy" that controls the remainder of the episode. SEPT is a general algorithm and can be used with any existing RL algorithm, including in batch learning mode.
AU  - Petersen, Brenden
AU  - Faissol, Daniel
AU  - Yang, Jiachen
DO  - 10.11578/dc.20200212.1
UR  - https://www.osti.gov/doecode/biblio/34258
CY  - United States
PY  - 2019
DA  - 2019-12-01
LA  - English
C1  - Research Org.: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
C2  - Sponsor Org.: USDOE National Nuclear Security Administration (NNSA)
C4  - Contract Number: AC52-07NA27344
ER  - 