TY  - COMP
TI  - Single Episode Policy Transfer
AB  - Reinforcement learning (RL) aims to learn optimal strategies for control problems with complex, stochastic dynamics. Standard RL formulations assume that transition dynamics are the same across episodes. However, this is not the case for many real-world environments, e.g. a disease process in which each patient is unique. Transfer learning in RL aims to solve this problem; however, existing methods allow for multiple trials on a test episode. We consider the "single episode transfer" setting in which a policy is evaluated on one and only one test episode; thus, any adaptation must occur during that episode. Single episode policy transfer (SEPT) is a framework for finding optimal control policies in this setting. In SEPT, a "probe policy" initially probes the environment to gain information about how that episode is unique. Then, the information gained from the probe becomes an additional input to the "universal policy" that controls the remainder of the episode. SEPT is a general algorithm and can be used with any existing RL algorithm, including in batch learning mode.
AU  - Petersen, Brenden
AU  - Faissol, Daniel
AU  - Yang, Jiachen
DO  - 10.11578/dc.20200212.1
UR  - https://www.osti.gov/doecode/biblio/34258
CY  - United States
PY  - 2019
DA  - 2019-12-01
LA  - English
C1  - Research Org.: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
C2  - Sponsor Org.: USDOE National Nuclear Security Administration (NNSA)
C4  - Contract Number: AC52-07NA27344
ER  - 