Backtracking Restarts for Deep Reinforcement Learning
Keywords:Deep Reinforcement Learning, Backtracking Restarts, Restart Distributions
Manipulating the starting states of a Markov Decision Process to accelerate the learning of a deep reinforcement learning agent is an idea that has been proposed in several ways in the literature. Examples include starting from random states to improve exploration, taking random walks from desired goal states, and using performance-based metrics for starting states selection policy. In this paper, we explore the idea of exploiting the RL agent's trajectories generated during training for use as starting states. The main intuition behind this proposal is to focus the training of the RL agent to overcome its current weaknesses by practicing overcoming failure states by resetting the environment to a state in its recent past. We shall call the idea of starting from a fixed (or variable) number of steps back from recent terminal or failure states `backtracking restarts'. Our empirical findings show that this modification yields tangible speedups in the learning process.