MT-Dreamer: Efficient Multi-Task Replay for Model-Based Deep Reinforcement Learning


Reinforcement learning agents operating in real environments may be asked to solve many tasks over their operational lifetimes. While attempting one task, such an agent often collects experience relevant to many others, but this data is typically off-policy and therefore challenging to exploit. To make use of it, we first formalize the problem setting as a shared environment with multiple tasks and reward streams. We then provide a taxonomy of replay strategies in this setting and propose a novel approach to shared-environment multi-task replay, in which off-policy task completions are balanced with on-policy task assignments during replay relabeling. We compare our method’s performance to alternative task relabeling strategies in a modified Crafter domain, where tasks are assigned in a random sequence until the agent dies or all tasks are completed. Rewards and termination conditions are provided for every task simultaneously, although the agent is evaluated only on the sequence of assigned tasks. Our results show that, when combined with the deep model-based RL algorithm DreamerV2, our replay strategy can exploit multiple streams of sparse reward without neglecting assigned tasks.
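As a rough illustration of what balancing on-policy assignments against off-policy completions during relabeling could look like, the Python sketch below relabels a replayed segment. It is not the paper's algorithm: the `Segment` container, the `relabel` function, the `p_on_policy` mixing probability, and the rule that a positive reward marks a completion are all assumptions made for this example, and the task names are only illustrative.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """Hypothetical replay segment with one reward stream per task."""
    assigned_task: str               # task the agent was asked to do (on-policy label)
    rewards: Dict[str, List[float]]  # task name -> per-step rewards over the segment


def completed_tasks(seg: Segment) -> List[str]:
    """Tasks with at least one positive reward in the segment (assumed completion rule)."""
    return sorted(t for t, r in seg.rewards.items() if any(x > 0 for x in r))


def relabel(seg: Segment, p_on_policy: float, rng: random.Random) -> str:
    """Choose the task label under which the segment is replayed.

    With probability p_on_policy, or when no other task was completed,
    keep the assigned task. Otherwise pick uniformly among the tasks that
    were completed off-policy, so their sparse rewards are also replayed.
    """
    candidates = [t for t in completed_tasks(seg) if t != seg.assigned_task]
    if not candidates or rng.random() < p_on_policy:
        return seg.assigned_task
    return rng.choice(candidates)


# Example: the agent was assigned "collect_wood" but happened to collect a sapling.
seg = Segment(
    assigned_task="collect_wood",
    rewards={
        "collect_wood": [0.0, 0.0, 0.0],
        "collect_sapling": [0.0, 1.0, 0.0],
        "place_table": [0.0, 0.0, 0.0],
    },
)
rng = random.Random(0)
label = relabel(seg, p_on_policy=0.5, rng=rng)
```

In this sketch, `p_on_policy=1.0` reduces to replaying only assigned tasks, while `p_on_policy=0.0` always prefers an off-policy completion when one exists; intermediate values trade off the two.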

NeurIPS 2023
Shane Parr
PhD student

My research interests include model-based reinforcement learning, abstractions, and theory of mind.