We recently published a paper on deep reinforcement learning with Double Q-learning, demonstrating that Q-learning learns overoptimistic action values when combined with deep neural networks, even in deterministic environments such as Atari video games, and that this can be remedied with a variant of Double Q-learning. The resulting Double DQN algorithm greatly improves on the performance of DQN. Abstract:…
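The core idea can be sketched in a few lines: standard DQN both selects and evaluates the next action with the same (target) network, which biases the target upward, while Double DQN selects with the online network and evaluates with the target network. The array names and sizes below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Q-value estimates over 5 actions in the next state,
# from the online network and the target network respectively.
q_online = rng.normal(size=5)
q_target = rng.normal(size=5)

reward, gamma = 1.0, 0.99

# Standard DQN target: the same network's max both selects and
# evaluates the action, which produces overoptimistic targets.
dqn_target = reward + gamma * np.max(q_target)

# Double DQN target: select the action with the online network,
# then evaluate that action with the target network.
best_action = int(np.argmax(q_online))
double_dqn_target = reward + gamma * q_target[best_action]
```

Note that the Double DQN target can never exceed the DQN target for the same estimates, since evaluating any fixed action is bounded by the max: this is the decoupling that removes the systematic upward bias.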
Month: December 2015
Weighted importance sampling for off-policy learning with linear function approximation
The following paper was presented at NIPS 2014: A. Rupam Mahmood, Hado van Hasselt, and Richard S. Sutton (2014). “Weighted importance sampling for off-policy learning with linear function approximation.” Advances in Neural Information Processing Systems 27. Abstract: Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted…
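To make the distinction in the abstract concrete, here is a minimal sketch of ordinary versus weighted importance sampling for off-policy estimation. This is an illustrative toy, not the paper's linear-function-approximation algorithm; the sample sizes and distributions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated returns observed under a behaviour policy, and the
# corresponding importance-sampling ratios rho = pi / mu
# (lognormal here purely for illustration).
returns = rng.normal(loc=1.0, size=100)
rho = rng.lognormal(mean=0.0, sigma=1.0, size=100)

# Ordinary importance sampling: unbiased, but the unnormalised
# ratios can make its variance very large.
ois = float(np.mean(rho * returns))

# Weighted importance sampling: normalises by the sum of the
# ratios, trading a small bias for much lower variance.
wis = float(np.sum(rho * returns) / np.sum(rho))
```

The weighted estimator is the "most effective variant" the abstract refers to; the paper's contribution is extending it to the linear function approximation setting.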