reinforcement learning – Hado van Hasselt

Atari videos

(The content of this NIPS spotlight video is similar to the post below, although the post is a bit more detailed.) Reinforcement learning agents can learn to play video games (for instance, Atari games) by themselves. The original DQN algorithm and many of its successors clip the rewards they receive while learning. This helps stabilize the deep learning,…

Deep Reinforcement Learning with Double Q-learning

We recently published a paper on deep reinforcement learning with Double Q-learning, demonstrating that Q-learning learns overoptimistic action values when combined with deep neural networks, even in deterministic environments such as Atari video games, and that this can be remedied by using a variant of Double Q-learning. The resulting Double DQN algorithm greatly improves on the performance of the DQN algorithm. Abstract:…
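To make the difference between the two bootstrap targets concrete, here is a minimal sketch in plain Python. It assumes the commonly described form of the two targets: DQN takes the maximum of the target network's values, while Double DQN lets the online network pick the action and the target network evaluate it. The function names, argument names, and the toy numbers are illustrative assumptions, not code from the paper.

```python
def dqn_target(reward, gamma, q_target_next, done):
    """DQN-style target: the same (target) values select and evaluate the action."""
    bootstrap = 0.0 if done else max(q_target_next)
    return reward + gamma * bootstrap


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double-DQN-style target: online values select, target values evaluate."""
    if done:
        return reward
    # Index of the greedy action according to the online network's values.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical next-state action values for three actions.
q_online_next = [1.0, 2.0, 0.5]
q_target_next = [1.5, 1.0, 3.0]

y_dqn = dqn_target(0.0, 0.9, q_target_next, done=False)
y_ddqn = double_dqn_target(0.0, 0.9, q_online_next, q_target_next, done=False)
```

Because the evaluated value is one of the entries the maximum ranges over, the Double DQN target in this sketch can never exceed the DQN target for the same inputs, which is the mechanism that dampens overoptimistic estimates.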

Weighted importance sampling for off-policy learning with linear function approximation

The following paper was presented at NIPS 2014: A. Rupam Mahmood, Hado van Hasselt, and Richard S. Sutton (2014). “Weighted importance sampling for off-policy learning with linear function approximation.” Advances in Neural Information Processing Systems 27. Abstract: Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted…
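As background, the sketch below contrasts the ordinary and weighted importance sampling estimators in their simplest, tabular form. It is not the linear function approximation algorithm from the paper; it only illustrates the two basic estimators, and the function names and sample numbers are assumptions made for the example.

```python
def ordinary_is(returns, ratios):
    """Ordinary importance sampling: average of ratio-weighted returns."""
    weighted = [rho * g for rho, g in zip(ratios, returns)]
    return sum(weighted) / len(returns)


def weighted_is(returns, ratios):
    """Weighted importance sampling: normalize by the sum of the ratios."""
    total = sum(ratios)
    if total == 0:
        return 0.0  # no sample has any weight under the target policy
    weighted = [rho * g for rho, g in zip(ratios, returns)]
    return sum(weighted) / total


# Hypothetical returns observed under a behavior policy, with the
# corresponding importance sampling ratios (target prob / behavior prob).
returns = [1.0, 0.0, 2.0]
ratios = [3.0, 0.5, 0.5]

ois_estimate = ordinary_is(returns, ratios)
wis_estimate = weighted_is(returns, ratios)
```

With nonnegative ratios, the weighted estimate is a convex combination of the observed returns, so it always stays within their range; the ordinary estimate has no such bound, which is one reason it tends to have higher variance.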