(The contents of this NIPS spotlight video are similar to the post below, although the post is a bit more detailed.) Reinforcement learning agents can learn to play video games (for instance, Atari games) by themselves. The original DQN algorithm and many of its successors clip the rewards they receive while learning. This helps stabilize the deep learning,…
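For concreteness, here is a minimal sketch of the clipping step (the function name is illustrative; this is not the DQN source code):

```python
import numpy as np

def clip_reward(reward):
    """Clip a raw game reward to [-1, 1], as in the original DQN setup."""
    return float(np.clip(reward, -1.0, 1.0))

# A large Atari score increment and a small one both become unit-scale rewards:
print(clip_reward(400.0))  # 1.0
print(clip_reward(-25.0))  # -1.0
print(clip_reward(0.5))    # 0.5
```

Clipping keeps the targets at a similar scale across games, but it discards information about the actual magnitudes of the rewards.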
Learning values across many orders of magnitude
Our paper about adaptive target normalization in deep learning was accepted at NIPS 2016. A preprint can be found on arXiv.org. The abstract and a more informal summary can be found below. Update: there are now videos of the effect of the new approach on Atari. Abstract: Most learning algorithms are not invariant to the scale of the function that…
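As a rough illustration of adaptive target normalization (a sketch after the idea in the paper: track the scale of the targets and rescale the output layer so that unnormalized predictions are preserved; the class name, moving-average statistics, and step size below are my own assumptions):

```python
import numpy as np

class AdaptiveNormalizer:
    """Sketch of adaptively normalizing regression targets: keep running
    mean/scale estimates of the targets, and rescale the final linear layer
    so unnormalized outputs are preserved whenever the statistics change."""

    def __init__(self, n_features, beta=1e-3):
        self.w = np.zeros(n_features)  # final linear layer weights
        self.b = 0.0                   # final linear layer bias
        self.mu, self.nu = 0.0, 1.0    # running first and second moments
        self.beta = beta               # step size for the statistics

    @property
    def sigma(self):
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        # Preserve outputs: rescale the output layer to undo the new normalization.
        self.w *= old_sigma / self.sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / self.sigma

    def normalized_target(self, target):
        return (target - self.mu) / self.sigma
```

The output-preserving rescaling is what allows the statistics to adapt without disturbing the predictions the network has already learned.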
Best paper at ICML: Dueling Network Architectures for Deep Reinforcement Learning
Today Ziyu Wang will present our paper on dueling network architectures for deep reinforcement learning at the International Conference on Machine Learning (ICML) in New York. The paper received the best paper award. Its elegant main idea is to separate the value of a state from the advantage of each action in that state…
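The aggregation at the head of the network can be sketched in a few lines (illustrative code; the mean-advantage form is one of the aggregators discussed in the paper):

```python
import numpy as np

def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) and per-action advantages A(s, a)
    into Q-values, subtracting the mean advantage so the decomposition
    is identifiable."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()

# Example: three actions that differ only in their advantages.
print(dueling_q_values(3.0, [0.5, -0.5, 0.0]))  # [3.5  2.5  3. ]
```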
AlphaGo
My colleagues David Silver, Aja Huang, and others have just published their excellent work on Go in Nature. The system is called AlphaGo, and it combines Monte-Carlo tree search with deep neural networks, trained by supervised learning and by reinforcement learning from self-play. The landmark achievement was beating the human European champion 5-0 in tournament games….
UCL course – 2016
Together with Joseph Modayil, this year I am teaching the reinforcement learning part of the Advanced Topics in Machine Learning course at UCL. Lectures: note that there will be two lectures about AlphaGo on March 24. We will talk about AlphaGo in the context of the whole course at the normal place and time (9:15am in Roberts 412), and in addition…
Learning to predict independent of span
Rich Sutton and I wrote a paper about how to efficiently learn predictions that can range over many time steps. The focus of the paper is on algorithms whose computational complexity does not grow with the time span of the predictions. This is important because many predictive questions have a large or even infinite span. Another contribution of the paper is that we…
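To illustrate what span-independent computation means (this is plain linear TD(0) as a toy example, not the paper's new algorithm): the cost of each update below is constant in the span of the prediction, whereas a Monte Carlo approach must store or wait for the whole span.

```python
def td0_update(w, x, reward, x_next, gamma, alpha):
    """One linear TD(0) update for a value estimate v(s) = w . x.
    Per-step compute and memory are O(len(w)), no matter how far into
    the future the discounted prediction reaches."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v  # temporal-difference error
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]

# Example: one update with two features.
w = td0_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], 0.99, 0.1)
print(w)  # [0.1, 0.0]
```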
Deep Reinforcement Learning with Double Q-learning
We recently published a paper on deep reinforcement learning with Double Q-learning, demonstrating that Q-learning learns overoptimistic action values when combined with deep neural networks, even in deterministic environments such as Atari video games, and that this can be remedied with a variant of Double Q-learning. The resulting Double DQN algorithm greatly improves on the performance of DQN. Abstract:…
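The heart of the change is how the bootstrap target is formed: the online network selects the greedy next action while the target network evaluates it, which decouples selection from evaluation. A small sketch (variable names are illustrative):

```python
import numpy as np

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target for one transition: select the next action with
    the online network's Q-values, evaluate it with the target network's."""
    if done:
        return reward
    best_action = int(np.argmax(q_online_next))          # selection: online net
    return reward + gamma * q_target_next[best_action]   # evaluation: target net

# Example with made-up Q-value vectors for the next state:
print(double_dqn_target(1.0, 0.99, np.array([2.0, 5.0]),
                        np.array([1.5, 0.5]), done=False))  # 1.495
```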
Weighted importance sampling for off-policy learning with linear function approximation
The following paper was presented at NIPS 2014: A. Rupam Mahmood, Hado van Hasselt, and Richard S. Sutton (2014). “Weighted importance sampling for off-policy learning with linear function approximation.” Advances in Neural Information Processing Systems 27. Abstract: Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted…
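To illustrate the two estimators with made-up numbers (a toy sketch in the tabular case; the paper extends weighted importance sampling to linear function approximation):

```python
def ordinary_is(returns, rhos):
    """Ordinary importance sampling: unbiased, but the variance can be
    very large when the importance ratios rho are large."""
    return sum(rho * g for rho, g in zip(rhos, returns)) / len(returns)

def weighted_is(returns, rhos):
    """Weighted importance sampling: normalizes by the sum of the ratios,
    trading a little bias for much lower variance."""
    total = sum(rhos)
    return sum(rho * g for rho, g in zip(rhos, returns)) / total if total > 0 else 0.0

# rhos are products of target/behavior action probabilities along each episode.
returns = [10.0, 0.0, 5.0]
rhos = [2.0, 0.1, 0.5]
print(ordinary_is(returns, rhos))  # 7.5
print(weighted_is(returns, rhos))  # ~8.65
```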