Weighted importance sampling for off-policy learning with linear function approximation

The following paper was presented at NIPS 2014: A. Rupam Mahmood, Hado van Hasselt, and Richard S. Sutton (2014). “Weighted importance sampling for off-policy learning with linear function approximation.” Advances in Neural Information Processing Systems 27.

Abstract: Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted…