
Csaba Szepesvári
About

Csaba Szepesvári is a Canada CIFAR AI Chair, the team lead of the "Foundations" team at DeepMind, and a Professor of Computing Science at the University of Alberta. He earned his PhD in 1999 from József Attila University in Szeged, Hungary. He has authored three books and about 200 peer-reviewed journal and conference papers. He serves as an action editor of the Journal of Machine Learning Research and of Machine Learning, as well as on various program committees. Dr. Szepesvári's interests lie in artificial intelligence (AI) and, in particular, in principled approaches to AI that use machine learning. He is the co-inventor of UCT, a widely successful Monte-Carlo tree search algorithm. UCT ignited much work in AI, such as DeepMind's AlphaGo, which defeated the top Go professional Lee Sedol in a landmark match. The work on UCT won the 2016 test-of-time award at ECML/PKDD.


Talk
Planning, learning, and generalization in reinforcement learning

Level: General

Markov decision processes (MDPs) are a minimalist framework designed to capture the most important aspects of decision making under uncertainty, a problem of major practical interest. While the minimalist approach makes MDPs powerful and general, the lack of structure in MDPs also implies that planning and learning in MDPs are provably intractable with combinatorial (i.e., multidimensional) state-action spaces, a situation that is common in applications. Reinforcement learning methods introduce value-function approximation, policy approximation, or both to address this issue. The core idea is that, by using a powerful function-approximation method such as a neural network, an algorithm can extrapolate far beyond the data it is exposed to, which ideally leads to increased efficiency and effectiveness. In this talk, building on recent results, I will discuss to what extent this hope can be or is met, and I will argue that the new results put the problem of choosing and designing reinforcement learning algorithms into an entirely new perspective, which I will describe in detail.
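
To make the value-function-approximation idea in the abstract concrete, here is a minimal sketch of semi-gradient Q-learning with a linear approximator on a toy environment. The environment dynamics (`step`), the feature dimension, and all hyperparameters are hypothetical choices made for this sketch; they are not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 2 actions; the state is seen only through a feature vector
# phi in R^d, so a tabular method does not apply directly.
d = 4                   # feature dimension (assumed)
gamma = 0.9             # discount factor
w = np.zeros((2, d))    # one weight vector per action: Q(s, a) ~ w[a] @ phi(s)

def step(phi):
    """Hypothetical environment: returns a reward and next-state features."""
    next_phi = np.clip(phi + rng.normal(scale=0.1, size=d), -1.0, 1.0)
    reward = float(phi.sum() > 0)   # reward depends on the current features
    return reward, next_phi

phi = rng.uniform(-1.0, 1.0, size=d)
alpha, eps = 0.05, 0.1
for t in range(5000):
    # epsilon-greedy action with respect to the approximate Q-values
    a = rng.integers(2) if rng.random() < eps else int(np.argmax(w @ phi))
    r, next_phi = step(phi)
    # semi-gradient Q-learning update: bootstrap from max_a' Q(s', a')
    td_error = r + gamma * np.max(w @ next_phi) - w[a] @ phi
    w[a] += alpha * td_error * phi
    phi = next_phi

print("learned weights:\n", w)
```

The sketch illustrates the extrapolation point made above: the learned weights assign values to feature vectors never seen during training, which is exactly the property whose benefits and pitfalls the talk examines.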