This course provides an introduction to reinforcement learning, starting from the basics and progressing to advanced topics such as deep RL. The course targets graduate students in control or artificial intelligence/machine learning.
The course opens with a discussion of forward-search planning methods for seeking near-optimal sequences of actions. These methods work like receding-horizon predictive control, and exploit insights from tree search in AI and bandit theory in reinforcement learning. We explain how planning can be applied and adapted to switched systems and networked control systems. The afternoon of the first day provides an introduction to the basics of RL, including temporal-difference, Q-learning and SARSA methods, as well as actor-critic algorithms. The exploration-exploitation trade-off will be discussed. Approaches to speed up learning, such as shaping, demonstrations and advice, will be introduced. The day concludes with hands-on exercises.
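To give a flavor of the tabular methods covered on the first day, here is a minimal sketch of Q-learning with epsilon-greedy exploration. The two-state toy MDP and the hyperparameters are our own illustrative assumptions, not course material:

```python
import random

# Illustrative tabular Q-learning on a made-up two-state, two-action MDP.
# Update rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

random.seed(0)
N_STATES, N_ACTIONS = 2, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

def step(s, a):
    """Toy deterministic dynamics: action 1 in state 0 reaches the
    rewarding state 1; everything else yields zero reward."""
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
s = 0
for _ in range(5000):
    # epsilon-greedy: the exploration-exploitation trade-off in action
    if random.random() < EPSILON:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    s_next, r = step(s, a)
    # temporal-difference (Q-learning) update toward the bootstrapped target
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
    s = s_next
```

After training, the greedy policy in state 0 prefers the rewarding action 1, i.e. `Q[0][1] > Q[0][0]`.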
On the second and final day of the course, the algorithms are extended with function approximation techniques, in order to make them applicable to continuous-state and continuous-action control, as well as to large-scale discrete-variable problems. We begin by introducing function approximation in general, and then apply it to some basic dynamic programming and reinforcement learning techniques. We then turn to policy gradient techniques, as they are a natural way of handling continuous actions, which are essential in control. After a detour on deep learning, we discuss some recent deep RL approaches. Extensions to multi-agent and multi-criteria RL are also briefly introduced.
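The step from tables to function approximation can be sketched as semi-gradient Q-learning with a linear approximator over hand-crafted features. The 1-D continuous-state toy problem, the features, and the hyperparameters below are illustrative assumptions only:

```python
import random

# Illustrative semi-gradient Q-learning with a linear function approximator
# Q(s,a) = w . phi(s,a) on a made-up 1-D continuous-state problem.

random.seed(1)
ACTIONS = [-1.0, 1.0]      # small discretized action set
GAMMA, ALPHA = 0.9, 0.05

def features(s, a):
    """Hand-crafted feature vector phi(s, a): bias, state, state*action."""
    return [1.0, s, s * a]

def q(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))

def step(s, a):
    """Toy dynamics on [-1, 1]: reward is higher closer to the origin."""
    s_next = max(-1.0, min(1.0, s + 0.1 * a))
    return s_next, -abs(s_next)

w = [0.0, 0.0, 0.0]
s = 0.5
for _ in range(2000):
    a = random.choice(ACTIONS)  # purely exploratory behavior policy
    s_next, r = step(s, a)
    target = r + GAMMA * max(q(w, s_next, b) for b in ACTIONS)
    td_error = target - q(w, s, a)
    # semi-gradient update: w += alpha * td_error * phi(s, a)
    for i, fi in enumerate(features(s, a)):
        w[i] += ALPHA * td_error * fi
    s = s_next
```

Because moving toward the origin earns higher reward, the learned weight on the `s * a` feature goes negative, so the greedy policy pushes the state toward zero from either side.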
•Introduction and Course Overview
•Markov Decision Processes
•Forward-Search Planning with Applications to Nonlinear Control
•Exploration-Exploitation Trade-off
•Basics of Function Approximation For Continuous States and Actions
•Detour: Deep Learning, Neural Nets and Convnets
•Deep Reinforcement Learning: Value Based
•Deep Reinforcement Learning: Policy Based
•RL Extensions: Multi-Agent RL and Multi-Criteria RL
•The Future and Open Problems
Anne Nowé is a professor at the Artificial Intelligence Lab of the Vrije Universiteit Brussel, Belgium. In 2019 she is also a guest professor at the Cognitive Robotics Department of Delft University of Technology.
Lucian Busoniu is a professor at the Automation Department of the Technical University of Cluj-Napoca, Romania.
This course assumes the students are familiar with the basics of Markov Decision Processes and Dynamic Programming. Students who are lacking this background can attend an introduction to these concepts on Wednesday afternoon.
The course will take place from Wednesday March 27 until Friday March 29. On Wednesday there is an optional session on Markov decision processes, their optimal solution, and basic dynamic programming techniques to find this solution: value and policy iteration (for students who do not have these techniques in their background).
The course takes place in the Pulse Building at Delft University of Technology.
The registration fee for taking or auditing the full course is €250. This fee is waived for DISC members. The registration form is available on the DISC course platform, or send an email to firstname.lastname@example.org.
Please register before March 7, 2019.
The maximum number of participants is 30, so don’t wait too long if you want to join.
You can obtain 1 ECTS for attending the DISC Winter Course. Please note that you have to be present at all sessions on Thursday and Friday in order to obtain the credits. If you successfully complete the take-home exercises you can earn an additional 2 ECTS, for a total of 3 ECTS.