Instructor: Gabriel TURINICI
1/ Introduction to reinforcement learning
2/ Theoretical formalism: Markov decision processes (MDP), value function (Bellman and Hamilton-Jacobi-Bellman equations), etc.
3/ Common strategies, building from the "multi-armed bandit" example
4/ Strategies in deep learning: Q-learning and DQN
5/ Strategies in deep learning: SARSA and variants
6/ Strategies in deep learning: Actor-Critic and variants
7/ During the course: various Python and gym/gymnasium implementations
8/ Perspectives.
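To make item 2 concrete, here is a minimal sketch of Bellman (value) iteration on a tiny MDP. It is not the course code: the two-state MDP, the transition table `P`, and the function names `value_iteration` and `greedy_policy` are hypothetical and chosen only for illustration. Each entry `P[s][a]` is assumed to be a list of `(probability, next_state, reward)` tuples.

```python
def value_iteration(P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate V(s) <- max_a sum_s' p * (r + gamma * V(s')) until the update is below tol."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [
            max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s]
            )
            for s in range(n_states)
        ]
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(P, V, gamma=0.9):
    """Pick, in each state, the action with the best one-step lookahead value."""
    policy = []
    for outcomes_per_action in P:
        q_values = [
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in outcomes_per_action
        ]
        policy.append(q_values.index(max(q_values)))
    return policy


# Hypothetical toy MDP (deterministic transitions): 2 states, 2 actions.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],
]

V = value_iteration(P, gamma=0.9)
policy = greedy_policy(P, V, gamma=0.9)
print("V =", V)
print("greedy policy =", policy)
```

For this toy example the fixed point can be checked by hand: staying in state 1 forever gives V(1) = 2 / (1 - 0.9) = 20, and moving from state 0 gives V(0) = 1 + 0.9 * 20 = 19, so the computed values should approach these numbers.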
Multi-Armed Bandit (MAB) codes: play MAB, solve MAB, solve MAB v2, policy gradient from ChatGPT to correct, policy gradient corrected.
Bellman iterations: code to correct here, solution code here
Gym: play Frozen Lake (version 2023) (version 2022)
Q-Learning: with Frozen Lake, Python version or notebook version
Play with gym/Atari-Breakout: Python version or notebook version
Deep Q-Learning (DQN): learn with gym/Atari-Breakout: notebook 2024 and its version with a smaller NN, and play with the result
Policy gradients on Pong, adapted from Karpathy: Python or notebook
You can also load a converged version from HERE (rename as necessary): pg_pong_converged_turinici24
Notebook to use it: here (please send me yours if its mean reward is above 15!).
Version 2023: Python or notebook. Old version (2022): Python or notebook.
Projects: see Teams
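As a warm-up for the multi-armed bandit exercises listed above, here is a minimal epsilon-greedy sketch. It is not one of the linked course codes: the function name `epsilon_greedy_bandit`, the Bernoulli arm probabilities in `true_means`, and the default parameter values are all assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (estimated means, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli reward with the (hypothetical) success probability of this arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


true_means = [0.1, 0.5, 0.9]  # hypothetical arm probabilities
estimates, counts = epsilon_greedy_bandit(true_means)
print("estimated means:", [round(q, 2) for q in estimates])
print("pull counts:", counts)
```

With a fixed exploration rate of 0.1, most pulls should end up on the best arm, while the other arms keep being sampled occasionally; decaying `epsilon` over time is a common variant worth trying.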