Instructor: Gabriel TURINICI
1/ Introduction to reinforcement learning
2/ Theoretical formalism: Markov decision processes (MDP), value function (Bellman and Hamilton-Jacobi-Bellman equations), etc.
3/ Common strategies, building on the example of the "multi-armed bandit"
4/ Strategies in deep learning: Q-learning and DQN
5/ Strategies in deep learning: SARSA and variants
6/ Strategies in deep learning: Actor-Critic and variants
7/ During the course: various Python and gym/gymnasium implementations
8/ Perspectives
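As a reminder for item 2, here are the discrete-time Bellman equations for a discounted MDP. The notation is an assumption and may differ from the lectures: states s, actions a, transition probabilities P(s'|s,a), expected reward r(s,a), discount factor γ in [0,1) and policy π(a|s).

```latex
\begin{aligned}
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s') \Big] \\
V^{*}(s) &= \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big] \\
Q^{*}(s,a) &= r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q^{*}(s',a')
\end{aligned}
```

The first equation is linear in V^π. The last two are fixed-point equations whose operator is a γ-contraction, which is what makes iterative ("Bellman iteration") schemes converge.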
Multi-armed bandit (MAB) codes: play MAB, solve MAB, solve MAB v2, policy gradient from ChatGPT (to be corrected), policy gradient (corrected).
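The linked MAB codes are not reproduced here. The sketch below shows one common bandit strategy, ε-greedy with incremental sample-mean estimates, and may differ from the course solutions. The arm means, the unit-variance Gaussian rewards and the function name `epsilon_greedy_bandit` are illustrative assumptions.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy strategy.

    true_means: hypothetical expected reward of each arm (unknown to the player).
    Returns the estimated arm values and the number of pulls of each arm.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    q_est = np.zeros(n_arms)              # running estimate of each arm's mean reward
    counts = np.zeros(n_arms, dtype=int)  # number of times each arm was pulled
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore: pick a random arm
        else:
            arm = int(np.argmax(q_est))      # exploit: pick the current best estimate
        reward = rng.normal(true_means[arm], 1.0)  # assumed unit-variance Gaussian reward
        counts[arm] += 1
        # incremental sample-mean update: Q <- Q + (R - Q) / N
        q_est[arm] += (reward - q_est[arm]) / counts[arm]
    return q_est, counts

q, n = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(q.round(2), n)
```

With ε = 0.1, about 10% of the pulls are spent exploring. The best arm (index 2 here) should receive most of the remaining pulls.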
Bellman iterations: code to be corrected here, solution code here
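For comparison with the exercise, here is a minimal value-iteration sketch. It is not the course's code. It repeatedly applies the Bellman optimality operator V ← max_a [r + γ P V] until the sup-norm change falls below a tolerance. The array layout (`P[a, s, s2]`, `R[s, a]`) and the two-state MDP are hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality operator until convergence.

    P: array (n_actions, n_states, n_states), P[a, s, s2] = Prob(s2 | s, a)
    R: array (n_states, n_actions), expected immediate reward r(s, a)
    Returns the (approximate) optimal value function V and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = r(s, a) + gamma * sum_{s2} P(s2 | s, a) V(s2)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol  # sup-norm stopping criterion
        V = V_new
        if converged:
            break
    policy = Q.argmax(axis=1)  # greedy action in each state
    return V, policy

# Hypothetical 2-state MDP: action 0 = "stay", action 1 = "switch state".
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the same state
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: go to the other state
R = np.array([[0.0, 0.0],   # state 0: no reward
              [1.0, 0.0]])  # state 1: reward 1 for staying
V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)  # V approximately [9, 10], policy [1, 0]
```

The result can be checked by hand. Staying in state 1 forever is worth 1/(1-γ) = 10, and switching from state 0 is worth γ·10 = 9.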
Gym: play Frozen Lake (2023 version) (2022 version)
Q-Learning: with Frozen Lake, Python version or notebook version
- Play with gym/Atari-Breakout: Python version or notebook version
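To keep this sketch free of the gym/gymnasium dependency, tabular Q-learning (item 4) and SARSA (item 5) are shown on a hypothetical 6-state corridor instead of Frozen Lake. The environment, `step` and `train` are illustrative assumptions. The two methods differ only in the TD target. Q-learning bootstraps on the greedy next action (off-policy), while SARSA uses the next action actually chosen by the ε-greedy policy (on-policy).

```python
import numpy as np

N_STATES, GOAL = 6, 5  # hypothetical corridor: states 0..5, goal at the right end

def step(state, action):
    """Deterministic corridor: action 0 moves left, 1 moves right; reward 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = nxt == GOAL
    return nxt, float(done), done

def eps_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    # break ties at random so an all-zero Q-table does not always pick action 0
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

def train(method="q_learning", episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(episodes):
        s, done = 0, False
        a = eps_greedy(Q, s, eps, rng)
        while not done:
            s2, r, done = step(s, a)
            a2 = eps_greedy(Q, s2, eps, rng)      # next action from the behaviour policy
            if done:
                target = r                        # no bootstrapping from a terminal state
            elif method == "q_learning":
                target = r + gamma * Q[s2].max()  # off-policy: greedy next action
            else:  # "sarsa"
                target = r + gamma * Q[s2, a2]    # on-policy: action actually taken next
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s2, a2
    return Q

results = {m: train(m) for m in ("q_learning", "sarsa")}
for m, Q in results.items():
    print(m, "greedy action per state:", Q[:GOAL].argmax(axis=1))
```

Both methods should learn to move right (action 1) in every non-goal state. On Frozen Lake the same update applies, with `step` replaced by the environment's own step function.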
Deep Q-Learning (DQN): learn with gym/Atari-Breakout: 2024 notebook, its version with a smaller NN, and play with the result
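DQN adds two ingredients to Q-learning: an experience-replay buffer and a separate target network used to compute the TD targets. The sketch below isolates these ingredients in NumPy. The neural networks are replaced by plain arrays of Q-values, and the class and function names are hypothetical, not taken from the notebooks.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO store of transitions (s, a, r, s2, done), sampled uniformly."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size):
        batch = self.rng.sample(list(self.buffer), batch_size)
        s, a, r, s2, done = map(np.array, zip(*batch))
        return s, a, r.astype(float), s2, done.astype(float)

    def __len__(self):
        return len(self.buffer)

def dqn_targets(r, done, q_next_target, gamma=0.99):
    """TD targets y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    q_next_target: array (batch, n_actions) produced by the *target* network,
    a periodically refreshed copy of the online network.
    """
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)

def td_loss(q_online, a, y):
    """Mean squared TD error on the Q-values of the actions actually taken."""
    q_taken = q_online[np.arange(len(a)), a]
    return np.mean((q_taken - y) ** 2)

y = dqn_targets(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                np.array([[0.5, 2.0], [3.0, 1.0]]), gamma=0.9)
print(y)  # approximately [2.8, 0.0]: the second transition is terminal
```

In an actual DQN, `td_loss` would be minimised with respect to the online network's weights by a deep-learning framework. The target network would be synchronised with the online network every fixed number of steps.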
Policy gradients on Pong, adapted from Karpathy, 2024 version (to be corrected to get it working!): Python or notebook
You can also download a converged version from HERE (rename as necessary): pg_pong_converged_turinici24
Notebook to use it: here (please send me yours if your mean reward is above 15!).
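The policy-gradient Pong code relies on the score-function (REINFORCE) estimator. In Karpathy's original it is combined with a neural network and discounted returns. The sketch below applies the same estimator to a softmax policy on a hypothetical 3-armed bandit so that it fits in a few lines. It is not the Pong code, and the arm means, learning rate and running-average baseline are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(true_means, n_steps=3000, lr=0.1, seed=0):
    """REINFORCE with a softmax policy on a Gaussian bandit.

    For pi = softmax(theta), the score is grad log pi(a) = onehot(a) - pi.
    A running-average reward baseline reduces the variance of the estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_means))  # action preferences (policy parameters)
    baseline = 0.0
    for t in range(1, n_steps + 1):
        pi = softmax(theta)
        a = rng.choice(len(theta), p=pi)    # sample an action from the policy
        r = rng.normal(true_means[a], 1.0)  # assumed unit-variance Gaussian reward
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += lr * (r - baseline) * grad_log_pi  # stochastic gradient ascent
        baseline += (r - baseline) / t              # running mean of observed rewards
    return softmax(theta)

probs = reinforce_bandit([0.2, 0.5, 1.0])
print(probs.round(3))
```

The learned policy should put most of its probability on the best arm (index 2). In Pong, the single reward r is replaced by the discounted return of each action, and θ by the weights of the policy network.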
Projects: see Teams