Reinforcement Learning, M2 ISF App, 2021-2024

Instructor: Gabriel TURINICI


1/ Introduction to reinforcement learning
2/ Theoretical formalism: Markov decision processes (MDP), value function (Bellman and Hamilton-Jacobi-Bellman equations), etc.
3/ Common strategies, building from the "multi-armed bandit" example
4/ Strategies in deep learning: Q-learning and DQN
5/ Strategies in deep learning: SARSA and variants
6/ Strategies in deep learning: Actor-Critic and variants
7/ During the course: various Python and gym/gymnasium implementations
8/ Perspectives.


Main document for the theoretical presentations: (no distribution authorized without WRITTEN consent from the author)

Multi-Armed Bandit (MAB) codes: play MAB, solve MAB, solve MAB v2, policy gradient from ChatGPT (to correct), policy gradient corrected.
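
For orientation, here is a minimal sketch of one common way to solve a MAB, the epsilon-greedy strategy. It is not one of the course codes listed above: the function name `run_epsilon_greedy`, the Gaussian rewards with unit variance, the arm means and the value of `epsilon` are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy strategy.

    true_means: hypothetical mean reward of each arm (unknown to the agent).
    Returns the estimated means, the pull counts and the total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean of observed rewards per arm
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        counts[arm] += 1
        # incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([0.0, 0.5, 1.5])
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print("pull counts:", counts)
print("most pulled arm:", best_arm)
```

With a fixed `epsilon` the agent keeps exploring forever, so a fraction of the pulls always goes to suboptimal arms; decaying `epsilon` over time is a usual refinement.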

Bellman iterations: code to correct here, solution code here
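
As a point of comparison for the exercise, the Bellman iteration V(s) ← max_a Σ p (r + γ V(s')) can be sketched on a toy problem as follows. This is not the solution code linked above: the three-state chain `P`, the function name `value_iteration` and the discount `gamma=0.9` are hypothetical choices made for illustration.

```python
def value_iteration(P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality operator until the values stop changing.

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns the value function and a greedy policy with respect to it.
    """
    n_states = len(P)

    def q_value(V, s, a):
        # expected return of taking action a in state s, then following V
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [max(q_value(V, s, a) for a in P[s]) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = [max(P[s], key=lambda a: q_value(V, s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical chain: states 0 and 1 are regular, state 2 is absorbing.
# Action 0 moves left, action 1 moves right; entering state 2 pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

V, policy = value_iteration(P)
print("values:", V)
print("greedy policy:", policy)
```

On this chain the fixed point can be checked by hand: V(2) = 0, V(1) = 1 and V(0) = 0.9 × V(1) = 0.9, with the greedy policy moving right from states 0 and 1.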

Gym: play Frozen Lake (v2023) (version 2022)

Q-Learning: with Frozen Lake, Python version or notebook version

Play with gym/Atari-Breakout: Python version or notebook version

Deep Q-Learning (DQN): learn with gym/Atari-Breakout: notebook 2024 and its version with a smaller NN, and play with the result

Policy gradients on Pong, adapted from Karpathy: Python or notebook

You can also load a converged version from HERE (rename as necessary): pg_pong_converged_turinici24

Notebook to use it: here (please send me yours if its mean reward is above 15!).

Version 2023: Python or notebook. Old version (2022): Python or notebook.

Projects: see Teams


