Reinforcement Learning, M2 ISF App, 2021-2026

Instructor: Gabriel TURINICI


1/ Introduction to reinforcement learning
2/ Theoretical formalism: Markov decision processes (MDP), the value function (Bellman and Hamilton-Jacobi-Bellman equations; see the equation below this outline), etc.
3/ Common strategies, built up from the "multi-armed bandit" example
4/ Strategies in deep learning: Q-learning and DQN
5/ Strategies in deep learning: SARSA and variants
6/ Strategies in deep learning: Actor-Critic and variants
7/ Throughout the course: various Python and gym/gymnasium implementations
8/ Perspectives.
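For item 2, the central object is the Bellman optimality equation for the state value function; in standard notation (discount factor \gamma, reward r, transition kernel P):

V^*(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big]

Its continuous-time analogue is the Hamilton-Jacobi-Bellman equation treated in the course document.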


Principal document for the theoretical presentations (no distribution authorized without WRITTEN consent from the author); see the "Teams" group for the updated version.

Multi-Armed Bandit (MAB) codes: play MAB, solve MAB, solve MAB v2, policy gradient from ChatGPT (to correct), policy gradient corrected.
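For orientation, a minimal epsilon-greedy bandit loop (a sketch only, independent of the course files; the arm means and the epsilon value are illustrative):

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # illustrative arm rewards
K, T, eps = len(true_means), 5000, 0.1
Q = np.zeros(K)        # running estimate of each arm's mean reward
N = np.zeros(K)        # number of pulls per arm

for t in range(T):
    # explore with probability eps, otherwise exploit the best current estimate
    a = rng.integers(K) if rng.random() < eps else int(np.argmax(Q))
    r = rng.normal(true_means[a], 1.0)   # noisy reward from arm a
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]            # incremental mean update

print("estimated means:", Q.round(2))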

Bellman iterations: code to correct here, solution code here
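As a hint of what the exercise involves, a generic value-iteration sketch on a small tabular MDP (the transition tensor P and reward table R below are placeholders to fill in, not the exercise's data):

import numpy as np

nS, nA, gamma = 4, 2, 0.9
# P[a, s, s'] = transition probability; R[s, a] = expected reward (placeholders)
P = np.full((nA, nS, nS), 1.0 / nS)
R = np.zeros((nS, nA))

V = np.zeros(nS)
for _ in range(200):                      # Bellman iterations until convergence
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new
print("V* =", V)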

Gym: play Frozen Lake (2023 version) (2022 version)

Q-Learning: with Frozen Lake, Python version or notebook version

– Play with gym/Atari-Breakout: Python version or notebook version
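The tabular Q-learning loop has roughly this shape (a sketch with illustrative hyper-parameters, using the gymnasium API; the course files remain the reference):

import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=True)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1       # illustrative hyper-parameters

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action choice
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, terminated, truncated, _ = env.step(a)
        # Q-learning update: bootstrap on the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not terminated) - Q[s, a])
        s, done = s2, terminated or truncated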

Deep Q-Learning (DQN): learn with gym/Atari-Breakout: the 2024 notebook and its version with a smaller NN, and play with the result
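The heart of DQN is the temporal-difference update on minibatches drawn from a replay buffer; a PyTorch sketch of that single step (the network size and the 4-dimensional state are illustrative, not the notebook's Atari setup):

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())   # frozen copy, refreshed periodically
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_step(s, a, r, s2, done):
    # s, s2: float tensors [B, 4]; a: long [B]; r, done: float [B]
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a)
    with torch.no_grad():                                    # no gradient through target
        target = r + gamma * (1 - done) * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()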

Policy gradients on Pong, adapted from Karpathy, 2024 version (to be corrected to get it working!): Python or notebook

You can also load a converged version from HERE (rename as necessary): pg_pong_converged_turinici24

Notebook to use it: here (please send me yours if the mean reward is above 15!).
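One piece worth understanding before fixing the code: the discounted-return computation that turns an episode's reward stream into per-step weights. A sketch in the spirit of Karpathy's discount_rewards (in Pong the running sum is reset at each nonzero reward, i.e. each time a point is scored):

import numpy as np

def discount_rewards(r, gamma=0.99):
    """Compute discounted returns backwards over one episode."""
    r = np.asarray(r, dtype=float)
    out, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:            # Pong-specific: a point was scored, reset the sum
            running = 0.0
        running = running * gamma + r[t]
        out[t] = running
    return out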

Some links: parking human, parking AI

Projects: see Teams.



Statistical Learning, M1 Math 2024-2026

Instructor: Gabriel TURINICI

Preamble: this course is only an introduction, within a limited amount of time, to statistical and machine learning. It prepares for next year's courses (some of them are on my www page, cf. "Deep Learning" and "Reinforcement Learning").

 

Course outline

1/ Examples and machine learning framework

2/ Useful theoretical objects: predictors, loss functions, bias, variance

3/ K-nearest neighbors (k-NN) and the "curse of dimensionality"

4/ Linear and logistic models in high dimension, variable selection and model regularization (ridge, lasso)

5/ Stochastic Optimization Algorithms

6/ Naive Bayes classification

7/ Neural networks: introduction, operators, datasets, training, examples, implementations

8/ K-means clustering (a minimal sketch follows below)
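To fix ideas on the last item, a minimal scikit-learn k-means run on synthetic data (a sketch; the blob centers and cluster count are illustrative):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# three illustrative Gaussian blobs in 2D
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (3, 0), (0, 3)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("centers:\n", km.cluster_centers_.round(2))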


Reference: 

Machine Learning Algorithms: From Classical Methods to Deep Neural Networks: Supervised, Unsupervised, and High-Dimensional


Exercises, implementations, current course textbook (no distribution authorized without WRITTEN consent from the author): see the "Teams" group.


Deep Learning course, 2nd year of Master (ISF App : 2019-26, MATH : 2023-26)

Teacher: Gabriel TURINICI


Summary:
1/ Deep learning: major applications, references, culture
2/ Types of learning: supervised, reinforcement, unsupervised
3/ Neural networks, main objects: neurons, operations, loss function, optimization, architecture
4/ Stochastic optimization algorithms and a convergence proof for SGD
5/ Gradient computation by "back-propagation"
6/ Pure Python implementation of a fully connected sequential network
7/ Convolutional networks (CNN): filters, layers, architectures
8/ PyTorch and TensorFlow (Keras) implementation of a CNN
9/ Techniques: regularization, hyper-parameters, particular networks, recurrent networks (RNN, LSTM)
10/ Unsupervised deep learning: generative AI, GAN, VAE, Stable Diffusion
11/ Keras VAE implementation; "Hugging Face" Stable Diffusion
12/ If time allows: LLMs & NLP: word2vec, GloVe (example: woman - man + king = queen; see the sketch below)
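For item 12, the classical analogy test can be reproduced with pretrained GloVe vectors via gensim (a sketch; the model name is one of gensim's downloadable options, fetched on first run):

import gensim.downloader as api

# pretrained GloVe vectors (~66 MB download on first use)
model = api.load("glove-wiki-gigaword-50")
# vector arithmetic: woman - man + king should land near "queen"
print(model.most_similar(positive=["woman", "king"], negative=["man"], topn=3))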


Documents
MAIN document (theory): see your Teams channel
(no distribution is authorized without WRITTEN consent from the author);
covers in particular back-propagation and the SGD convergence proof.
Implementations
Function approximation by NN: notebook version, Python version
[Figures: results (approximation & convergence); after 5 times more epochs]
Official code reference https://doi.org/10.5281/zenodo.7220367
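The task in its simplest form: fit a small dense network to a 1D function. A minimal Keras sketch (the target function, architecture and epoch count are illustrative, not the official code):

import numpy as np
from tensorflow import keras

x = np.linspace(-3, 3, 512).reshape(-1, 1)
y = np.sin(x)                                  # illustrative target function

model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(32, activation="tanh"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=200, batch_size=32, verbose=0)
print("final MSE:", model.evaluate(x, y, verbose=0))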
Pure Python (no Keras, no TensorFlow, no PyTorch) implementation (cf. also the theoretical document):
– version "to implement" (with Dense/FC layers) (dataset: Iris),
– version: solution
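The core of the exercise is the forward/backward pass of a dense layer written with numpy only; a sketch of one such layer (names and the update-inside-backward design are illustrative, the theoretical document is the reference):

import numpy as np

class Dense:
    """Fully connected layer: forward pass plus gradient back-propagation."""
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out, lr=0.01):
        grad_W = self.x.T @ grad_out    # dL/dW
        grad_x = grad_out @ self.W.T    # dL/dx, passed to the previous layer
        self.W -= lr * grad_W           # SGD step
        self.b -= lr * grad_out.sum(axis=0)
        return grad_x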

If needed: iris dataset here
Implementations: Keras/Iris, PyTorch

(tensorflow) CNN example: https://www.tensorflow.org/tutorials/images/cnn
PyTorch CNN/MNIST example: Python and notebook versions.

To do: use on MNIST; try to obtain high accuracy on MNIST and CIFAR10.
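For orientation, the skeleton of such a CNN in PyTorch (layer sizes are illustrative; see the course files for the full training loop):

import torch.nn as nn

class SmallCNN(nn.Module):
    """Two conv blocks then a linear classifier head, for 28x28 grayscale MNIST digits."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 7 * 7, 10))

    def forward(self, x):
        return self.head(self.features(x))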
VAE, latent-space visualization: CVAE – Python (rename to *.py), CVAE ipynb version
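The two ingredients such a VAE implements are the reparameterization trick and the ELBO loss; a compact PyTorch sketch of just those pieces (a sketch only, the notebook is the reference):

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps keeps the sampling differentiable w.r.t. the encoder
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x, x_recon, mu, logvar):
    # reconstruction term + KL divergence to the standard normal prior
    recon = torch.nn.functional.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl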
Stable diffusion:

Working example (January 2025): Python version, notebook version

Old working example (19/1/2024) on Google Colab: notebook version (Python here, rename to *.py). ATTENTION: the run takes about 10 minutes the first time, then is somewhat faster (just change the prompt text).
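If the notebooks need adapting, current Hugging Face diffusers usage follows this pattern (a sketch; the model id is one commonly used public example and a GPU runtime, e.g. Colab, is assumed):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")                       # GPU assumed
image = pipe("a painting of a fox in a forest").images[0]
image.save("fox.png")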

AI4-MED: Personalized Medicine in the Era of Artificial Intelligence

 

I recently took part in a round table on AI in medicine at the AI4-MED conference. Several subjects were touched upon, including concerns, safeguards, and trust in complex situations. A more detailed account of the discussion will follow in another outlet.

 

Deep hedging at FAAI 2025

During the FAAI 2025 conference I presented recent joint work with Pierre Brugière on deep hedging. See the paper here (arXiv version) and the slides here.

Executive summary: we introduce a deep-learning framework for hedging derivatives in markets with discrete trading and transaction costs, without assuming a specific stochastic model for the underlying asset. Unlike traditional approaches such as the Black–Scholes or Leland models, which rely on strong modeling assumptions and continuous-time approximations, the proposed method learns effective hedging strategies directly from data. A key contribution is its ability to perform well with very limited training data—using as few as 256 simulated price trajectories—while outperforming classical hedging schemes in numerical experiments under a geometric Brownian motion setting. This makes the approach both robust and practical for real-world applications where data and model certainty are limited.

Fake news sites: generative AI left unchecked

In a recent interview with Alexandre Boero from Clubic, we discuss how recent technologies have made possible a growing network of fake online media sites and journalists entirely generated by AI, designed to appear credible and to manipulate audiences and advertisers, raising serious concerns about misinformation and the erosion of trust in digital content.

 

"Physics-Informed Neural Networks for coupled radiation transport equations" at the CM3P 2025 conference

This joint work with Laetitia LAGUZET was presented at the 5th Computational Methods for Multi-scale, Multi-uncertainty and Multi-physics Problems Conference, held in Porto, 1-4 July 2025.

Slides: HERE.

Abstract: Physics-Informed Neural Networks (PINNs) are a type of neural network designed to incorporate physical laws directly into their learning process. These networks can model and predict solutions for complex physical systems, even with limited or incomplete data, often using a mathematical formulation of a state equation supplemented with other information.
Introduced by Raissi et al. (2019), PINNs find applications in fields like physics, engineering, and fluid mechanics, particularly for solving partial differential equations (PDEs) and other dynamic systems. In this contribution we explore a modification of PINNs for multi-physics numerical simulation involving radiation transport equations; these equations describe the propagation of a Marshak-type wave in a temperature-dependent opaque medium and are considered a good benchmark for difficult multi-regime computations.
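To make the PINN idea concrete: the equation residual, computed by automatic differentiation, is added to the training loss. A toy PyTorch sketch for the scalar problem u'(t) = -u(t), u(0) = 1 (purely illustrative, unrelated to the radiation-transport system of the talk):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(128, 1, requires_grad=True)          # collocation points in [0, 1]
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = du + u                                    # enforce u' = -u
    t0 = torch.zeros(1, 1)
    loss = (residual ** 2).mean() + (net(t0) - 1.0).pow(2).mean()  # PDE + initial condition
    opt.zero_grad(); loss.backward(); opt.step()
# after training, net(t) should approximate exp(-t)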