« Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit » at ICPR 2024

This joint work with Stefana-Lucia ANITA was presented at the 27th International Conference on Pattern Recognition (ICPR) 2024, held in Kolkata, India, Dec 1st through 5th, 2024.

Talk materials:

Abstract: Although the Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not received enough attention. We investigate in this work the convergence of such a procedure in the situation where an L2 regularization term is present jointly with the ‘softmax’ parametrization. We prove convergence under appropriate technical hypotheses and test the procedure numerically, including in situations beyond the theoretical setting. The tests show that a time-dependent regularized procedure can improve over the canonical approach, especially when the initial guess is far from the solution.
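For readers who want to experiment, below is a minimal sketch, in the spirit of the paper but not its actual code, of a softmax policy gradient on a Bernoulli bandit with an L2 penalty on the parameters; the arm probabilities, the regularization weight lam and the learning rate lr are arbitrary illustration values.

    import numpy as np

    rng = np.random.default_rng(0)
    p_arms = np.array([0.3, 0.5, 0.8])      # hypothetical Bernoulli arm means
    K = len(p_arms)
    theta = np.zeros(K)                     # softmax parameters, one per arm
    lam, lr = 0.01, 0.1                     # L2 weight and learning rate (arbitrary)

    for t in range(5000):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                      # softmax policy
        a = rng.choice(K, p=pi)             # sample an arm
        r = float(rng.random() < p_arms[a]) # Bernoulli reward
        grad_log = -pi
        grad_log[a] += 1.0                  # gradient of log pi(a | theta)
        theta += lr * (r * grad_log - lam * theta)  # REINFORCE step with L2 term

    print("final policy:", np.round(pi, 3))  # should concentrate on the best arm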

« Optimal time sampling in physics-informed neural networks » at ICPR 2024

This talk was presented at the 27th International Conference on Pattern Recognition (ICPR) 2024, held in Kolkata, India, Dec 1st through 5th, 2024.

Talk materials:

Abstract: Physics-informed neural networks (PINN) is an extremely powerful paradigm used to solve equations encountered in scientific computing applications. An important part of the procedure is the minimization of the equation residual, which includes, when the equation is time-dependent, a time sampling. It was argued in the literature that the sampling need not be uniform but should overweight initial time instants, but no rigorous explanation was provided for this choice. In the present work we take some prototypical examples and, under standard hypotheses concerning the neural network convergence, we show that the optimal time sampling follows a (truncated) exponential distribution. In particular we explain when it is best to use uniform time sampling and when one should not. The findings are illustrated with numerical examples on a linear equation, Burgers’ equation and the Lorenz system.
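As an illustration of the sampling scheme discussed in the abstract, here is a small sketch (mine, not the paper's code) that draws collocation times from an exponential distribution truncated to [0, T], via inverse-CDF sampling; the rate lam=3.0 is an arbitrary choice, not the value derived in the paper.

    import numpy as np

    def truncated_exp_sample(n, lam, T, rng=np.random.default_rng()):
        """Inverse-CDF sampling of an Exp(lam) distribution truncated to [0, T]."""
        u = rng.random(n)
        return -np.log(1.0 - u * (1.0 - np.exp(-lam * T))) / lam

    # Example: 1000 collocation times on [0, 1], overweighting early instants.
    t = truncated_exp_sample(1000, lam=3.0, T=1.0)
    print(t.mean())  # well below 0.5, i.e. early times are sampled more often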

Deep Learning course, 2nd year of Master (ISF App: 2019-25, MATH: 2023-25)

Teacher: Gabriel TURINICI


Summary:
1/ Deep learning: major applications, references, culture
2/ Types: supervised, reinforcement, unsupervised
3/ Neural networks, main objects: neurons, operations, loss function, optimization, architecture
4/ Stochastic optimization algorithms and convergence proof for SGD
5/ Gradient computation by « back-propagation »
6/ Pure Python implementation of a fully connected sequential network (see the sketch after this list)
7/ Convolutional networks (CNN): filters, layers, architectures.
8/ Keras implementation of a CNN.
9/ Techniques: regularization, hyper-parameters, particular networks, recurrent networks (RNN, LSTM).
10/ Unsupervised Deep learning: generative AI, GAN, VAE, Stable Diffusion.
11/ Keras VAE implementation. “Hugging Face” Stable Diffusion.
12/ If time allows: LLM & NLP: word2vec, GloVe (example: king - man + woman ≈ queen)
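As a taste of items 4-6 above, here is a minimal numpy sketch (an illustration only, not the course implementation) of a small fully connected network trained by minibatch SGD with hand-written back-propagation:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (200, 1))
    y = np.sin(3 * X)                        # toy 1-D target to approximate

    W1, b1 = rng.normal(0, 0.5, (1, 16)), np.zeros(16)
    W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)
    lr = 0.05

    for step in range(5000):
        idx = rng.integers(0, len(X), 32)    # minibatch -> stochastic gradient
        xb, yb = X[idx], y[idx]
        h = np.tanh(xb @ W1 + b1)            # forward pass
        out = h @ W2 + b2
        g_out = 2 * (out - yb) / len(xb)     # d(MSE)/d(out)
        # back-propagation: chain rule, layer by layer
        gW2, gb2 = h.T @ g_out, g_out.sum(0)
        g_h = (g_out @ W2.T) * (1 - h**2)    # tanh'(x) = 1 - tanh(x)^2
        gW1, gb1 = xb.T @ g_h, g_h.sum(0)
        for P, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
            P -= lr * g                      # SGD step

    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    print("final MSE:", float(((pred - y) ** 2).mean()))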


Documents
MAIN document (theory): see your teams channel
(no distribution is authorized without WRITTEN consent from the author)
for back-propagation; SGD convergence proof
Implementations
Function approximation by NN: notebook version, Python version
Results figures: approximation & convergence; after 5 times more epochs
Official code reference https://doi.org/10.5281/zenodo.7220367
Pure Python (no Keras, no TensorFlow, no PyTorch) implementation (cf. also the theoretical doc):
– version « to implement » (with Dense/FC layers) (dataset = iris),
– version: solution

If needed: iris dataset here
Implementation: keras/Iris

CNN example: https://www.tensorflow.org/tutorials/images/cnn

Todo: use it on MNIST, try to obtain high accuracy on MNIST and CIFAR10.
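A possible starting point for the MNIST part of this todo (the architecture and epoch count are arbitrary choices of mine, not a reference solution):

    import tensorflow as tf
    from tensorflow.keras import layers

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None] / 255.0   # add channel dimension, scale to [0, 1]
    x_test = x_test[..., None] / 255.0

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))

For CIFAR10, swap in the cifar10 loader and input shape (32, 32, 3); reaching high accuracy there typically needs a deeper network and data augmentation.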
VAE: latent space visualisation: CVAE – python (rename *.py), CVAE ipynb version
Stable diffusion: working example (19/1/2024) on Google Colab: version: notebook (here python, rename *.py). ATTENTION: the run takes 10 minutes the first time, then is somewhat faster (just change the prompt text).
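For orientation, the pipeline in such a notebook looks roughly like the sketch below; the model id "runwayml/stable-diffusion-v1-5" and the parameters are assumptions of mine, to be adapted to the actual notebook:

    import torch
    from diffusers import StableDiffusionPipeline

    # Assumed checkpoint; the linked notebook may use a different one.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
    pipe = pipe.to("cuda")                  # on Colab: a GPU runtime is required

    image = pipe("a watercolor painting of Paris").images[0]
    image.save("out.png")                   # then just change the prompt text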

General chair of the conference FAAI24 « Foundations and applications of artificial intelligence », Iasi, October 28-30, 2024

General chair, together with C. Lefter and A. Zalinescu, of the conference FAAI24 « Foundations and applications of artificial intelligence », Iasi, Oct 28-30, 2024. At the conference I also served as a tutorial presenter.

LLM and time series at the « 6th J.P. Morgan Global Machine Learning Conference », Paris, Oct 18th, 2024

Invited joint talk « Using LLMs techniques for time series prediction », given with Pierre Brugiere at the 6th JP Morgan Global Machine Learning conference held in Paris, Oct 18th 2024.

Talk materials: slides (click here) and here a link to the associated paper.

Risk management and portfolio, M2 ISF 2020-2025

Course coordinator: Gabriel Turinici

Contents:

  • review of the classical framework: mean-variance criterion, Markowitz, CAPM (MEDAF)
  • portfolio theory: indices, optimal portfolios, beta, arbitrage, APT, factors
  • pricing of derivative products and the risk-neutral probability
  • volatility trading
  • portfolio insurance: stop-loss, options, CPPI, Buy & Hold, Constant-Mix
  • time permitting: deep learning in finance, technical indicators, universal portfolio

Bibliography

  • Z. Bodie, A. Kane, A.J. Marcus, « Investments », McGraw Hill, 7th edition, 2008
  • J.C. Hull, « Options, futures and other derivatives », Pearson Prentice Hall, 6th edition, 2006
  • R.B. Litterman, « Modern investment management: an equilibrium approach », Goldman Sachs, 2003
  • R. Portait, P. Poncet, « Finance de marché », Dalloz, 2008
  • P. Wilmott, « Paul Wilmott introduces quantitative finance », John Wiley & Sons, 2007

Documents (lecture notes, other documents, …)

NOTA BENE: All documents are subject to copyright and may not be distributed without prior WRITTEN consent from the author.

for portfolio management theory
classical « stocks » (historical probability)
M1 course book « Mouvement Brownien et évaluation d’actifs dérivés »
Python data: CSV format and PICKLE format. Programs:
example program to download the data (yfinance); to do: add ‘CAC 40’ to the list, plot ‘adjusted close’ rather than ‘close’
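A possible starting point for this todo; the CAC 40 Yahoo ticker "^FCHI" and the example stock tickers are assumptions of mine, to be checked against the yfinance output:

    import yfinance as yf
    import matplotlib.pyplot as plt

    tickers = ["AIR.PA", "OR.PA", "^FCHI"]   # e.g. Airbus, L'Oréal, plus the CAC 40 index
    data = yf.download(tickers, start="2020-01-01", auto_adjust=False)

    data["Adj Close"].plot()                 # adjusted close rather than close
    plt.show()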
statistical normality tests and plot of the optimal portfolio vs. random portfolios: working (initial) version, and here a partially corrected version, here the 2024 working version (2022 version, 2021 version)
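For the normality-test part, a minimal sketch using scipy (the choice of tests is mine; the course versions above remain the reference):

    import numpy as np
    from scipy import stats

    # prices: daily closing prices; here a simulated series stands in for the CSV data
    prices = np.cumprod(1 + 0.01 * np.random.default_rng(0).standard_normal(500))
    log_returns = np.diff(np.log(prices))

    print(stats.shapiro(log_returns))       # Shapiro-Wilk normality test
    print(stats.jarque_bera(log_returns))   # Jarque-Bera normality test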
Results: optimal for 5 assets, optimal CAC40 30_p15, optimal CAC40 30_p30, backtest
Documents: review of the classical theory of derivative products (options). Code: Brownian scenario / price simulations: 2022 version (2021 version), Monte Carlo computation of option prices; another version
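As a reminder of what the Monte Carlo computation does, a minimal sketch (toy parameters of mine, not the course code) pricing a European call under geometric Brownian motion:

    import numpy as np

    # Monte Carlo price of a European call; all parameter values are arbitrary.
    S0, K, r, sigma, T, n = 100.0, 100.0, 0.02, 0.2, 1.0, 100_000
    rng = np.random.default_rng(0)

    # terminal prices under geometric Brownian motion (risk-neutral dynamics)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T
                     + sigma * np.sqrt(T) * rng.standard_normal(n))
    payoff = np.maximum(ST - K, 0.0)
    price = np.exp(-r * T) * payoff.mean()
    stderr = np.exp(-r * T) * payoff.std(ddof=1) / np.sqrt(n)
    print(f"MC price = {price:.3f} +/- {1.96 * stderr:.3f}")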

Codes: price and delta of vanilla options (Black & Scholes)
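For comparison with the Monte Carlo value above, the Black & Scholes closed form for the price and delta of a European call (no dividends) can be sketched as:

    import numpy as np
    from scipy.stats import norm

    def bs_call_price_delta(S, K, r, sigma, T):
        """Black & Scholes price and delta of a European call (no dividends)."""
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        price = S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
        return price, norm.cdf(d1)          # the delta of the call is N(d1)

    print(bs_call_price_delta(100, 100, 0.02, 0.2, 1.0))

A useful sanity check is that this closed-form price should match the Monte Carlo estimate above within its confidence interval.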
Delta hedging code: version to fill in and here the solution (other versions: 2022, 2021)
Volatility trading

ref: cf. section 6.1.2 of the M1 course (non-distributable pdf lecture notes here)

volatility trading code: 2022/23 version, 2021-22 version (older version)
Execution results: vol trading
Theoretical explanation: cf. the lecture notes or the pdf document
Code: stop-loss 2024 (and here the P22 version), CPPI code to fill in, CPPI code v2 (a toy CPPI sketch follows this list)
Constant-Mix code
dataC40, Ornstein-Uhlenbeck + CM code
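The toy CPPI sketch announced above (my illustration, with zero interest rate, no leverage and arbitrary parameters; the course code is the reference):

    import numpy as np

    # Toy CPPI rebalancing on a simulated risky asset; all parameters arbitrary.
    rng = np.random.default_rng(0)
    n, dt, mu, sigma = 252, 1 / 252, 0.05, 0.2
    risky = np.cumprod(1 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n))

    V, floor, m = 100.0, 90.0, 3.0           # portfolio value, floor, CPPI multiplier
    prev = 1.0
    for s in risky:
        exposure = max(m * (V - floor), 0.0) # cushion-proportional risky exposure
        exposure = min(exposure, V)          # no leverage in this toy version
        V += exposure * (s / prev - 1.0)     # only the risky part moves (rate = 0)
        prev = s
    print(f"final value {V:.2f} (floor {floor})")

The key line is the cushion-proportional exposure m * (V - floor), which is what keeps the portfolio above the floor for moderate market moves.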

Stop-loss result, CPPI result, constant-mix result
Theory
slides,
M1 course book, section 6.2;
handwritten notes
YouTube video on CPPI: part 1/2, part 2/2
Beta slippage: presentation.
Project

Other resources for the course:

Interview with radio « France Culture » on the ethics of generative AI

A short interview with Celine Loozen from the ‘France Culture’ radio station, as part of a radio program concerning AI and GAFAM ethics.

Link to the full radio broadcast

Interview with Celine Loozen: here (local version, if necessary, here)

Reinforcement Learning, M2 ISF App, 2021-2024

Instructor: Gabriel TURINICI


1/ Introduction to reinforcement learning
2/ Theoretical formalism: Markov decision processes (MDP), value function (Bellman and Hamilton-Jacobi-Bellman equations), etc.
3/ Common strategies, building from the example of « multi-armed bandit »
4/ Strategies in deep learning: Q-learning and DQN
5/ Strategies in deep learning: SARSA and variants
6/ Strategies in deep learning: Actor-Critic and variants
7/ During the course: various Python and gym/gymnasium implementations
8/ Perspectives.


Principal document for the theoretical presentations: (no distribution authorized without WRITTEN consent from the author)

Multi Armed Bandit (MAB) codes: play MAB, solve MAB, solve MAB v2, policy grad from chatGPT to correct, policy grad corrected.
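For context, the simplest MAB baseline these codes build on is epsilon-greedy; a minimal sketch with arbitrary toy parameters (not one of the course files):

    import numpy as np

    # Epsilon-greedy on a Bernoulli multi-armed bandit.
    rng = np.random.default_rng(0)
    p_arms = np.array([0.2, 0.5, 0.7])      # hypothetical arm means
    Q, N, eps = np.zeros(3), np.zeros(3), 0.1

    for t in range(10_000):
        a = rng.integers(3) if rng.random() < eps else int(Q.argmax())
        r = float(rng.random() < p_arms[a])
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]           # incremental mean update
    print(np.round(Q, 3))                   # should approach p_arms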

Bellman iterations: code to correct here, solution code here
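As a reminder of what the iterations compute, a small sketch (mine, not the linked code) of value iteration V ← max_a Σ_s' P(s'|s,a) [R + γ V(s')] on a random toy MDP; sizes and discount are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, gamma = 4, 2, 0.9
    P = rng.random((nA, nS, nS))
    P /= P.sum(axis=2, keepdims=True)       # rows are transition probabilities
    R = rng.random((nA, nS, nS))            # random rewards

    V = np.zeros(nS)
    for it in range(1000):
        Q = (P * (R + gamma * V)).sum(axis=2)   # Q[a, s], Bellman backup
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < 1e-8:
            break
        V = V_new
    print("V* =", np.round(V, 3), "after", it, "iterations")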

Gym: play Frozen Lake (v2023) (version 2022)

Q-Learning: with Frozen Lake, python version or notebook version
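The tabular update used there can be sketched as follows (a minimal gymnasium version with arbitrary hyper-parameters, not the course notebook):

    import numpy as np
    import gymnasium as gym

    env = gym.make("FrozenLake-v1", is_slippery=True)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, eps = 0.1, 0.99, 0.1      # learning rate, discount, exploration
    rng = np.random.default_rng(0)

    for episode in range(20_000):
        s, _ = env.reset()
        done = False
        while not done:
            a = env.action_space.sample() if rng.random() < eps else int(Q[s].argmax())
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Q-learning update: bootstrap with the greedy value of the next state
            Q[s, a] += alpha * (r + gamma * (0 if terminated else Q[s2].max()) - Q[s, a])
            s = s2
    print("greedy policy:", Q.argmax(axis=1).reshape(4, 4))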

Play with gym/Atari-Breakout: python version or notebook version

Deep Q Learning (DQN): learn with gym/Atari-Breakout: notebook 2024 and its version with a smaller NN, and play with the result

Policy gradients on Pong, adapted from Karpathy: python or notebook

You can also load from HERE a converged version (rename as necessary): pg_pong_converged_turinici24

Notebook to use it: here (please send me yours if the mean reward is above 15!).

Version 2023: python or notebook. Old version (2022): python or notebook.

Projects: cf. Teams