« Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit » at ICPR 2024

This joint work with Stefana-Lucia ANITA was presented at the 27th International Conference on Pattern Recognition (ICPR) 2024, held in Kolkata, India, Dec 1st through 5th 2024.

Talk materials:

Abstract: Although the Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of the policy gradient algorithm used for MAB have not been given enough attention. We investigate in this work the convergence of such a procedure for the situation when an L2 regularization term is present jointly with the ‘softmax’ parametrization. We prove convergence under appropriate technical hypotheses and test numerically the procedure, including in situations beyond the theoretical setting. The tests show that a time-dependent regularized procedure can improve over the canonical approach, especially when the initial guess is far from the solution.
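As an illustration of the setting described in the abstract, here is a minimal Python sketch of a stochastic policy gradient step with a softmax parametrization and an L2 penalty on the parameters. It is a generic REINFORCE-style update written under assumptions of my own (Bernoulli rewards, constant step size `lr`, constant regularization coefficient `reg`, zero initial guess); the function names are hypothetical and the code is not taken from the paper, whose exact scheme and hypotheses may differ.

```python
import numpy as np

def softmax(theta):
    # Subtract the max for numerical stability; the result is unchanged.
    z = theta - np.max(theta)
    e = np.exp(z)
    return e / e.sum()

def l2_regularized_policy_gradient(arm_means, n_steps=5000, lr=0.1, reg=0.01, seed=0):
    """Stochastic ascent on E[reward] - (reg / 2) * ||theta||^2 (illustrative).

    arm_means: success probabilities of the Bernoulli arms (an assumption
    made for this sketch; any bounded reward distribution could be used).
    """
    arm_means = np.asarray(arm_means, dtype=float)
    rng = np.random.default_rng(seed)
    k = len(arm_means)
    theta = np.zeros(k)  # initial guess: uniform policy
    for _ in range(n_steps):
        pi = softmax(theta)
        arm = rng.choice(k, p=pi)                      # sample an arm from the policy
        reward = float(rng.random() < arm_means[arm])  # Bernoulli reward
        grad_log_pi = -pi                              # gradient of log pi(arm) is e_arm - pi
        grad_log_pi[arm] += 1.0
        # Policy gradient estimate minus the gradient of the L2 penalty.
        theta += lr * (reward * grad_log_pi - reg * theta)
    return theta, softmax(theta)

theta, policy = l2_regularized_policy_gradient([0.2, 0.5, 0.8])
print("final policy:", policy)
```

A time-dependent regularization, as mentioned in the abstract, could be emulated here by replacing the constant `reg` with a schedule depending on the step index; the form of that schedule is left open in this sketch.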

« Optimal time sampling in physics-informed neural networks » at ICPR 2024

This talk was presented at the 27th International Conference on Pattern Recognition (ICPR) 2024, held in Kolkata, India, Dec 1st through 5th 2024.

Talk materials:

Abstract: Physics-informed neural networks (PINN) are an extremely powerful paradigm used to solve equations encountered in scientific computing applications. An important part of the procedure is the minimization of the equation residual which includes, when the equation is time-dependent, a time sampling. It was argued in the literature that the sampling need not be uniform but should overweight initial time instants, but no rigorous explanation was provided for this choice. In the present work we take some prototypical examples and, under standard hypotheses concerning the neural network convergence, we show that the optimal time sampling follows a (truncated) exponential distribution. In particular we explain when it is best to use uniform time sampling and when one should not. The findings are illustrated with numerical examples on a linear equation, Burgers’ equation and the Lorenz system.
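To make the notion of a truncated exponential time sampling concrete, the following Python sketch draws collocation times on an interval [0, T] by inverting the cumulative distribution function of the density proportional to exp(-rate * t). The names `truncated_exponential_ppf`, `sample_times`, `rate` and `t_max` are hypothetical, and the value of `rate` is a free parameter here; how it should be chosen for a given equation is the subject of the paper and is not reproduced in this sketch.

```python
import numpy as np

def truncated_exponential_ppf(u, rate, t_max):
    """Inverse CDF of the density proportional to exp(-rate * t) on [0, t_max].

    u: array of values in [0, 1]. A rate close to zero gives the uniform law.
    """
    u = np.asarray(u, dtype=float)
    if abs(rate * t_max) < 1e-12:
        return u * t_max  # limiting case: uniform sampling on [0, t_max]
    # Solve (1 - exp(-rate * t)) / (1 - exp(-rate * t_max)) = u for t,
    # using expm1 / log1p for accuracy when rate * t_max is small.
    return -np.log1p(u * np.expm1(-rate * t_max)) / rate

def sample_times(n, rate, t_max, seed=0):
    # Inverse transform sampling: push uniform draws through the inverse CDF.
    rng = np.random.default_rng(seed)
    return truncated_exponential_ppf(rng.random(n), rate, t_max)

times = sample_times(10_000, rate=3.0, t_max=1.0)
print("fraction of samples in the first half:", np.mean(times < 0.5))
```

With a positive `rate` the samples concentrate near the initial time, while `rate = 0` recovers uniform sampling, so the same routine covers both regimes discussed in the abstract.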

General chair of the conference FAAI24 « Foundations and applications of artificial intelligence », Iasi, October 28-30, 2024

General chair, with C. Lefter and A. Zalinescu, of the conference FAAI24 « Foundations and applications of artificial intelligence », Iasi, Oct 28-30, 2024. At the conference I also serve as tutorial presenter.

LLM and time series at the « 6th J.P. Morgan Global Machine Learning Conference », Paris, Oct 18th, 2024

Invited joint talk « Using LLMs techniques for time series prediction » with Pierre Brugiere, presented at the 6th J.P. Morgan Global Machine Learning Conference held in Paris, Oct 18th 2024.

Talk materials: slides (click here) and a link to the associated paper (here).

Interview with radio « France Culture » on the ethics of generative AI

A short interview with Celine Loozen from the ‘France Culture’ radio station, as part of a radio program on the ethics of AI and GAFAM.

Link for the full radio broadcast

Interview with Celine Loozen : here (local version if necessary here)

« Reinforcement learning in finance: online portfolio allocation and policy gradient approaches, the Onflow algorithm », NANMATH Nov 2023

This is a talk presented at the NANMATH conference held Nov 6-9 2023 at ICTP, Cluj.

Talk materials (click to open or download): the slides of the presentation, the arXiv preprint and the YouTube video.

« Reinforcement learning in finance: portfolio allocation, value functions and policy gradients flows », ACDSDE conference Sept 2023

This is a talk presented at the ACDSDE conference held Sept 28-30 2023 at the Romanian Academy (Iasi station), Octav Mayer Institute of Mathematics.

Talk materials: the slides of the presentation.