Symphony of experts: orchestration with adversarial insights in reinforcement learning



Matthieu Jonckheere (LAAS), Chiara Mignacco (LMO, CELESTE), Gilles Stoltz (LMO, CELESTE)


Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges. We explore this field through the concept of orchestration, where a (small) set of expert policies guides decision-making; the modeling thereof constitutes our first contribution. We then establish value-function regret bounds for orchestration in the tabular setting by transferring regret-bound results from adversarial settings. We generalize and extend the analysis of natural policy gradient in Agarwal et al. [2021, Section 5.3] to arbitrary adversarial aggregation strategies. We also extend it to the case of estimated advantage functions, providing insights into sample complexity both in expectation and with high probability. A key point of our approach lies in its arguably more transparent proofs compared to existing methods. Finally, we present simulations for a stochastic matching toy model.
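To make the idea of orchestration concrete, here is a minimal sketch, not the authors' algorithm or code. It assumes a tabular MDP with known transitions and rewards, exact advantage functions, and one particular aggregation strategy: state-wise exponential weights over the experts. The two-state MDP, the two experts, and all names (`evaluate`, `mixture`, `orchestrate`, `eta`, `rounds`) are hypothetical choices made for illustration only.

```python
import math

def evaluate(P, R, pi, gamma, sweeps=500):
    """Iterative policy evaluation; returns V and Q of the policy pi."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    for _ in range(sweeps):
        V = [sum(pi[s][a] * (R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S)))
                 for a in range(A)) for s in range(S)]
    Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
          for a in range(A)] for s in range(S)]
    return V, Q

def mixture(q, experts):
    """Orchestrated policy: pi_q(a|s) = sum_k q(k|s) * pi_k(a|s)."""
    S, K, A = len(q), len(experts), len(experts[0][0])
    return [[sum(q[s][k] * experts[k][s][a] for k in range(K)) for a in range(A)]
            for s in range(S)]

def orchestrate(P, R, experts, gamma=0.9, eta=1.0, rounds=50):
    """State-wise exponential weights over experts, fed with exact advantages."""
    S, K = len(R), len(experts)
    q = [[1.0 / K] * K for _ in range(S)]  # uniform initial weights in every state
    history = []
    for _ in range(rounds):
        V, Q = evaluate(P, R, mixture(q, experts), gamma)
        history.append(V)
        for s in range(S):
            # Advantage of following expert k for one step in state s.
            adv = [sum(experts[k][s][a] * Q[s][a] for a in range(len(Q[s]))) - V[s]
                   for k in range(K)]
            w = [q[s][k] * math.exp(eta * adv[k]) for k in range(K)]
            z = sum(w)
            q[s] = [x / z for x in w]
    return q, history

# Hypothetical toy MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: action 0 moves to state 0, action 1 stays
R = [[0.0, 0.0],
     [1.0, 0.2]]
# Expert k always plays action k, in every state.
experts = [[[1.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [0.0, 1.0]]]

q, history = orchestrate(P, R, experts)
print("weights per state:", q)
print("values of the last evaluated mixture:", history[-1])
```

In this toy instance the learned weights favor a different expert in each state, so the orchestrated policy combines the experts rather than selecting a single one. With estimated rather than exact advantages, the `evaluate` step would be replaced by a sampling-based estimator, which this sketch does not attempt.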
