Pure Exploration

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with …

Andrea Tirinzoni, Aymen Al Marjani, Emilie Kaufmann

On the complexity of All ε-Best Arms Identification

We consider the question introduced by Mason et al. (2020) of identifying all the $\varepsilon$-optimal arms in a finite stochastic …

Aymen Al Marjani, Tomáš Kocák, Aurélien Garivier

Navigating to the Best Policy in Markov Decision Processes

We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions …

Aymen Al Marjani, Aurélien Garivier, Alexandre Proutiere

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to …

Aymen Al Marjani, Alexandre Proutiere