Generative Model

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to …

Aymen Al Marjani, Alexandre Proutiere