Paper abstract

Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs

Francisco S. Melo - Carnegie Mellon University, USA
Manuel C. Lopes - Instituto Superior Tecnico, Portugal

Session: Reinforcement Learning 2

In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in (Peters et al., 2005) to allow for general function approximation and data reuse. FNAC combines the natural actor-critic architecture with a variant of fitted value iteration that uses importance sampling. The resulting method retains the appealing features of both approaches while overcoming their main weaknesses: the gradient-based actor avoids the difficulties that regression methods face when optimizing policies over continuous action spaces, and the regression-based critic uses data efficiently and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application to a simple continuous-state, continuous-action problem.
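
To make the combination described above more concrete, the following is a minimal sketch of one ingredient: reusing samples collected under an older policy by importance-weighting a least-squares fit, then taking a natural-gradient actor step. It is not the FNAC algorithm as specified in the paper. The one-parameter Gaussian policy, the function names, and the assumption that advantage estimates are already available (in practice they would come from a fitted critic) are all illustrative choices made here. The sketch relies on the standard natural actor-critic result that regressing advantages onto the compatible features (the policy's score function) yields the natural-gradient direction.

```python
import math

def log_gaussian_policy(action, state, theta, sigma):
    """Log-density of a ~ N(theta * state, sigma^2); a hypothetical 1-D policy."""
    mean = theta * state
    return (-0.5 * ((action - mean) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

def importance_weight(action, state, theta_new, theta_old, sigma):
    """Density ratio pi_new(a|s) / pi_old(a|s), used to reweight old samples."""
    return math.exp(log_gaussian_policy(action, state, theta_new, sigma)
                    - log_gaussian_policy(action, state, theta_old, sigma))

def compatible_feature(action, state, theta, sigma):
    """Score function d/dtheta log pi(a|s) for the Gaussian policy above."""
    return (action - theta * state) * state / sigma ** 2

def natural_gradient_estimate(samples, advantages, theta, theta_old, sigma):
    """Importance-weighted least-squares fit of advantages onto the score.

    samples: list of (state, action) pairs assumed drawn under theta_old.
    advantages: one advantage estimate per sample (assumed given here).
    Returns the scalar regression coefficient, which plays the role of the
    natural-gradient direction in this one-parameter setting.
    """
    num = 0.0
    den = 0.0
    for (state, action), adv in zip(samples, advantages):
        rho = importance_weight(action, state, theta, theta_old, sigma)
        psi = compatible_feature(action, state, theta, sigma)
        num += rho * psi * adv
        den += rho * psi * psi
    return num / den if den > 0.0 else 0.0

def actor_step(theta, w, step_size):
    """Move the policy parameter along the estimated natural gradient."""
    return theta + step_size * w
```

Because the weights depend only on the ratio between the current and the data-collecting policy, the same batch of samples can be reused across several actor updates. When the two policies coincide, every weight equals one and the fit reduces to ordinary least squares.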