Paper abstract

A New Natural Policy Gradient by Stationary Distribution Metric

Tetsuro Morimura - Okinawa Institute of Science and Technology, Japan
Eiji Uchibe - Okinawa Institute of Science and Technology, Japan
Junichiro Yoshimoto - Okinawa Institute of Science and Technology, Japan
Kenji Doya - Okinawa Institute of Science and Technology, Japan

Session: Reinforcement Learning 2
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_6

The parameter space of a statistical learning machine has a Riemannian metric structure induced by its objective function. Amari proposed the concept of the "natural gradient," which takes this Riemannian metric of the parameter space into account. Kakade applied it to policy gradient reinforcement learning, yielding the natural policy gradient (NPG). Although an NPG evidently depends on the underlying Riemannian metric, little attention has been paid to alternative choices of that metric. In this paper, we propose a Riemannian metric for the joint distribution of states and actions, which is directly linked with the average reward, and derive a new NPG named the "Natural State-action Gradient" (NSG). We then prove that the NSG can be computed by fitting a linear model to the immediate reward function. In numerical experiments, we verify that NSG learning can handle MDPs with a large number of states, for which the performance of existing (N)PG methods degrades.
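
The equivalence claimed in the abstract, between a natural gradient under a stationary-distribution metric and a least-squares fit of the immediate reward, can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it assumes we already have samples of the score features psi(s,a) = d/dtheta log p_theta(s,a) of the stationary state-action distribution (here replaced by synthetic stand-in data), whereas estimating those features is itself a nontrivial part of the method. Under those assumptions, the vanilla gradient of the average reward is E[r * psi], the metric is the second moment E[psi psi^T], and preconditioning one by the other reproduces the ordinary least-squares solution of regressing r onto psi.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampled data: psi[i] stands in for the score
# d/dtheta log p_theta(s_i, a_i) of the stationary state-action
# distribution, and r[i] is the immediate reward at (s_i, a_i).
# Both are synthetic here (assumption), not the paper's estimator.
N, d = 1000, 4
psi = rng.normal(size=(N, d))
r = psi @ np.array([0.5, -1.0, 0.2, 0.0]) + 0.1 * rng.normal(size=N)

# Sample estimate of the average-reward gradient:
#   grad eta ~= (1/N) * sum_i r_i * psi_i
g = psi.T @ r / N

# Sample estimate of the Riemannian metric of the stationary
# joint distribution: G ~= (1/N) * sum_i psi_i psi_i^T
G = psi.T @ psi / N

# Natural gradient under this metric: G^{-1} * grad eta.
nsg = np.linalg.solve(G, g)

# The same direction as the least-squares fit of the immediate
# reward onto the score features (the "linear model" view).
w, *_ = np.linalg.lstsq(psi, r, rcond=None)

assert np.allclose(nsg, w)
```

The assertion holds because solve(psi.T @ psi, psi.T @ r) is exactly the normal-equations solution of the regression, so under these assumptions the natural gradient step and the fitted linear-model weights coincide.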