TY - GEN
T1 - Mixing-time regularized policy gradient
AU - Morimura, Tetsuro
AU - Osogami, Takayuki
AU - Shirai, Tomoyuki
N1 - Publisher Copyright:
Copyright © 2014, Association for the Advancement of Artificial Intelligence.
PY - 2014
Y1 - 2014
AB - Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a means of seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores policies whose induced Markov chains have long mixing times. We study a new approach that regularizes how PGRL explores policies by using the hitting time of the Markov chains. The hitting time gives an upper bound on the mixing time, and the proposed approach improves learning efficiency by keeping the mixing time of the Markov chains short. In particular, we propose a temporal-difference learning method for estimating the gradient of the hitting time. Numerical experiments show that the proposed method outperforms conventional PGRL methods.
UR - http://www.scopus.com/inward/record.url?scp=84908176689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908176689&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84908176689
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 1997
EP - 2003
BT - Proceedings of the National Conference on Artificial Intelligence
PB - AI Access Foundation
T2 - 28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014
Y2 - 27 July 2014 through 31 July 2014
ER -