TY - GEN
T1 - A study on use of prior information for acceleration of reinforcement learning
AU - Terashima, Kento
AU - Murata, Junichi
PY - 2011
Y1 - 2011
N2 - Reinforcement learning is a method by which an agent learns appropriate responses for solving problems through trial and error. Its advantage is that it can be applied to unknown or uncertain problems; its drawback is that the trial-and-error process takes a long time. If prior information about the environment is available, some of the trial and error can be spared and learning can finish in a shorter time. However, prior information provided by a human designer can be wrong because of uncertainties in the problem, and using wrong prior information can have harmful effects such as failure to obtain the optimal policy and slower learning. We propose to control the use of the prior information to suppress these harmful effects: the agent gradually forgets the prior information by multiplying it by a forgetting factor as it learns a better policy. We apply the proposed method to a couple of testbed environments and several types of prior information, and it shows good results in terms of both learning speed and the quality of the obtained policies.
UR - http://www.scopus.com/inward/record.url?scp=81255125382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=81255125382&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:81255125382
SN - 9784907764395
T3 - Proceedings of the SICE Annual Conference
SP - 537
EP - 543
BT - SICE 2011 - SICE Annual Conference 2011, Final Program and Abstracts
PB - Society of Instrument and Control Engineers (SICE)
T2 - 50th Annual Conference of the Society of Instrument and Control Engineers, SICE 2011
Y2 - 13 September 2011 through 18 September 2011
ER -