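方策に関する知識を分離した方策こう配法―環境ダイナミクスと行動価値による方策表現― (Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies)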

Price: ¥770 JPY (tax included)

Category: Journal (individual paper)

Group: [C] Electronics, Information and Systems Society

Publication date: 2016/03/01

Title (English): Policy Gradient Reinforcement Learning with Separated Knowledge: Environmental Dynamics and Action-Values in Policies

Authors (Japanese): 石原 聖司 (東京電機大学理工学部), 五十嵐 治一 (芝浦工業大学工学部)

Authors (English): Seiji Ishihara (School of Science and Engineering, Tokyo Denki University), Harukazu Igarashi (Faculty of Engineering, Shibaura Institute of Technology)

Keywords: environmental dynamics, reinforcement learning, action-value, pursuit problem, transfer learning, policy gradient method

Abstract (English): Knowledge about an agent's policy consists of two types: the environmental dynamics that define state transitions around the agent, and the behavior knowledge for solving a given task. In conventional reinforcement learning, however, these two types of knowledge are usually combined into state-value or action-value functions and learned together. If they were separated and learned individually, the behavior knowledge could potentially be transferred to other environments and reused or modified. In our previous work, we presented learning rules based on policy gradients with an objective function that consists of two types of parameters, representing the environmental dynamics and the behavior knowledge, so that each type is learned separately. In that framework, state-values served as the reusable parameters corresponding to the behavior knowledge. This paper instead adopts action-values as parameters in the objective function of a policy and presents policy gradient learning rules for each type of separated knowledge. Simulation results on a pursuit problem showed that such parameters can also be transferred and reused more effectively than unseparated knowledge.
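
The abstract does not state the objective function or the update rules, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a tabular setting, a Boltzmann policy, and a hypothetical objective E(s, a) = Σ_s' P_w(s' | s, a) · max_a' q[s'][a'], where `w` (dynamics logits) and `q` (action-value-like parameters) are separate parameter blocks. All names (`objective`, `grad_log_policy`, `reinforce_update`, `learn_w`, `learn_q`) are invented for this example.

```python
import math

def softmax(xs, temp=1.0):
    """Numerically stable softmax with temperature."""
    m = max(xs)
    es = [math.exp((x - m) / temp) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def objective(w, q, s, a):
    """Assumed form: E(s, a) = sum_s2 P_w(s2 | s, a) * max_a2 q[s2][a2]."""
    p = softmax(w[s][a])  # dynamics block: next-state distribution
    return sum(p[s2] * max(q[s2]) for s2 in range(len(q)))

def policy(w, q, s, temp=1.0):
    """Boltzmann policy over actions in state s."""
    return softmax([objective(w, q, s, a) for a in range(len(w[s]))], temp)

def grad_log_policy(w, q, s, a, temp=1.0):
    """Gradient of log pi(a | s) w.r.t. w[s] and q, returned separately."""
    pi = policy(w, q, s, temp)
    gw = [[0.0] * len(q) for _ in w[s]]
    gq = [[0.0] * len(row) for row in q]
    for b in range(len(w[s])):
        coeff = ((1.0 if b == a else 0.0) - pi[b]) / temp  # d log pi / d E(s, b)
        p = softmax(w[s][b])
        e_b = objective(w, q, s, b)
        for s2 in range(len(q)):
            v = max(q[s2])
            gw[b][s2] += coeff * p[s2] * (v - e_b)   # softmax derivative
            gq[s2][q[s2].index(v)] += coeff * p[s2]  # subgradient of the max
    return gw, gq

def reinforce_update(w, q, s, a, reward, lr=0.1, temp=1.0,
                     learn_w=True, learn_q=True):
    """One REINFORCE-style step, in place; either block can be frozen."""
    gw, gq = grad_log_policy(w, q, s, a, temp)
    if learn_w:
        for b, row in enumerate(gw):
            for s2, g in enumerate(row):
                w[s][b][s2] += lr * reward * g
    if learn_q:
        for s2, row in enumerate(gq):
            for a2, g in enumerate(row):
                q[s2][a2] += lr * reward * g
```

Under these assumptions, reuse in a new environment could be imitated by keeping `q` from a source task, reinitializing `w`, and calling `reinforce_update(..., learn_q=False)` so that only the dynamics block is relearned.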

Journal: IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C), Vol. 136, No. 3 (2016), Special issue: System Innovation Opened Up by Machine Learning

Pages in journal: 282-289

Manuscript type: Paper / Japanese

Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/136/3/136_282/_article/-char/ja/
