観察模倣に基づく内部報酬形成と自然方策勾配法による行動獲得

Price: ¥440 JPY (tax included)

Category: Society Conference

Paper No.: OS3-5

Group name: [C] Proceedings of the 2009 Annual Conference of the Electronics, Information and Systems Society, IEEJ

Publication date: 2009/09/03

Title (English): Behavior acquisition based on natural policy gradient method and internal reward by observational imitation

Authors: 西川 徳宏 (Kyoto University), 谷口 忠大 (Ritsumeikan University), 川上 浩司 (Kyoto University), 片井 修 (Kyoto University)

Authors (English): Tokuhiro Nishikawa (Kyoto University), Tadahiro Taniguchi (Ritsumeikan University), Hiroshi Kawakami (Kyoto University), Osamu Katai (Kyoto University)

Keywords: reinforcement learning | imitation learning | internal reward | natural policy gradient method

Abstract: In this paper, we present an integrated machine learning architecture that combines imitation learning (IL) and reinforcement learning (RL). Observational imitation can enhance behavior acquisition in humans, who usually acquire task behaviors through both IL and RL: after observing a superior's behavior, learners go on to acquire behaviors through trial and error. Viewed this way, IL and RL are naturally integrated in a single process of behavior acquisition. In machine learning, trial-and-error methods are also used, but an agent usually requires far more repetition to acquire a behavior. However, by observing behaviors demonstrated by more experienced agents or mentors, a learner can accelerate RL for behavior acquisition. Based on this idea, this study proposes the combination of IL and RL as an integrated machine learning architecture. We use the natural policy gradient method for RL and introduce internal rewards generated by observational imitation to integrate imitation and reinforcement learning. The learning architecture is evaluated through experiments.
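
The following is a minimal, hypothetical sketch of the idea described in the abstract, not the authors' implementation: an agent is updated with a natural policy gradient while its reward is the external environment reward plus a weighted internal reward for matching a mentor's observed action preferences. The toy environment, the mentor distribution, and all constants (N_ACTIONS, BETA, ALPHA) are assumptions made for illustration only.

import numpy as np

# Hypothetical toy setting: 4 discrete actions, no state (a bandit), softmax policy.
N_ACTIONS = 4
BETA = 0.5    # weight of the internal (imitation) reward -- assumed value
ALPHA = 0.1   # learning rate -- assumed value
rng = np.random.default_rng(0)

def env_reward(a):
    # External reward: action 2 is the useful one in this toy environment.
    return 1.0 if a == 2 else 0.0

# The mentor is observed to prefer action 2; the internal reward encourages
# the learner to match that observed preference.
mentor_policy = np.array([0.05, 0.05, 0.85, 0.05])

def internal_reward(a):
    return mentor_policy[a]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(N_ACTIONS)   # softmax policy parameters

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(N_ACTIONS, p=pi)
    r = env_reward(a) + BETA * internal_reward(a)   # external + internal (shaped) reward

    # Single-sample policy-gradient estimate: r * grad log pi(a)
    grad_log = -pi.copy()
    grad_log[a] += 1.0
    g = r * grad_log

    # Fisher information matrix of the softmax policy: F = diag(pi) - pi pi^T
    F = np.diag(pi) - np.outer(pi, pi)
    # Natural gradient: solve (F + damping) x = g; damping because F is singular
    nat_g = np.linalg.solve(F + 1e-3 * np.eye(N_ACTIONS), g)

    theta += ALPHA * nat_g

print("learned policy:", np.round(softmax(theta), 3))

In this toy setting the internal reward biases exploration toward the mentor-preferred action, which illustrates, under the above assumptions, how observational imitation can accelerate reinforcement learning.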

PDF file size: 5,079 KB
