観察模倣に基づく内部報酬形成と自然方策勾配法による行動獲得

Price: ¥440 JPY (tax included)

Category: Society Conference

Paper No.: OS3-5

Group name: [C] Proceedings of the 2009 Annual Conference of the Electronics, Information and Systems Society, IEEJ

Publication date: 2009/09/03

Title (English): Behavior acquisition based on natural policy gradient method and internal reward by observational imitation

Authors: 西川 徳宏 (Kyoto University), 谷口 忠大 (Ritsumeikan University), 川上 浩司 (Kyoto University), 片井 修 (Kyoto University)

Authors (English): Tokuhiro Nishikawa (Kyoto University), Tadahiro Taniguchi (Ritsumeikan University), Hiroshi Kawakami (Kyoto University), Osamu Katai (Kyoto University)

Keywords: reinforcement learning | imitation learning | internal reward | natural policy gradient method

Abstract: In this paper, we present an integrated machine learning architecture that combines imitation learning (IL) and reinforcement learning (RL). Observational imitation can enhance behavior acquisition in humans, who usually acquire task behaviors through both IL and RL: after observing a superior's behavior, learners go on to acquire behaviors through trial and error. Viewed this way, IL and RL are naturally integrated in a single process of behavior acquisition. In machine learning, trial-and-error methods are also used, but an agent usually requires far more repetition to acquire a behavior. However, by observing behaviors demonstrated by more experienced agents or mentors, a learner can accelerate RL for behavior acquisition. Based on this idea, this study proposes the combination of IL and RL as an integrated machine learning architecture. We use the natural policy gradient method for RL and introduce internal rewards generated by observational imitation to integrate imitation and reinforcement learning. The learning architecture is evaluated through experiments.
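
The following is a minimal, hypothetical sketch of the idea described in the abstract, not the authors' implementation: an agent is updated with a natural policy gradient while its reward is the external environment reward plus a weighted internal reward for matching a mentor's observed action preferences. The toy environment, the mentor distribution, and all constants (N_ACTIONS, BETA, ALPHA) are assumptions made for illustration only.

import numpy as np

# Hypothetical toy setting: 4 discrete actions, no state (a bandit), softmax policy.
N_ACTIONS = 4
BETA = 0.5    # weight of the internal (imitation) reward -- assumed value
ALPHA = 0.1   # learning rate -- assumed value
rng = np.random.default_rng(0)

def env_reward(a):
    # External reward: action 2 is the useful one in this toy environment.
    return 1.0 if a == 2 else 0.0

# The mentor is observed to prefer action 2; the internal reward encourages
# the learner to match that observed preference.
mentor_policy = np.array([0.05, 0.05, 0.85, 0.05])

def internal_reward(a):
    return mentor_policy[a]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(N_ACTIONS)   # softmax policy parameters

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(N_ACTIONS, p=pi)
    r = env_reward(a) + BETA * internal_reward(a)   # external + internal (shaped) reward

    # Single-sample policy-gradient estimate: r * grad log pi(a)
    grad_log = -pi.copy()
    grad_log[a] += 1.0
    g = r * grad_log

    # Fisher information matrix of the softmax policy: F = diag(pi) - pi pi^T
    F = np.diag(pi) - np.outer(pi, pi)
    # Natural gradient: solve (F + damping) x = g; damping because F is singular
    nat_g = np.linalg.solve(F + 1e-3 * np.eye(N_ACTIONS), g)

    theta += ALPHA * nat_g

print("learned policy:", np.round(softmax(theta), 3))

In this toy setting the internal reward biases exploration toward the mentor-preferred action, which illustrates, under the above assumptions, how observational imitation can accelerate reinforcement learning.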

PDF file size: 5,079 KB
