Discovering Semi Shortest Path using Adventurous Q-learning to Expand Unknown Search Regions

Regular price: ¥770 JPY (tax included); sold out

Category: Transactions (per-article)

Group: [C] Electronics, Information and Systems Division

Publication date: 2018/07/01

Title (English): Discovering Semi Shortest Path using Adventurous Q-learning to Expand Unknown Search Regions

Authors: Shunosuke Kawarasaki (Department of Industrial Engineering and Management, Graduate School of Engineering, Kanagawa University), Teruji Sekozawa (Department of Information Systems Creation, Faculty of Engineering, Kanagawa University)

Authors (English): Shunosuke Kawarasaki (Graduate School of Engineering, Kanagawa University), Teruji Sekozawa (Faculty of Engineering, Kanagawa University)

Keywords: machine learning, reinforcement learning, Q-learning, action selection, action history

Abstract (English): Q-learning methods evaluate and update action values using information on the rewards obtained. Because the Q value cannot be updated until learning succeeds and a reward is obtained, there is no index to guide learning, which causes the problem of requiring much time to learn. When the semi-shortest route from start to goal passes through a part of the maze with no spread, where the probability of learning failure is high, the semi-shortest route cannot be learned. To learn optimal actions and discover the semi-shortest path, it is essential to experience a large number of unknown states at early stages of the learning process. To this end, we propose unknown-adventure Q-learning, in which agents maintain an action history and adventurously seek out unknown states that have not yet been recorded in that history. When unknown states are present, the agent proceeds boldly to search them without fear of failure. Unknown-adventure Q-learning experiences large numbers of states early in the learning process, ensuring that actions can be selected in a way that avoids previous failures. This yields a massive acceleration of learning: the number of episodes required to learn a path from start to goal is reduced 100-fold compared with the original Q-learning method. Moreover, the method can discover the semi-shortest path through a maze even when that path does not expand through the maze, a case in which learning failures are common and in which the semi-shortest path cannot be discovered by methods that use V-filters or action-region valuations to accelerate learning by emphasizing prior knowledge.

Journal: IEEJ Transactions on Electronics, Information and Systems (Transactions C), Vol. 138, No. 7 (2018). Special issue: 2017 Electronics, Information and Systems Division Conference

Pages: 941-949

Manuscript type: Paper / Japanese

Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/138/7/138_941/_article/-char/ja/
