Acquisition of Swarm Behavior Using a Neuro-Fuzzy Reinforcement Learning System
Category: Journal (individual paper)
Group: [C] Electronics, Information and Systems Society
Publication date: 2013/05/01
Title (English): Adaptive Swarm Behavior Acquisition Using a Neuro-Fuzzy Reinforcement Learning System
Authors: Takashi Kuremoto (呉本 尭; Graduate School of Science and Engineering, Yamaguchi University), Yuki Yamano (山野 祐樹; Graduate School of Science and Engineering, Yamaguchi University), Liang-Bing Feng (馮 良炳; Graduate School of Science and Engineering, Yamaguchi University), Kunikazu Kobayashi (小林 邦和; School of Information Science and Technology, Aichi Prefectural University), Masanao Obayashi (大林 正直; Graduate School of Science and Engineering, Yamaguchi University)
Keywords: reinforcement learning, intelligent agent, swarm behavior, neuro-fuzzy network, sarsa learning algorithm
Abstract (English): Individuals in swarm intelligence systems are generally designed to perform cooperative behaviors. However, those individuals usually have simple structures; that is, there are few models of individuals with high-level cognitive functions such as pattern recognition, adaptive learning, and self-organization. In this paper, we propose a neuro-fuzzy reinforcement learning system as a common internal model of intelligent individuals, i.e., intelligent agents or multiple autonomous mobile robots. In the proposed model, the local environment information observed by a learner is recognized by a self-structuring neuro-fuzzy network (Fuzzy Net), and a conventional reinforcement learning algorithm named "sarsa" is adopted to modify the connections between the Fuzzy Net and the state-action value functions so that adaptive behaviors are acquired. A swarm of agents can also be formed with the proposed method by applying reward/punishment during the learning process. According to the results of simulations of goal-navigation exploration problems, "swarm learning", in which suitable distances between individuals are evaluated with positive rewards during the learning process, showed higher efficiency than the opposite case of "individual learning".
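Journal: IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C), Vol.133 No.5 (2013), Special issue: Wireless Communication Technologies Increasingly Applied to New Industries
Pages: 1076-1085
Manuscript type: Paper / Japanese
Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/133/5/133_1076/_article/-char/ja/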
