Adam with Learning Rate Range Control for Suppressing Sudden Parameter Changes in the Early Learning Stage
Category: Journal (individual paper)
Group: [C] Electronics, Information and Systems Society
Publication date: 2022/10/01
Title (English): A Method of Learning Rate Range Control for Adam to Suppress Sudden Changes of Parameters in Early Learning Stage
Authors: 行木 大輝 (Daiki Nameki), Graduate School of Information and Computer Science, Chiba Institute of Technology; 山口 智 (Satoshi Yamaguchi), Department of Computer Science, Chiba Institute of Technology
Keywords: neural networks, optimization algorithms, Adam, AdaBound, RAdam, WarmUp
Abstract: Adam is one of the most widely used optimization algorithms for neural networks and can speed up convergence during training. However, it has two problems. First, when applied to large-scale networks, the final performance of a network trained with Adam, such as its generalization ability, is worse than that of a network trained with SGD. Second, the learning rate tends to be large in the early stage of training; as a result, network parameters such as weights and biases can become too large within the first few iterations. Recent research has addressed both problems. AdaBound was proposed to solve the first problem; it switches dynamically from Adam to SGD. RAdam was proposed to solve the second; it applies to Adam a technique called WarmUp, which sets a small learning rate in the early stage of training and increases it gradually. In this study, we propose applying WarmUp to the upper bound of AdaBound's learning rate. The proposed algorithm prevents parameter updates with extremely large learning rates in the early stage of training, so it can be expected to learn more efficiently than the conventional methods. We applied the proposed method to training several types of networks, including CNN, ResNet, DenseNet, and BERT. The results show that our method improves performance over the conventional methods, and in image classification it tended to be more effective for larger networks.
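The abstract combines two mechanisms: AdaBound's clipping of Adam's per-parameter step size into a lower and an upper bound that both converge to a final SGD-like learning rate, and WarmUp applied to that upper bound. The Python sketch below illustrates the idea under stated assumptions. The bounds follow the form used by AdaBound, η_l(t) = α(1 − 1/(γt + 1)) and η_u(t) = α(1 + 1/(γt)), where α is `final_lr`; the linear warmup factor min(1, t/T_w), the helper names (`adabound_bounds`, `warmed_upper_bound`, `adabound_warmup_step`, `init_state`) and all default hyperparameters are hypothetical choices for illustration, not the schedule or settings used in the paper. It also shows why the unmodified upper bound does little early on: at t = 1 with γ = 10⁻³ it is about 1001 times `final_lr`.

```python
import math


def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower/upper step-size bounds in the form used by AdaBound (1-indexed step)."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def warmup_factor(step, warmup_steps):
    """Hypothetical linear WarmUp factor: rises from 1/warmup_steps to 1, then stays at 1."""
    return min(1.0, step / warmup_steps)


def warmed_upper_bound(step, final_lr=0.1, gamma=1e-3, warmup_steps=1000):
    """Upper bound with WarmUp applied (assumed multiplicative); kept >= lower bound."""
    lower, upper = adabound_bounds(step, final_lr, gamma)
    return max(lower, upper * warmup_factor(step, warmup_steps))


def init_state(n):
    """Optimizer state for n scalar parameters: step count and first/second moments."""
    return {"step": 0, "m": [0.0] * n, "v": [0.0] * n}


def adabound_warmup_step(params, grads, state, lr=1e-3, final_lr=0.1,
                         betas=(0.9, 0.999), gamma=1e-3, eps=1e-8,
                         warmup_steps=1000):
    """One hypothetical update on a list of scalar parameters; mutates `state`."""
    state["step"] += 1
    t = state["step"]
    beta1, beta2 = betas
    lower, _ = adabound_bounds(t, final_lr, gamma)
    upper = warmed_upper_bound(t, final_lr, gamma, warmup_steps)
    # Adam's bias-corrected base step size.
    base = lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and the squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        # Per-parameter Adam step size, clipped into [lower, warmed upper].
        step_size = base / (math.sqrt(state["v"][i]) + eps)
        step_size = min(max(step_size, lower), upper)
        new_params.append(p - step_size * state["m"][i])
    return new_params


# Usage: minimize f(x) = sum(x_i^2), whose gradient is 2 * x_i.
params = [1.0, -2.0]
state = init_state(len(params))
for _ in range(200):
    grads = [2.0 * x for x in params]
    params = adabound_warmup_step(params, grads, state)
loss = sum(x * x for x in params)
print(loss)  # smaller than the initial value 5.0
```

With γ = 10⁻³ and 1,000 warmup steps, the warmed upper bound works out to `final_lr`·(1 + t/1000) during warmup and joins the original AdaBound bound at t = 1000, so the schedule stays continuous. The `max(lower, ...)` guard is an extra assumption that keeps the clipping interval non-empty for other choices of γ and warmup length.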
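Journal: IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C), Vol. 142, No. 10 (2022), Special Issue: Recent Advances in Technologies Related to Electronic Materials
Pages: 1156-1165
Manuscript type: Paper / Japanese
Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/142/10/142_1156/_article/-char/ja/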
