A Study on Word Vector Dimensions for Sentence Classifications Using Convolutional Neural Networks
Category: Journal (individual paper)
Group: [C] Electronics, Information and Systems Society
Publication date: 2019/09/01
Title (English): A Study on Word Vector Dimensions for Sentence Classifications Using Convolutional Neural Networks
Authors: 作元 卓也 (Sakumoto Takuya; Graduate School of Information and Computer Science, Chiba Institute of Technology), 山口 智 (Yamaguchi Satoshi; Dept. of Computer Science, Chiba Institute of Technology)
Keywords: convolutional neural networks, distributed representation of words, word2vec, sentence classification, natural language processing
Abstract (English): Recently, convolutional neural networks (CNNs) have achieved remarkable results on sentence classification problems. In these approaches, each word in a sentence is transformed into a real-valued vector (called a word vector), and each sentence fed to the CNN is represented as a sequence of word vectors. A dataset for training and testing the CNN contains a large number of words, so the word vectors are embedded in a high-dimensional space. As a result, the dimension of the CNN's input data space becomes very high. When the input data are high-dimensional, a large amount of training data is required to train the CNN sufficiently. It is not always possible, however, to obtain enough training data. If enough data cannot be prepared for learning, it is desirable to reduce the dimension of the input data. This paper reports the results of applying lower-dimensional word vectors to sentence classification by CNNs. The results show that some dimensionality reduction does not greatly affect the accuracy of sentence classification by CNNs.
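Journal: IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C) Vol.139 No.9 (2019), Special Issue: Perceptual Information Technology in Cooperation with the Intelligent Mechatronics Field
Pages in journal: 1066-1079
Manuscript type: Paper / Japanese
Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/139/9/139_1066/_article/-char/ja/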
