A New Word Identification Method Using an SVM Based on Word Fragment Features
Category: Society conference
Paper No.: GS13-6
Group: [C] Proceedings of the 2009 IEEJ Electronics, Information and Systems Society Conference
Publication date: 2009/09/03
Title (English): A Method of New Word Identification by Word Fragment Features base on SVM Model
Authors: 張国棟 (Tokushima University), 任福継 (Tokushima University)
Authors (English): Guodong Zhang (Tokushima University), Fuji Ren (Tokushima University)
Keywords: new word|word fragment|word segmentation|SVM model|single word dictionary|combination word dictionary
Abstract: Word segmentation plays an important role in natural language understanding. However, a new word that does not exist in the segmentation dictionary produces many word fragments and introduces ambiguity. In this paper we propose a method for identifying new words in Chinese natural language processing based on an SVM model and word fragment features. We studied existing new word identification methods, such as rule-based and statistics-based ones. We observed that when a new word is segmented, several word fragments are generated, and that these fragments have two properties: (1) a word fragment itself cannot be a word; (2) the collocation of a word fragment pair does not appear in the existing dictionary. We therefore detect word fragments in text using two dictionaries, a single word dictionary and a combination word dictionary. We then analyze the information carried by the word fragments, tag the fragment features, and train an SVM model to identify new words. Finally, we conducted an experiment to verify the effectiveness of the method.
PDF file size: 3,343 KB
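
The fragment-detection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `candidate_new_words`, the representation of the single word dictionary as a set of tokens, the representation of the combination word dictionary as a set of adjacent token pairs, and the toy data are all assumptions made for illustration. The sketch treats a token as a fragment when it is absent from the single word dictionary, and joins adjacent fragments into a candidate new word unless their pair is a known combination. The SVM classification stage is not shown.

```python
def candidate_new_words(tokens, single_words, combinations):
    """Collect candidate new words from segmenter output.

    tokens:        list of tokens produced by a word segmenter
    single_words:  set of tokens that can stand alone as words (assumed format)
    combinations:  set of (left, right) token pairs known to the dictionary
                   (assumed format)
    """
    candidates = []
    run = []  # current run of adjacent word fragments

    def flush():
        # A candidate needs at least two fragments joined together.
        if len(run) >= 2:
            candidates.append("".join(run))
        run.clear()

    for token in tokens:
        if token in single_words:
            # A real word ends any run of fragments.
            flush()
            continue
        if run and (run[-1], token) in combinations:
            # Known collocation: not evidence of a new word, so break the run.
            flush()
        run.append(token)
    flush()
    return candidates


# Toy data, invented for illustration only.
tokens = ["我", "喜欢", "微", "博", "和", "博", "客"]
single_words = {"我", "喜欢", "和"}
combinations = {("博", "客")}
print(candidate_new_words(tokens, single_words, combinations))
```

In a fuller pipeline, each candidate would presumably be converted into a feature vector and passed to a trained SVM classifier; the record does not specify which features are used, so that step is left out here.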
