(Issue 22)   Volume 11, Number 2   Published 18 July 2017

Chinese Named Entity Recognition and Disambiguation Based on POS Combination Rules Combined with Wikipedia

Keywords:
Named Entity Recognition
Named Entity Disambiguation
POS Combination
Syntax Rules
Wikipedia

Abstract

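Traditional named entity recognition mostly relies on rule-based and probabilistic methods; however, because of semantic ambiguity and the growth of unknown words, precision is difficult to raise effectively. This study defines naming rules through part-of-speech (POS) combinations, adds a name-linking algorithm, and exploits the editing conventions of Wikipedia text to assist recognition and disambiguation. We found that applying the name-linking probability formula together with syntax rules substantially improves the precision of person-name recognition. For location and organization names, whose naming rules are similar, earlier studies had to rely on lexicons and special stem sets to tell the two apart; this study instead uses simple location-name rules combined with Wikipedia to assist the distinction. Experimental results show that precision, recall, and F-measure reach 86.32%, 75.33%, and 80.33%, respectively. Compared with studies that rely on large-scale rule sets, and with studies that combine manual annotation with HMM-based machine learning, the rules derived in this study are not only concise but also perform comparably overall, with precision as the most notable strength.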

Traditional Named Entity Recognition (NER) adopts rule-based and/or probabilistic models in morphological analysis, but accuracy remains low because of semantic ambiguity and the growth of unknown words. In this study, we applied syntax rules for names and places to Chinese NER and extracted features from Wikipedia to assist disambiguation and thereby improve recognition accuracy. We found that combining syntax rules with a name algorithm raises recognition accuracy. In addition, since location and organization names usually follow particular verbs, we configured only basic rules for location entity recognition and referred to the geographical directories of the Infobox in Wikipedia to verify their identities. In our overall system evaluation, precision reaches 86.32%, recall reaches 75.65%, and F-measure reaches 80.4%. Compared with other automatic rule construction and quasi-machine learning methods, our approach performs better, particularly in precision.
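
The evaluation above is reported as precision, recall, and F-measure. As a reminder of how these three quantities relate, the following is a minimal sketch that assumes the balanced F-measure (F1, the harmonic mean of precision and recall). The function name `precision_recall_f` and the count-based inputs are illustrative assumptions, not taken from the paper, and the paper may aggregate its scores differently (for example, by averaging over entity types), so this sketch is not expected to reproduce the reported figures exactly.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from entity-level counts.

    tp: entities the system tagged that match the gold standard
    fp: entities the system tagged that are not in the gold standard
    fn: gold-standard entities the system missed

    Assumption: balanced F-measure (F1); the paper does not state
    which variant or averaging scheme it uses.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = precision_recall_f(tp=6, fp=2, fn=6)
    print(f"precision={p:.2%} recall={r:.2%} F1={f:.2%}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two scores, which is why a system with high precision but lower recall ends up with an F-measure closer to its recall.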

Copyright of this journal belongs to the Library Association of the Republic of China (Taiwan).