(Issue 25)   Volume 13, Number 1   Published December 31, 2018

使用多數決策略之圖書自動分類的研究

The Study of Automatic Book Classification Using Majority Vote Strategy

Keywords: automatic book classification (圖書自動分類); majority vote strategy (多數決策略); layer-style classifier (階層式分類器)

Abstract

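Most librarians are trained only in library and information science, yet they are responsible for classifying every book the library receives. A lack of subject background therefore often makes classification difficult. In addition, as technology has advanced in recent years, the volume of published books has grown substantially, which places an ever heavier burden on cataloging librarians and keeps them from classifying faster. Classification is also easily affected by differences in subjective judgment, which leads to cataloging-quality problems such as low inter-consistency and intra-consistency. This paper examines the issues of traditional automatic book classification and, combining the strengths of several classifiers, proposes a multi-layer automatic book classification method that uses a majority vote strategy. To evaluate the approach, master's and doctoral theses from a university, together with their assigned classification numbers, were first used as the training and testing corpus. The effect of various combinations of document content on feature extraction was studied to find the best content combination for automatic book classification, for example the abstract together with the table of contents. At the same time, the complementary behavior of different classifiers was exploited to determine the best combination of classifiers and layers for this work. Experiments on a small thesis corpus and on a small, many-category online bookstore corpus gave satisfactory results. Automatic classification was then carried out on a large, few-category set of online bookstore bibliographic records. Evaluated with 10-fold cross-validation, the experimental results show that the proposed multi-layer automatic book classification using a majority vote strategy achieves better classification performance than traditional automatic book classification.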
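The majority vote strategy named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how ties are resolved or which classifiers are combined, so the tie-breaking rule (prefer the earliest classifier in the list) and the sample classification numbers are assumptions made only for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most classifiers predicted.

    Assumption of this sketch (not stated in the paper): ties are
    broken in favour of the earliest classifier in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the predictions in classifier order and return the first
    # label whose vote count equals the highest count.
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of three classifiers for one book record.
votes = ["004", "004", "020"]
winner = majority_vote(votes)
```

Here two of the three hypothetical classifiers agree, so their label is chosen. In a layered arrangement, the same function could be applied once per layer, although how the paper chains its layers is not described in the abstract.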

Most librarians are trained in library and information science and perhaps a few other academic fields, yet they are responsible for cataloging materials from every field. Lacking the relevant background knowledge, they find cataloging increasingly difficult. Moreover, with the rapid advance of technology, the number of publications in every field is growing quickly, which further increases the cataloging load. As a result, cataloging quality, in particular high inter-consistency and intra-consistency of library classification, is hard to maintain. This paper therefore addresses the issues of traditional automatic book classification and exploits the complementary nature of different classifiers to propose a multi-layer automatic book classification method using a majority vote strategy. First, a collection of theses from a university library was used as the training and testing corpus, with the classification codes of those theses serving as the gold standard. Each thesis contains several components, such as the title, author, table of contents, abstract, and cited papers. To understand how these components affect classification, various combinations were studied, and the best one (for example, the abstract together with the table of contents) was recommended. To obtain the best classification performance, the allocation of classifiers to layers was also studied, and the best combination was recommended. The thesis classification results were promising. Furthermore, to run an automatic book classification experiment with a large amount of data and multiple categories, online bookstore book content pages were collected. Under 10-fold cross-validation, the experimental results showed that the proposed method also outperformed traditional automatic book classification.
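The 10-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the helper names `k_fold_indices` and `cross_validate`, the round-robin fold assignment, and the use of plain accuracy as the score are all assumptions, since the abstract does not state whether the data were shuffled or stratified or which metric was reported.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    # Fold i takes every k-th item starting at i, so fold sizes
    # differ by at most one and every item is tested exactly once.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(records, labels, train_fn, k=10):
    """Return the mean accuracy over k folds.

    train_fn(train_records, train_labels) is a hypothetical callback
    that must return a function predict(record) -> label.
    """
    scores = []
    for train, test in k_fold_indices(len(records), k):
        predict = train_fn([records[i] for i in train],
                           [labels[i] for i in train])
        correct = sum(predict(records[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each record is used for testing in exactly one fold and for training in the other nine, so the averaged score reflects performance on data the classifier did not see during training.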


Copyright of this journal belongs to the Library Association of the Republic of China (Taiwan) (中華民國圖書館學會).