Chinese text categorization based on evolutionary hypernetwork
1. College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Chongqing Key Laboratory of Computational Intelligence, Chongqing 400065, China
Abstract: To improve the performance of Chinese text categorization, a method based on an evolutionary hypernetwork is proposed. The Chinese lexical analysis system ICTCLAS was used to extract verbs, nouns, and adjectives as candidate features. Features were selected with the χ² test, and feature weights were computed by Boolean weighting. The preprocessed data sets were divided into a training set and a testing set. A hyperedge replacement strategy was used to train the hypernetwork classification model, which was then applied to the testing set. The classification performance of hypernetwork models of different orders was analyzed and compared with that of the traditional KNN and SVM classifiers. Experimental results show that, on the Fudan University corpus and the Sohu corpus respectively, the proposed scheme achieves a macro precision of 87.2% and 72.5%, a macro recall of 86.9% and 70.5%, and a macro F1 of 87.0% and 71.5%. The proposed scheme is an efficient tool for Chinese text classification, with performance close to or better than that of KNN and SVM.
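The feature selection and weighting steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes documents are already segmented and filtered to candidate words, uses the standard 2x2 contingency form of the χ² statistic, and ranks each term by its maximum score over classes (one common convention; the abstract does not say which aggregation was used). The function names and the toy English tokens are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for one class from a 2x2 contingency table.

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score.

    Assumption: a term's score is its maximum over all classes.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    term_df = Counter()        # document frequency per term
    term_class_df = Counter()  # document frequency per (term, class)
    for doc, label in zip(docs, labels):
        for term in set(doc):
            term_df[term] += 1
            term_class_df[(term, label)] += 1

    scores = {}
    for term, df in term_df.items():
        best = 0.0
        for cls, size in class_sizes.items():
            a = term_class_df[(term, cls)]
            b = df - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def boolean_vector(doc, features):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = set(doc)
    return [1 if f in present else 0 for f in features]


# Toy, hypothetical data standing in for segmented documents.
docs = [
    ["football", "match"],
    ["football", "team"],
    ["stock", "market"],
    ["stock", "team"],
]
labels = ["sports", "sports", "finance", "finance"]

features = select_features(docs, labels, k=2)
vectors = [boolean_vector(doc, features) for doc in docs]
```

With Boolean weighting each document becomes a 0/1 vector over the selected features. Binary attribute values are the natural input for hyperedges that match combinations of them, though the abstract does not detail the hypernetwork encoding.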