TY - JOUR
T1 - Automatic Text Classification of English Newswire Articles Based on Statistical Classification Techniques
AU - Zu, Guowei
AU - Ohyama, Wataru
AU - Wakabayashi, Tetsushi
AU - Kimura, Fumitaka
PY - 2004/1
Y1 - 2004/1
N2 - The basic process of automatic text classification is learning a classification scheme from training examples then using it to classify unseen textual documents. It is essentially the same as graphic or character pattern recognition process. So the pattern recognition approaches can be used for automatic text categorization. In this research several statistical classification techniques each of which employs Euclidean distance, various similarity measures, linear discriminant function, projection distance, modified projection distance, SVM, nearest-neighbor, have been used for automatic text classification. The principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments have been conducted on the Reuters-21578 test collection of English newswire articles. The results illustrate that the efficiency of modified projection distance is totally better than the other methods and the principal component analysis is suitable for reducing the dimensionality of the text features.
AB - The basic process of automatic text classification is learning a classification scheme from training examples then using it to classify unseen textual documents. It is essentially the same as graphic or character pattern recognition process. So the pattern recognition approaches can be used for automatic text categorization. In this research several statistical classification techniques each of which employs Euclidean distance, various similarity measures, linear discriminant function, projection distance, modified projection distance, SVM, nearest-neighbor, have been used for automatic text classification. The principal component analysis was used to reduce the dimensionality of the feature vector. Comparative experiments have been conducted on the Reuters-21578 test collection of English newswire articles. The results illustrate that the efficiency of modified projection distance is totally better than the other methods and the principal component analysis is suitable for reducing the dimensionality of the text features.
UR - http://www.scopus.com/inward/record.url?scp=33847744664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847744664&partnerID=8YFLogxK
U2 - 10.1541/ieejeiss.124.852
DO - 10.1541/ieejeiss.124.852
M3 - Article
AN - SCOPUS:33847744664
SN - 0385-4221
VL - 124
SP - 852
EP - 860
JO - IEEJ Transactions on Electronics, Information and Systems
JF - IEEJ Transactions on Electronics, Information and Systems
IS - 3
ER -