Classification of imbalanced documents by feature selection

Yusuke Adachi, Naoya Onimura, Takanori Yamashita, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)


We previously worked on the category classification problem for Reuters newspaper articles using SVM and feature selection. In that study, feature selection by SVM-score [Sakai, Hirokawa, 2012] showed high accuracy, and it was expected to outperform other standard indicators when the data is imbalanced. This study aimed to show the effectiveness of feature selection by SVM-score in machine learning with imbalanced data. For the Reuters data, the F-measure was calculated in classification experiments over all 13 categories. As a result, feature selection by SVM-score showed high F-measure and precision. In addition, we found that feature words of negative examples improve classification performance.
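As a rough illustration of the two ideas in the abstract, the sketch below computes precision, recall, and F-measure for an imbalanced binary task, and keeps the top-ranked positive and negative feature words from a table of signed word scores. It is only a minimal sketch under stated assumptions: the exact definition of SVM-score is given in [Sakai, Hirokawa, 2012] and is not reproduced here, so the scores are assumed to be signed per-word weights. The function names, the toy words, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (F1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def select_feature_words(
    scores: Dict[str, float], n_pos: int, n_neg: int
) -> List[str]:
    """Keep the n_pos highest-scoring and the n_neg lowest-scoring words.

    Assumption: a positive score marks a word typical of the target
    category, a negative score a word typical of the other documents.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    positive = [w for w in ranked if scores[w] > 0][:n_pos]
    negative = [w for w in reversed(ranked) if scores[w] < 0][:n_neg]
    return positive + negative


# Hypothetical signed word scores for one category (illustrative only).
toy_scores = {
    "wheat": 1.8,
    "grain": 1.2,
    "tonnes": 0.4,
    "the": 0.0,
    "shares": -0.9,
    "profit": -1.5,
}
selected = select_feature_words(toy_scores, n_pos=2, n_neg=2)

# Imbalanced toy labels: only 2 of 10 documents belong to the category.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(selected)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

In this toy case accuracy stays high even though half of the positive documents are missed, which is one reason the F-measure is a common choice for imbalanced data.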

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Compute and Data Analysis, ICCDA 2017
Publisher: Association for Computing Machinery
Number of pages: 5
ISBN (Electronic): 9781450352413
Publication status: Published - May 19 2017
Event: 2017 International Conference on Compute and Data Analysis, ICCDA 2017 - Lakeland, United States
Duration: May 19 2017 to May 23 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F130280


Other: 2017 International Conference on Compute and Data Analysis, ICCDA 2017
Country/Territory: United States

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

