Extracting Irregular Datasets in University Admission Statistics using Text Mining and Benford's Law

Yusuke Tozaki, Takahiko Suzuki, Tsunenori Mine, Sachio Hirokawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Benford's law states that the first digits of naturally occurring numerical datasets follow a specific distribution. Deviation from the Benford distribution indicates that a dataset is irregular, but it gives no clue as to the reason for the irregularity. This paper constructs a search engine over the cells that appear in tables by correlating each cell with the words in its row or column title or in the explanation of the table. We generate an exhaustive collection of cell datasets to test for irregularity by enumerating the search conditions. We applied the method to the number of applicants, the number of candidates, and the number of successful applicants in each department of 565 private universities in Japan, and confirmed its effectiveness by extracting the characteristics of the irregular datasets.
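As an illustration of the first-digit test the abstract refers to, the following is a minimal sketch and not the authors' implementation. Benford's law gives the expected share of leading digit d as log10(1 + 1/d). The use of a chi-square statistic as the deviation measure is an assumption made here, since the abstract does not name one, and the function names `first_digit`, `benford_expected`, and `chi_square_vs_benford` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading decimal digit of an integer count, or None for zero."""
    n = abs(int(value))
    if n == 0:
        return None
    while n >= 10:
        n //= 10
    return n


def chi_square_vs_benford(values):
    """Chi-square statistic of the observed leading digits against Benford.

    Assumption: chi-square is used here only as one common deviation
    measure; the paper's own measure is not stated in the abstract.
    """
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    total = len(digits)
    if total == 0:
        raise ValueError("no non-zero values to test")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )


# Illustrative inputs only (not data from the paper):
# powers of two track Benford closely, evenly spread 3-digit numbers do not.
powers = [2 ** k for k in range(1, 201)]
uniform = list(range(100, 1000))
regular_score = chi_square_vs_benford(powers)
irregular_score = chi_square_vs_benford(uniform)
```

A larger statistic means a stronger deviation from the Benford distribution. In the setting described above, the same function could be applied to each set of table cells returned by one search condition, so that the conditions with the highest scores point to the irregular datasets.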

Original language: English
Title of host publication: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1023-1024
Number of pages: 2
ISBN (Electronic): 9781728126272
Publication status: Published - Jul 2019
Event: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019 - Toyama, Japan
Duration: Jul 7 2019 to Jul 11 2019

Publication series

Name: Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019

Conference

Conference: 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Country/Territory: Japan
City: Toyama
Period: 7/7/19 to 7/11/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Social Sciences (miscellaneous)
