TY - GEN
T1 - Extracting Irregular Datasets in University Admission Statistics using Text Mining and Benford's Law
AU - Tozaki, Yusuke
AU - Suzuki, Takahiko
AU - Mine, Tsunenori
AU - Hirokawa, Sachio
PY - 2019/7
Y1 - 2019/7
N2 - Benford's law states that the first digits of natural numerical datasets follow a specific distribution. Deviation from Benford's distribution indicates irregularity in a dataset. However, it gives no clue as to the reason for the irregularity. The present paper constructs a search engine for cells that appear in tables by correlating each cell with the words in the title of its row or column or in the explanation of the table. We generate an exhaustive dataset of cells for irregularity testing by enumerating the search conditions. We applied the method to the number of applicants, the number of candidates, and the number of successful applicants in each department of 565 private universities in Japan. We confirmed the effectiveness of the proposed method by extracting the characteristics of the irregular datasets.
AB - Benford's law states that the first digits of natural numerical datasets follow a specific distribution. Deviation from Benford's distribution indicates irregularity in a dataset. However, it gives no clue as to the reason for the irregularity. The present paper constructs a search engine for cells that appear in tables by correlating each cell with the words in the title of its row or column or in the explanation of the table. We generate an exhaustive dataset of cells for irregularity testing by enumerating the search conditions. We applied the method to the number of applicants, the number of candidates, and the number of successful applicants in each department of 565 private universities in Japan. We confirmed the effectiveness of the proposed method by extracting the characteristics of the irregular datasets.
UR - http://www.scopus.com/inward/record.url?scp=85080906742&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080906742&partnerID=8YFLogxK
U2 - 10.1109/IIAI-AAI.2019.00207
DO - 10.1109/IIAI-AAI.2019.00207
M3 - Conference contribution
T3 - Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019
SP - 1023
EP - 1024
BT - Proceedings - 2019 8th International Congress on Advanced Applied Informatics, IIAI-AAI 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2019
Y2 - 7 July 2019 through 11 July 2019
ER -