TY - GEN
T1 - Identification of Unnatural Subsets in Statistical Data
AU - Suzuki, Takahiko
AU - Kamimasu, Tssukasa
AU - Nakatoh, Tetsuya
AU - Hirokawa, Sachio
N1 - Publisher Copyright: © 2018 IEEE. Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Benford's law is an observation about the frequency distribution of first significant digits in naturally occurring numerical data. We can measure the unnaturalness of a dataset by evaluating how far the frequency distribution of its leading digits diverges from Benford's distribution. However, this measure alone cannot pinpoint which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the item names of rows and columns that define each cell of the table or words appearing in the table title. By measuring the degree of divergence of the subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.
UR - http://www.scopus.com/inward/record.url?scp=85065195514&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065195514&partnerID=8YFLogxK
U2 - 10.1109/IIAI-AAI.2018.00024
DO - 10.1109/IIAI-AAI.2018.00024
M3 - Conference contribution
AN - SCOPUS:85065195514
T3 - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
SP - 74
EP - 80
BT - Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
Y2 - 8 July 2018 through 13 July 2018
ER -