Identification of Unnatural Subsets in Statistical Data

Takahiko Suzuki, Tsukasa Kamimasu, Tetsuya Nakatoh, Sachio Hirokawa

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    1 Citation (Scopus)

    Abstract

    Benford's law is an observation about the frequency distribution of first significant digits in naturally occurring numerical data. We can measure how unnatural a data set is by evaluating how far the frequency distribution of its leading digits diverges from Benford's distribution. However, this measure alone cannot identify precisely which part of the data is unnatural. In this study, we focus on the fact that statistical data is generally provided in tabular form. We specify a subset of the target data by using the row and column item names that define each cell of the table, or words appearing in the table title. By measuring the degree of divergence of each such subset from Benford's distribution, we can identify unnatural subsets. We applied this method to agriculture-related data from the China Statistical Yearbook and succeeded in identifying unnatural subsets.
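
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which divergence measure is used, so Kullback-Leibler divergence is assumed here. The cell layout `(title, row_label, column_label, value)`, the whitespace tokenization of labels, and the names `leading_digit`, `divergence`, and `rank_subsets` are likewise hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Benford's expected probability of leading digit d (1..9): log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of x, or None if it has none."""
    # repr() of a nonzero finite number always shows a nonzero mantissa digit
    # before any exponent, e.g. '0.032', '4500', '1e-05'.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None  # zero, nan, or inf


def divergence(values):
    """KL divergence (an assumed measure) of observed leading digits from Benford."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        return float("nan")
    counts = Counter(digits)
    # Digits that never occur contribute 0 to the sum, so only observed ones are visited.
    return sum((c / n) * math.log((c / n) / BENFORD[d]) for d, c in counts.items())


def rank_subsets(cells, min_size=1):
    """Group cell values by each word in their title/row/column labels and
    rank the resulting subsets by divergence, most divergent first.

    cells: iterable of (title, row_label, column_label, value) tuples
           (a hypothetical layout, not taken from the paper).
    """
    subsets = {}
    for title, row, col, value in cells:
        words = set(title.split()) | set(row.split()) | set(col.split())
        for word in words:
            subsets.setdefault(word, []).append(value)
    scored = [
        (word, divergence(vals), len(vals))
        for word, vals in subsets.items()
        if len(vals) >= min_size
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Very small subsets produce noisy digit frequencies, so a realistic run would likely set `min_size` well above 1. Words near the top of the ranking point to the rows, columns, or tables whose values diverge most from Benford's distribution.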

    Original language: English
    Title of host publication: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 74-80
    Number of pages: 7
    ISBN (Electronic): 9781538674475
    Publication status: Published - Jul 2 2018
    Event: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018 - Yonago, Japan
    Duration: Jul 8 2018 to Jul 13 2018

    Publication series

    Name: Proceedings - 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018

    Conference

    Conference: 7th International Congress on Advanced Applied Informatics, IIAI-AAI 2018
    Country/Territory: Japan
    City: Yonago
    Period: 7/8/18 to 7/13/18

    All Science Journal Classification (ASJC) codes

    • Computer Networks and Communications
    • Communication
    • Information Systems
    • Information Systems and Management
    • Education
