Principal points analysis via p-median problem for binary data

Haruka Yamashita, Yoshinobu Kawahara

Research output: Contribution to journal › Article › peer-review

Abstract

Analysis with principal points is a useful statistical tool for summarizing large data. In this paper, we propose a subgradient-based algorithm to calculate a set of principal points for multivariate binary data by formulating the task as a p-median problem. This enables us to find a globally optimal set of principal points, or an ε-optimal solution in the middle of the calculation, by combining the algorithm with an upper bound found using the greedy method. The algorithm is an iterative procedure in which each iteration can be calculated efficiently. We investigate the applicability of the proposed framework with questionnaire data and arXiv co-authors data.
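The abstract gives no formulas or pseudocode, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes that candidate principal points are restricted to the observed binary rows, that distance is Hamming distance, and that the lower bound comes from a textbook Lagrangian relaxation of the p-median assignment constraints updated by subgradient steps. All function names and parameters (hamming_matrix, greedy_upper_bound, lagrangian_bounds, iters, theta) are hypothetical.

```python
import numpy as np

def hamming_matrix(X):
    # D[i, j] = number of coordinates in which rows i and j differ
    X = np.asarray(X, dtype=int)
    return (X[:, None, :] != X[None, :, :]).sum(axis=2)

def cost(D, medians):
    # total distance from every point to its nearest selected median
    return int(D[:, medians].min(axis=1).sum())

def greedy_upper_bound(D, p):
    # Add one median at a time, always the one that lowers the total cost most.
    n = D.shape[0]
    chosen = []
    nearest = np.full(n, np.inf)
    for _ in range(p):
        totals = np.minimum(nearest[:, None], D).sum(axis=0)
        if chosen:
            totals[chosen] = np.inf  # never pick the same candidate twice
        j = int(np.argmin(totals))
        chosen.append(j)
        nearest = np.minimum(nearest, D[:, j])
    return chosen, int(nearest.sum())

def lagrangian_bounds(D, p, iters=200, theta=1.0):
    # Relax "each point is assigned to exactly one median" with multipliers u.
    # For any u, L(u) = sum(u) + (sum of the p smallest rho_j) is a lower bound,
    # where rho_j = sum_i min(0, D[i, j] - u[i]).
    _, ub = greedy_upper_bound(D, p)
    u = D.mean(axis=1)  # arbitrary starting multipliers (assumption)
    best_lb = -np.inf
    for _ in range(iters):
        rho = np.minimum(0.0, D - u[:, None]).sum(axis=0)
        opened = np.argsort(rho)[:p]
        lb = u.sum() + rho[opened].sum()
        best_lb = max(best_lb, lb)
        # subgradient: 1 minus the number of opened medians point i attaches to
        g = 1.0 - (D[:, opened] - u[:, None] < 0).sum(axis=1)
        norm2 = float(g @ g)
        if norm2 == 0.0 or ub - best_lb < 1e-9:
            break
        u = u + theta * (ub - lb) / norm2 * g
    return best_lb, ub
```

Under these assumptions, the difference between the returned upper and lower bounds gives a certificate of the kind the abstract calls ε-optimality: if `ub - best_lb <= eps`, the greedy solution is within `eps` of the optimum over the chosen candidate set. A typical call would be `lagrangian_bounds(hamming_matrix(X), p)` for a 0/1 data matrix `X`.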

Original language: English
Pages (from-to): 1282-1297
Number of pages: 16
Journal: Journal of Applied Statistics
Volume: 47
Issue number: 7
Publication status: Published - May 18 2020

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
