TY - GEN
T1 - Unsupervised Feature Value Selection Based on Explainability
AU - Shin, Kilho
AU - Okumoto, Kenta
AU - Shepard, David Lawrence
AU - Kusaba, Akira
AU - Hashimoto, Takako
AU - Amari, Jorge
AU - Murota, Keisuke
AU - Takai, Junnosuke
AU - Kuboyama, Tetsuji
AU - Ohshima, Hiroaki
N1 - Funding Information:
This work was partially supported by the Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Numbers 16K12491 and 17H00762) from the Japan Society for the Promotion of Science.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - The problem of feature selection has been an area of considerable research in machine learning. Feature selection is known to be particularly difficult in unsupervised learning because different subgroups of features can yield useful insights into the same dataset. In other words, many theoretically correct answers may exist for the same problem. Furthermore, designing algorithms for unsupervised feature selection is technically harder than designing algorithms for supervised feature selection because unsupervised feature selection algorithms cannot be guided by class labels. As a result, previous work attempts to discover intrinsic structures of data with heavy computation such as matrix decomposition, and requires significant time to find even a single solution. This paper proposes a novel algorithm, named Explainability-based Unsupervised Feature Value Selection (EUFVS), which enables a paradigm shift in feature selection and solves all of these problems. EUFVS requires only a few tens of milliseconds for datasets with thousands of features and instances, which makes it possible to generate a large number of candidate solutions and select the one with the best fit. Another important advantage of EUFVS is that it selects feature values instead of features, and feature values can explain phenomena in data better than features can. This paper explains the theoretical advantage of EUFVS and shows its applications in real experiments. In our experiments with labeled datasets, EUFVS found feature value sets that explain labels, and also detected useful relationships between feature value sets that are not detectable from the given class labels.
UR - http://www.scopus.com/inward/record.url?scp=85103477615&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103477615&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-71158-0_20
DO - 10.1007/978-3-030-71158-0_20
M3 - Conference contribution
AN - SCOPUS:85103477615
SN - 9783030711573
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 421
EP - 444
BT - Agents and Artificial Intelligence - 12th International Conference, ICAART 2020, Revised Selected Papers
A2 - Rocha, Ana Paula
A2 - Steels, Luc
A2 - van den Herik, Jaap
PB - Springer Science and Business Media Deutschland GmbH
T2 - 12th International Conference on Agents and Artificial Intelligence, ICAART 2020
Y2 - 22 February 2020 through 24 February 2020
ER -