TY - JOUR
T1 - Mining Discriminative Patterns from Graph Data with Multiple Labels and Its Application to Quantitative Structure-Activity Relationship (QSAR) Models
AU - Shao, Zheng
AU - Hirayama, Yuya
AU - Yamanishi, Yoshihiro
AU - Saigo, Hiroto
N1 - Publisher Copyright: © 2015 American Chemical Society.
PY - 2015/12/28
Y1 - 2015/12/28
N2 - Graph data are becoming increasingly common in machine learning and data mining, and their applications extend to bioinformatics and cheminformatics. Accordingly, graph mining, a method for extracting patterns from graph data, has recently been studied and developed rapidly. Since the number of patterns in graph data is huge, a central issue is how to efficiently collect informative patterns suitable for subsequent tasks such as classification or regression. In this paper, we consider mining discriminative subgraphs from graph data with multiple labels. The resulting task has important applications in cheminformatics, such as finding common functional groups that trigger multiple drug side effects, or identifying ligand functional groups that hit multiple targets. In computational experiments, we first verify the effectiveness of the proposed approach on synthetic data, then apply it to a drug adverse effect prediction problem. On the latter dataset, we compared the proposed method with L1-norm logistic regression in combination with the PubChem/Open Babel fingerprint; the proposed method showed superior performance with a much smaller number of subgraph patterns. Software is available from https://github.com/axot/GLP.
AB - Graph data are becoming increasingly common in machine learning and data mining, and their applications extend to bioinformatics and cheminformatics. Accordingly, graph mining, a method for extracting patterns from graph data, has recently been studied and developed rapidly. Since the number of patterns in graph data is huge, a central issue is how to efficiently collect informative patterns suitable for subsequent tasks such as classification or regression. In this paper, we consider mining discriminative subgraphs from graph data with multiple labels. The resulting task has important applications in cheminformatics, such as finding common functional groups that trigger multiple drug side effects, or identifying ligand functional groups that hit multiple targets. In computational experiments, we first verify the effectiveness of the proposed approach on synthetic data, then apply it to a drug adverse effect prediction problem. On the latter dataset, we compared the proposed method with L1-norm logistic regression in combination with the PubChem/Open Babel fingerprint; the proposed method showed superior performance with a much smaller number of subgraph patterns. Software is available from https://github.com/axot/GLP.
UR - http://www.scopus.com/inward/record.url?scp=84952794539&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84952794539&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.5b00376
DO - 10.1021/acs.jcim.5b00376
M3 - Article
C2 - 26549421
AN - SCOPUS:84952794539
SN - 1549-9596
VL - 55
SP - 2519
EP - 2527
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 12
ER -