TY - JOUR
T1 - KCF-S
T2 - KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
AU - Kotera, Masaaki
AU - Tabei, Yasuo
AU - Yamanishi, Yoshihiro
AU - Moriya, Yuki
AU - Tokimatsu, Toshiaki
AU - Kanehisa, Minoru
AU - Goto, Susumu
N1 - Funding Information:
The publication cost for this work was supported by MEXT Kakenhi 25108714. This article has been published as part of BMC Systems Biology Volume 7 Supplement 6, 2013: Selected articles from the 24th International Conference on Genome Informatics (GIW2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S6.
Funding Information:
Computational resources were provided by the Bioinformatics Center, Institute for Chemical Research and the Super Computer Laboratory, Kyoto University. Funding from the Ministry of Education, Culture, Sports, Science and Technology of Japan, the Japan Science and Technology Agency, and the Japan Society for the Promotion of Science; MEXT/JSPS Kakenhi (25108714, 24700140, and 25700029). This work was also supported by the Program to Disseminate Tenure Tracking System, MEXT, Japan, and Kyushu University Interdisciplinary Programs in Education and Projects in Research Development.
Publisher Copyright:
© 2013 Kotera et al.
PY - 2013
Y1 - 2013
N2 - Background: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret because they are not directly linked to the terminology of functional groups or substructures that biochemists use. Methods: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble the way biochemists deal with substructures. Results: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than in clusters obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over previously reported results. Conclusions: KCF-S defines biochemical substructures while retaining interpretability, suggesting its potential for application in further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source.
AB - Background: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret because they are not directly linked to the terminology of functional groups or substructures that biochemists use. Methods: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble the way biochemists deal with substructures. Results: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than in clusters obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over previously reported results. Conclusions: KCF-S defines biochemical substructures while retaining interpretability, suggesting its potential for application in further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source.
UR - http://www.scopus.com/inward/record.url?scp=84902497562&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902497562&partnerID=8YFLogxK
U2 - 10.1186/1752-0509-7-S6-S2
DO - 10.1186/1752-0509-7-S6-S2
M3 - Article
C2 - 24564846
AN - SCOPUS:84902497562
SN - 1752-0509
VL - 7
JO - BMC Systems Biology
JF - BMC Systems Biology
M1 - S2
ER -