TY - JOUR
T1 - Scalable prediction of compound-protein interactions using minwise hashing
AU - Tabei, Yasuo
AU - Yamanishi, Yoshihiro
N1 - Funding Information:
This work was supported by MEXT/JSPS KAKENHI Grant Numbers 24700140 and 25700029. This work was also supported by the Program to Disseminate Tenure Tracking System, MEXT, Japan, and Kyushu University Interdisciplinary Programs in Education and Projects in Research Development. This work was also supported by the PRESTO program of the Japan Science and Technology Agency (JST).
Publisher Copyright:
© 2013 Tabei and Yamanishi.
PY - 2013
Y1 - 2013
N2 - The identification of compound-protein interactions plays a key role in drug development, supporting the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new, efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method for scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method consists of two main steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-induced classifier to the compact fingerprints. We test the proposed method on its ability to make large-scale predictions of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show that it outperforms previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
UR - http://www.scopus.com/inward/record.url?scp=84906053301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906053301&partnerID=8YFLogxK
U2 - 10.1186/1752-0509-7-S6-S3
DO - 10.1186/1752-0509-7-S6-S3
M3 - Article
C2 - 24564870
AN - SCOPUS:84906053301
SN - 1752-0509
VL - 7
JO - BMC Systems Biology
JF - BMC Systems Biology
M1 - S3
ER -