The identification of rules governing molecular recognition between drug chemical substructures and protein functional sites is a challenging issue at many stages of the drug development process. In this paper we develop a novel method to extract sets of drug chemical substructures and protein domains that govern drug-target interactions on a genome-wide scale. This is made possible by using sparse canonical correspondence analysis (SCCA) to analyze drug substructure profiles and protein domain profiles simultaneously. The method does not depend on the availability of protein 3D structures. From a data set of known drug-target interactions including enzymes, ion channels, G protein-coupled receptors, and nuclear receptors, we extract a set of chemical substructures shared by drugs able to bind to a set of protein domains. These two sets of extracted chemical substructures and protein domains form components that can be further exploited in a drug discovery process. This approach successfully clusters protein domains that may be evolutionarily unrelated but that bind a common set of chemical substructures. As shown in several examples, it can also be very helpful for predicting new protein-ligand interactions and addressing the problem of ligand specificity. The proposed method constitutes a contribution to the recent field of chemogenomics, which aims to connect the chemical space with the biological space.
All Science Journal Classification (ASJC) codes
- General Chemistry
- General Chemical Engineering
- Computer Science Applications
- Library and Information Sciences