TY - JOUR
T1 - RF-NR
T2 - Random Forest Based Approach for Improved Classification of Nuclear Receptors
AU - Ismail, Hamid D.
AU - Saigo, Hiroto
AU - Kc, Dukka B.
N1 - Funding Information:
The authors would like to thank the developer of NucleaRDB and Dr. Xiao for providing the dataset and the web server for NRPred-FS. Dukka B. KC is partly supported by a startup grant from the Department of Computational Science and Engineering at North Carolina A&T State University, and by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Hiroto Saigo is supported by JSPS KAKENHI Grant Number JP25700004.
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2018/11/1
Y1 - 2018/11/1
AB - The Nuclear Receptor (NR) superfamily plays an important role in key biological, developmental, and physiological processes. Developing a method for the classification of NR proteins is an important step towards understanding the structure and functions of newly discovered NR proteins. Recent studies on NR classification are either unable to achieve optimum accuracy or are not designed for all the known NR subfamilies. In this study, we developed RF-NR, a Random Forest based approach for improved classification of nuclear receptors. RF-NR can predict whether a query protein sequence belongs to one of the eight NR subfamilies or is a non-NR sequence. RF-NR uses spectrum-like features, namely Amino Acid Composition, Dipeptide Composition, and Tripeptide Composition. Benchmarked on two independent datasets with varying sequence redundancy reduction criteria, RF-NR achieves better (or comparable) accuracy than other existing methods. An added advantage of our approach is that we can also obtain biological insights about the important features that are required to classify NR subfamilies. RF-NR is freely available at http://bcb.ncat.edu/RF-NR/.
UR - http://www.scopus.com/inward/record.url?scp=85035094402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85035094402&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2017.2773063
DO - 10.1109/TCBB.2017.2773063
M3 - Article
C2 - 29990125
AN - SCOPUS:85035094402
SN - 1545-5963
VL - 15
SP - 1844
EP - 1852
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 6
M1 - 8107505
ER -