TY - JOUR
T1 - Extensive feature detection of N-terminal protein sorting signals
AU - Bannai, Hideo
AU - Tamada, Yoshinori
AU - Maruyama, Osamu
AU - Nakai, Kenta
AU - Miyano, Satoru
N1 - Funding Information:
This research was supported in part by Grant-in-Aid for Encouragement of Young Scientists and Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ from the Ministry of Education, Sports, Science and Technology of Japan.
PY - 2002
Y1 - 2002
AB - Motivation: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005-1016, 2000), is a neural network-based system that is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make their predictions. In this paper, we aim to generate simple and interpretable rules as predictors while still achieving a practical prediction accuracy. We adopt an approach that consists of an extensive search, partially guided by human intuition, for simple rules and various attributes. Results: We have succeeded in finding rules whose prediction accuracy comes close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules.
UR - http://www.scopus.com/inward/record.url?scp=0036187913&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036187913&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/18.2.298
DO - 10.1093/bioinformatics/18.2.298
M3 - Article
C2 - 11847077
AN - SCOPUS:0036187913
SN - 1367-4803
VL - 18
SP - 298
EP - 305
JO - Bioinformatics
JF - Bioinformatics
IS - 2
ER -