TY - GEN
T1 - Document similarity computation by combining multiple representation models
AU - Li, Jiyi
AU - Shimizu, Toshiyuki
AU - Yoshikawa, Masatoshi
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/8/3
Y1 - 2015/8/3
N2 - Evaluating the semantic similarity of text document pairs is an active research topic. Various document representation models have been proposed, each concentrating on a different kind of information. However, because textual documents vary widely, it is difficult for a single model to perform well in all scenarios. Leveraging these models to complement each other can improve performance. In this paper, we first analyze the relations among document semantic similarity, human ratings, and model performance. Based on these observations, we propose a rational solution that selects different representation models and fuses their results to compute document similarity for a given document collection. We use the performance of the models and the relations among them to select suitable models. Our fusion approach uses a regression function with both nonlinear and linear factors and with dynamic weights based on the similarities computed by the various models. We report the effectiveness of our work on a rated news document collection. The particular version of our general approach for this collection integrates information from both brief entity knowledge and detailed word content.
UR - http://www.scopus.com/inward/record.url?scp=84947104357&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947104357&partnerID=8YFLogxK
U2 - 10.1109/SNPD.2015.7176252
DO - 10.1109/SNPD.2015.7176252
M3 - Conference contribution
AN - SCOPUS:84947104357
T3 - 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015 - Proceedings
BT - 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015 - Proceedings
A2 - Saisho, Keizo
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2015
Y2 - 1 June 2015 through 3 June 2015
ER -