A dynamic-static approach of model fusion for document similarity computation

Jiyi Li, Yasuhito Asano, Toshiyuki Shimizu, Masatoshi Yoshikawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The semantic similarity of text document pairs supports a range of valuable applications. Various basic models have been proposed for representing document content and computing document similarity, and each performs differently in different scenarios. Existing model selection or fusion approaches build improved models from these basic models at the granularity of the document collection. Such improved models are static: the same model is applied to every document pair, although it may suit only some of them. We propose a dynamic approach to model fusion, based on a Dynamic-Static Fusion Model (DSFM) that operates at the granularity of document pairs and adapts to each pair. The dynamic module in DSFM learns to rank the basic models in order to predict the best basic model for a given document pair. We propose a model categorization method to construct ideal model labels of document pairs for training this dynamic module. The static module in DSFM is based on linear regression. We also propose a model selection method that selects appropriate candidate basic models for fusion and improves performance. Experiments on public document collections containing paragraph pairs and sentence pairs with human-rated similarity illustrate the effectiveness of our approach.

Original language: English
Title of host publication: Web Information Systems Engineering – WISE 2015 - 16th International Conference, Proceedings
Editors: Shu-Ching Chen, Tao Li, Hua Wang, Yanchun Zhang, Wojciech Cellary, Dingding Wang, Jianyong Wang
Publisher: Springer Verlag
Pages: 353-368
Number of pages: 16
ISBN (Print): 9783319261898
Publication status: Published - 2015
Externally published: Yes
Event: 16th International Conference on Web Information Systems Engineering, WISE 2015 - Miami, United States
Duration: Nov 1 2015 to Nov 3 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9418
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Web Information Systems Engineering, WISE 2015
Country/Territory: United States
City: Miami
Period: 11/1/15 to 11/3/15

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
