TY - GEN
T1 - Extracting search query patterns via the pairwise coupled topic model
AU - Konishi, Takuya
AU - Ohwa, Takuya
AU - Fujita, Sumio
AU - Ikeda, Kazushi
AU - Hayashi, Kohei
N1 - Funding Information:
We thank Naomi Sasaya, Nobuyuki Shimizu, Yoshiko Takeuchi, and Shinichi Tsuzaki for providing helpful comments and organizing the crowdsourcing tasks. KI was supported by MEXT Kakenhi 15H01620. KH was supported by MEXT Kakenhi 15K16055.
Publisher Copyright:
© 2016 Copyright held by the owner/author(s).
PY - 2016/2/8
Y1 - 2016/2/8
N2 - A fundamental yet new challenge in information retrieval is the identification of patterns behind search queries. For example, the queries "NY restaurant" and "boston hotel" share the common pattern "LOCATION SERVICE". However, because of the diversity of real queries, existing approaches require data preprocessing by humans or specifying the target query domains, which hinders their applicability. We propose a probabilistic topic model that assumes that each term (e.g., "NY") has a topic (LOCATION). The key idea is that we consider topic co-occurrence in a query rather than a topic sequence, which significantly reduces computational cost yet enables us to acquire coherent topics without such preprocessing. Using two real query datasets, we demonstrate that the obtained topics are intelligible to humans and are highly accurate in keyword prediction and query generation tasks.
AB - A fundamental yet new challenge in information retrieval is the identification of patterns behind search queries. For example, the queries "NY restaurant" and "boston hotel" share the common pattern "LOCATION SERVICE". However, because of the diversity of real queries, existing approaches require data preprocessing by humans or specifying the target query domains, which hinders their applicability. We propose a probabilistic topic model that assumes that each term (e.g., "NY") has a topic (LOCATION). The key idea is that we consider topic co-occurrence in a query rather than a topic sequence, which significantly reduces computational cost yet enables us to acquire coherent topics without such preprocessing. Using two real query datasets, we demonstrate that the obtained topics are intelligible to humans and are highly accurate in keyword prediction and query generation tasks.
UR - http://www.scopus.com/inward/record.url?scp=84964335710&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964335710&partnerID=8YFLogxK
U2 - 10.1145/2835776.2835794
DO - 10.1145/2835776.2835794
M3 - Conference contribution
AN - SCOPUS:84964335710
T3 - WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining
SP - 655
EP - 664
BT - WSDM 2016 - Proceedings of the 9th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 9th ACM International Conference on Web Search and Data Mining, WSDM 2016
Y2 - 22 February 2016 through 25 February 2016
ER -