Context-Aware Latent Dirichlet Allocation for Topic Segmentation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We propose a new generative model for topic segmentation based on Latent Dirichlet Allocation. The task is to divide a document into a sequence of topically coherent segments while preserving long topic change-points (coherency) and keeping short topic segments from being merged (saliency). Most existing models either fuse topic segments by keywords or model word co-occurrence patterns without merging. They rarely achieve both coherency and saliency, because many words are polysemous and therefore have highly uncertain topic assignments. To address this problem, we model the topic-specific co-occurrence of word pairs within contexts, which yields more coherent segments and reduces the influence of irrelevant words on topic assignment. We also design an optimization algorithm that eliminates redundant items from the generated topic segments. Experimental results show that our approach significantly improves both topic coherence and topic segmentation.
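As a rough illustration of the kind of statistic the abstract refers to, the sketch counts word pairs that co-occur within a local context. It is a minimal sketch under stated assumptions: the function name `context_pair_counts`, the choice of a fixed sliding window as the definition of "context", and the `window` parameter are hypothetical and are not taken from the paper. The topic-specific part of the model, inference, and the segmentation and redundancy-elimination steps are not shown.

```python
from collections import Counter


def context_pair_counts(tokens, window=3):
    """Hypothetical illustration: count unordered word pairs that co-occur
    within a sliding window of `window` consecutive tokens.

    Assumption: "context" is modeled here as a fixed-size window; the paper's
    own definition of context may differ.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the token at position i with each later token inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            right = tokens[j]
            if left != right:  # skip pairs of identical words
                # Sort so that (a, b) and (b, a) are counted as the same pair.
                counts[tuple(sorted((left, right)))] += 1
    return counts


if __name__ == "__main__":
    doc = ["bank", "river", "water", "bank", "loan"]
    for pair, n in sorted(context_pair_counts(doc, window=3).items()):
        print(pair, n)
```

In a context-aware model, statistics like these would be tied to topic assignments so that a polysemous word such as "bank" is disambiguated by the words around it; how the paper does this is not reproduced here.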

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, PAKDD 2020, Proceedings
Editors: Hady W. Lauw, Ee-Peng Lim, Raymond Chi-Wing Wong, Alexandros Ntoulas, See-Kiong Ng, Sinno Jialin Pan
Publisher: Springer
Pages: 475-486
Number of pages: 12
ISBN (Print): 9783030474256
DOI
Publication status: Published - 2020
Event: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020 - Singapore, Singapore
Duration: May 11, 2020 → May 14, 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12084 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020
Country/Territory: Singapore
City: Singapore
Period: 5/11/20 → 5/14/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

