The Impact of Language Properties in Multilingual Datasets on Sarcasm Detection

Linshuo Yang, Daisuke Ikeda

Research output: Contribution to book/report › Conference contribution

Abstract

People now spend a great deal of time on social media sites expressing their opinions and emotions, which makes social media one of the most important data sources for machine-learning-based sentiment analysis. However, sarcasm can obstruct tasks that seek to determine people's true intentions, such as sentiment analysis. As a result, automatic sarcasm detection has attracted research attention. In addition, with the globalization of social media, sarcasm detection models that can handle multiple languages have become crucial. Although multilingual sarcasm detection models have become popular in recent years, there has been little examination of how the languages included in the training dataset affect model performance. Sarcasm is highly dependent on culture, and language reflects culture, so differences between languages may lead to differences in sarcastic expression. This study focused on the morphological typological difference between Arabic and Chinese and trained the model on two datasets: an English-Arabic dataset, whose languages belong to the same morphological category, and an English-Chinese dataset, whose languages belong to different categories. The results were then compared on two English test datasets. The model trained on English and Arabic performed better than the model trained on English and Chinese, indicating that the morphological typological classification of the languages in the training dataset affects multilingual sarcasm detection. In other words, to improve detection for languages of a given category, it is better to use training data from languages of the same category. Additionally, a Multilingual BERT-LSTM model was constructed and compared with the BERT-only setup, and the LSTM structure was generally found to be effective for multilingual sarcasm detection.

Original language: English
Host publication title: Proceedings - 2023 14th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (electronic): 9798350324228
DOI
Publication status: Published - 2023
Event: 14th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2023 - Koriyama, Japan
Duration: 8 July 2023 to 13 July 2023

Publication series

Name: Proceedings - 2023 14th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2023

Conference

Conference: 14th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2023
Country/Territory: Japan
City: Koriyama
Period: 7/8/23 to 7/13/23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Information Systems and Management
