TY - GEN
T1 - Mathematical Document Categorization with Structure of Mathematical Expressions
AU - Suzuki, Tokinori
AU - Fujii, Atsushi
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/25
Y1 - 2017/7/25
N2 - A mathematical document is a document subjected to mathematical communication, for example, a math paper and discussion in online Q&A community. Mathematical document categorization (MDC) is a task to classify mathematical documents to mathematical categories, e.g. probability theory and set theory. This task is an important task for supporting user search on recent wide-spreaded digital libraries and archiving services. Although Mathematical expressions (ME) in the document could bring an essential information as being in a central part of communication especially in math fields, how to utilize ME for MDC has not been matured. In this paper, we propose the classification method based on text combined with structures of ME, which are supposed to reflect conventions and rules specific to a category. Also, we present document collections built for evaluating the MDC systems, with investigation on categorial settings and its statistics. We demonstrate classification results that our proposed method outperforms existing methods with state-of-the-art ME modeling on F-measure.
AB - A mathematical document is a document subjected to mathematical communication, for example, a math paper and discussion in online Q&A community. Mathematical document categorization (MDC) is a task to classify mathematical documents to mathematical categories, e.g. probability theory and set theory. This task is an important task for supporting user search on recent wide-spreaded digital libraries and archiving services. Although Mathematical expressions (ME) in the document could bring an essential information as being in a central part of communication especially in math fields, how to utilize ME for MDC has not been matured. In this paper, we propose the classification method based on text combined with structures of ME, which are supposed to reflect conventions and rules specific to a category. Also, we present document collections built for evaluating the MDC systems, with investigation on categorial settings and its statistics. We demonstrate classification results that our proposed method outperforms existing methods with state-of-the-art ME modeling on F-measure.
UR - http://www.scopus.com/inward/record.url?scp=85027986346&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027986346&partnerID=8YFLogxK
U2 - 10.1109/JCDL.2017.7991566
DO - 10.1109/JCDL.2017.7991566
M3 - Conference contribution
AN - SCOPUS:85027986346
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
BT - 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
Y2 - 19 June 2017 through 23 June 2017
ER -