TY - GEN
T1 - A Japanese particle corpus built by example-based annotation
AU - Hanaoka, Hiroki
AU - Mima, Hideki
AU - Tsujii, Jun'ichi
N1 - Funding Information:
This work was partially supported by Grant-in-Aid for Specially Promoted Research (MEXT, Japan). This work was also sponsored by the Center for Knowledge Structuring at the University of Tokyo.
PY - 2010
Y1 - 2010
AB - This paper reports on an ongoing project to create a new corpus focusing on Japanese particles. The corpus will provide deeper syntactic/semantic information than existing resources. The initial target particle is "to", which occurs 22,006 times in 38,400 sentences of an existing corpus, the Kyoto Text Corpus. The annotation adopts an "example-based" methodology that differs from the traditional annotation style: annotators are given an example sentence rather than a linguistic category label. Because this avoids linguistic technical terms, any native speaker without special knowledge of linguistic analysis is expected to be able to serve as an annotator without long training, which should reduce the annotation cost. So far, 10,475 occurrences have been annotated, with an inter-annotator agreement of 0.66 (Cohen's kappa). The paper discusses an initial analysis of the disagreements and future directions.
UR - http://www.scopus.com/inward/record.url?scp=84880355035&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880355035&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84880355035
T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
SP - 1876
EP - 1880
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
A2 - Tapias, Daniel
A2 - Russo, Irene
A2 - Hamon, Olivier
A2 - Piperidis, Stelios
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Maegaard, Bente
A2 - Odijk, Jan
A2 - Rosner, Mike
PB - European Language Resources Association (ELRA)
T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010
Y2 - 17 May 2010 through 23 May 2010
ER -