TY - GEN
T1 - Towards automatic transformation between different transcription conventions
T2 - 9th International Conference on Language Resources and Evaluation, LREC 2014
AU - Ishimoto, Yuichi
AU - Tsuchiya, Tomoyuki
AU - Koiso, Hanae
AU - Den, Yasuharu
N1 - Funding Information:
This work was supported by Grant-in-Aid for Collaborative Research Project of NINJAL “Sharing of conversation corpora that cover diverse styles and settings” and JSPS Grant-in-Aid for Scientific Research Number 25370505.
PY - 2014
Y1 - 2014
AB - Because of the tremendous effort required for recording and transcription, large-scale spoken language corpora have hardly been developed in Japanese, with the notable exception of the Corpus of Spontaneous Japanese (CSJ). Various research groups have individually developed conversation corpora in Japanese, but these corpora are transcribed under different conventions and have few annotations in common, and some of them lack fundamental annotations that are prerequisites for conversation research. To address this situation by sharing existing conversation corpora that cover diverse styles and settings, we have tried to automatically transform a transcription made under one convention into one made under another convention. Using a conversation corpus transcribed in both the Conversation-Analysis style (CA-style) and the CSJ-style, we analyzed the correspondence between CA's 'intonation markers' and CSJ's 'tone labels,' and constructed a statistical model that converts tone labels into intonation markers with reference to linguistic and acoustic features of the speech. The results showed that there is considerable variance in intonation marking even between trained transcribers. The model predicted the presence of the intonation markers with 85% accuracy and classified the types of the markers with 72% accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85009194175&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009194175&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009194175
T3 - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
SP - 311
EP - 315
BT - Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Goggi, Sara
A2 - Declerck, Thierry
A2 - Mariani, Joseph
A2 - Maegaard, Bente
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Mazo, Helene
A2 - Piperidis, Stelios
A2 - Loftsson, Hrafn
PB - European Language Resources Association (ELRA)
Y2 - 26 May 2014 through 31 May 2014
ER -