Spontaneous speech recognition taking account of characteristics of speaker-dependent occurrence of filled-pauses

Yumi Shima, Mariko Koga, Masaru Yamashita, Katsuya Yamauchi, Shoichi Matsunaga

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

One characteristic of spontaneous speech is the occurrence of many types of filled-pauses, which usually degrade speech recognition accuracy considerably. In this study, we first investigated the occurrence frequency of filled-pauses in spontaneous speech using a large corpus. The results revealed that, on average, only four specific filled-pauses account for a cumulative occurrence frequency of 0.8, and that these frequent filled-pauses differ among speakers. On the basis of these results, we propose a speech recognition procedure that combines two recognition processes: the first uses a common lexicon and the second uses an individual lexicon. The filled-pause entries in the individual lexicon were estimated from their occurrence frequencies, which were observed in the preliminary results of the first recognition process. The proposed procedure yielded a statistically significant improvement in word accuracy (1.1% word-error reduction) and indicated that filled-pauses rarely used by a speaker hinder improvements in word accuracy. We also showed that an individual lexicon configured from a combination of the N-best results and word confidence score limitations yielded a significant improvement in word accuracy (1.3% word-error reduction). Furthermore, we examined the applicability of methods that reduce the processing time by using multiple candidates and confidence score limitations. With the N-best results and the word confidence score limitations, our procedure significantly reduced the total amount of processing (41% reduction in the number of recognition segments of the first recognition process).
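
The lexicon-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the romanized filler tokens, the confidence threshold of 0.5, and the reuse of the 0.8 cumulative-frequency figure as a selection cutoff are all assumptions made for illustration. The sketch counts the filled-pauses that a first recognition pass produced for one speaker, keeps the most frequent ones until they cover the chosen share of occurrences, and removes the remaining filled-pauses from the lexicon used in the second pass.

```python
from collections import Counter


def select_filled_pauses(first_pass_words, filler_inventory,
                         coverage=0.8, min_confidence=0.0):
    """Pick the speaker's most frequent filled-pauses from first-pass output.

    first_pass_words: iterable of (word, confidence) pairs (hypothetical format).
    filler_inventory: set of all filled-pause entries in the common lexicon.
    coverage: assumed cutoff on the cumulative relative frequency.
    min_confidence: assumed word confidence score limitation.
    """
    counts = Counter(
        word for word, conf in first_pass_words
        if word in filler_inventory and conf >= min_confidence
    )
    total = sum(counts.values())
    if total == 0:
        return []

    selected, cumulative = [], 0
    # Most frequent first; ties are broken alphabetically for determinism.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        selected.append(word)
        cumulative += count
        if cumulative / total >= coverage:
            break
    return selected


def build_individual_lexicon(common_lexicon, filler_inventory, selected):
    """Keep all ordinary words, but only the selected filled-pauses."""
    keep = set(selected)
    return [w for w in common_lexicon
            if w not in filler_inventory or w in keep]


# Illustrative data only; the tokens and scores are invented.
FILLERS = {"eto", "ano", "ma", "e", "un"}
COMMON_LEXICON = ["kyou", "eto", "ano", "ma", "e", "un"]
first_pass = (
    [("eto", 0.9)] * 5 + [("ano", 0.8)] * 3 + [("ma", 0.9)]
    + [("e", 0.2)] * 4 + [("kyou", 0.95)]
)

selected = select_filled_pauses(first_pass, FILLERS,
                                coverage=0.8, min_confidence=0.5)
individual_lexicon = build_individual_lexicon(COMMON_LEXICON, FILLERS, selected)
print(selected)
print(individual_lexicon)
```

In this toy input the low-confidence occurrences of "e" are ignored, so two filled-pauses already cover more than 0.8 of the remaining occurrences and the other three are dropped from the individual lexicon. How the paper combines N-best candidates with the confidence limitation is not specified in the abstract, so the sketch uses only a single hypothesis list.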

Original language: English
Title of host publication: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
Pages: 3872-3876
Number of pages: 5
Publication status: Published - Dec 1 2010
Externally published: Yes
Event: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society - Sydney, NSW, Australia
Duration: Aug 23 2010 → Aug 27 2010

Publication series

Name: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
Volume: 5

Other

Other: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Country/Territory: Australia
City: Sydney, NSW
Period: 8/23/10 → 8/27/10

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
