TY - GEN
T1 - Spontaneous speech recognition taking account of characteristics of speaker-dependent occurrence of filled-pauses
AU - Shima, Yumi
AU - Koga, Mariko
AU - Yamashita, Masaru
AU - Yamauchi, Katsuya
AU - Matsunaga, Shoichi
PY - 2010/12/1
Y1 - 2010/12/1
N2 - One of the characteristics of spontaneous speech is the occurrence of many types of filled-pauses, which usually hamper speech recognition accuracy considerably. In this study, we first investigated the occurrence frequency of filled-pauses in spontaneous speech by using a large corpus. The results revealed that the cumulative occurrence frequency of filled-pauses reaches 0.8 with only four specific filled-pauses on average, and that these frequent filled-pauses differed among speakers. On the basis of these results, we propose a speech recognition procedure that combines two recognition processes: the first uses a common lexicon and the second uses an individual lexicon. The filled-pause entries in the individual lexicon were estimated on the basis of their occurrence frequencies, which were observed in the preliminary results of the first recognition process. The proposed procedure demonstrated a statistically significant improvement in word accuracy (1.1% word-error reduction) and indicated that filled-pauses that are rarely used by speakers hinder improvements in word accuracy. We also showed that an individual lexicon configured from a combination of the N-best results and word confidence score limitations yielded a significant improvement in word accuracy (1.3% word-error reduction). Furthermore, we examined the applicability of certain methods for reducing the processing time by using multiple candidates and confidence score limitations. Our procedure achieved a significant reduction in the total amount of processing (41% reduction in the number of recognition segments of the first recognition process) by using the N-best results and the word confidence score limitations.
AB - One of the characteristics of spontaneous speech is the occurrence of many types of filled-pauses, which usually hamper speech recognition accuracy considerably. In this study, we first investigated the occurrence frequency of filled-pauses in spontaneous speech by using a large corpus. The results revealed that the cumulative occurrence frequency of filled-pauses reaches 0.8 with only four specific filled-pauses on average, and that these frequent filled-pauses differed among speakers. On the basis of these results, we propose a speech recognition procedure that combines two recognition processes: the first uses a common lexicon and the second uses an individual lexicon. The filled-pause entries in the individual lexicon were estimated on the basis of their occurrence frequencies, which were observed in the preliminary results of the first recognition process. The proposed procedure demonstrated a statistically significant improvement in word accuracy (1.1% word-error reduction) and indicated that filled-pauses that are rarely used by speakers hinder improvements in word accuracy. We also showed that an individual lexicon configured from a combination of the N-best results and word confidence score limitations yielded a significant improvement in word accuracy (1.3% word-error reduction). Furthermore, we examined the applicability of certain methods for reducing the processing time by using multiple candidates and confidence score limitations. Our procedure achieved a significant reduction in the total amount of processing (41% reduction in the number of recognition segments of the first recognition process) by using the N-best results and the word confidence score limitations.
UR - http://www.scopus.com/inward/record.url?scp=84869108663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869108663&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84869108663
SN - 9781617827457
T3 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
SP - 3872
EP - 3876
BT - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
T2 - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Y2 - 23 August 2010 through 27 August 2010
ER -