Three factors are critical in order to synthesize intelligible noise-vocoded Japanese speech

Takuya Kishida, Yoshitaka Nakajima, Kazuo Ueda, Gerard B. Remijn

    Research output: Contribution to journal › Article › peer-review

    9 Citations (Scopus)

    Abstract

    Factor analysis (principal component analysis followed by varimax rotation) has shown that three common factors appear across 20 critical-band power fluctuations derived from spoken sentences in eight different languages [Ueda et al. (2010). Fechner Day 2010, Padua]. The present study investigated how such power-fluctuation factors contribute to speech intelligibility. The factor-analysis method was modified to obtain factors suitable for resynthesizing speech as 20-critical-band noise-vocoded speech, and the resynthesized speech was then used in an intelligibility test. The modification ensured that the resynthesized speech was not accompanied by the steady background noise that the data-reduction procedure would otherwise introduce. Spoken sentences in British English, Japanese, and Mandarin Chinese were subjected to this modified analysis. Consistent with the earlier analysis, three to four factors were common to these languages. The number of power-fluctuation factors needed to make noise-vocoded speech intelligible was then examined. Critical-band power fluctuations of the Japanese spoken sentences were resynthesized from the obtained factors, yielding noise-vocoded speech stimuli whose intelligibility was tested with 12 native Japanese speakers. Identification of Japanese morae (syllable-like phonological units) was measured with the number of factors ranging from 1 to 9. Statistically significant improvements in intelligibility were observed as the number of factors was increased stepwise up to 6; in the 6-factor condition, the 12 listeners correctly identified 92.1% of the morae on average. The sharpest improvement occurred between 2 and 3 factors: the cumulative contribution ratio of the factors rose by only 10.6 percentage points (from 37.3% to 47.9%), yet average mora identification leaped from 6.9% to 69.2%.
The results indicate that, with 3 or more factors, elementary linguistic information is preserved in such noise-vocoded speech.
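The factor-analysis step described in the abstract can be illustrated with a short NumPy sketch. It shows only the standard, unmodified procedure: principal component analysis on the correlation matrix of band-power time series, followed by varimax rotation, plus a plain low-rank reconstruction of the band powers from the first k factors. It does not reproduce the authors' modified method, which additionally prevented a steady background noise in the resynthesized speech. The function names (`standardized_eig`, `varimax`, `pca_varimax`, `reconstruct`) and the synthetic 20-band data are hypothetical and chosen for illustration only; no actual speech is analyzed.

```python
import numpy as np


def standardized_eig(band_power):
    """Standardize each band and eigendecompose the band-by-band correlation matrix.

    band_power: array (n_frames, n_bands); each column is one critical band's
    power-fluctuation time series (assumed non-constant).
    Returns z-scores, means, stds, and eigenpairs sorted largest-first.
    """
    mean = band_power.mean(axis=0)
    std = band_power.std(axis=0)
    z = (band_power - mean) / std
    corr = (z.T @ z) / z.shape[0]  # correlation matrix, since std uses ddof=0
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return z, mean, std, eigvals[order], eigvecs[:, order]


def varimax(loadings, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation via the usual SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        new_criterion = np.sum(s)
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


def pca_varimax(band_power, n_factors):
    """PCA loadings for the first n_factors components, then varimax rotation."""
    _, _, _, eigvals, eigvecs = standardized_eig(band_power)
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)  # cumulative contribution ratio
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    rotated, rotation = varimax(loadings)
    return rotated, rotation, cumulative


def reconstruct(band_power, n_factors):
    """Rank-n_factors reconstruction of the band-power series.

    Varimax is orthogonal, so it leaves the retained subspace (and hence this
    reconstruction) unchanged; the unrotated eigenvectors suffice here.
    """
    z, mean, std, _, eigvecs = standardized_eig(band_power)
    basis = eigvecs[:, :n_factors]
    return (z @ basis) @ basis.T * std + mean


# Hypothetical demo data: 20 "bands" driven by 3 latent sources plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2000, 3))
mixing = rng.standard_normal((3, 20))
band_power = latent @ mixing + 0.3 * rng.standard_normal((2000, 20))

rotated, rotation, cumulative = pca_varimax(band_power, n_factors=3)
print(np.round(cumulative[:4], 3))  # cumulative ratios for 1..4 factors
```

Here `cumulative[k - 1]` plays the role of the cumulative contribution ratio of the first k factors, analogous to the 37.3% (2 factors) and 47.9% (3 factors) figures reported for the Japanese sentences; the synthetic values will of course differ. Because varimax only redistributes loadings among the retained factors, it changes how the factors are interpreted but not how much of the band-power variance they can reconstruct.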

    Original language: English
    Article number: 517
    Journal: Frontiers in Psychology
    Volume: 7
    Issue number: APR
    Publication status: Published - 2016

    All Science Journal Classification (ASJC) codes

    • Experimental and Cognitive Psychology
    • Linguistics and Language
    • Acoustics and Ultrasonics
