Articulatory-to-speech conversion using bi-directional long short-term memory

Fumiaki Taguchi, Tokihiko Kaburagi

    Research output: Contribution to journal › Conference article › peer-review

    16 Citations (Scopus)


    Methods for synthesizing speech sounds from the motion of articulatory organs can be used to produce substitute speech for people who have undergone laryngectomy. To achieve this goal, feature parameters representing the spectral envelope of speech, which is directly related to the acoustic characteristics of the vocal tract, have been estimated from articulatory movements. Within this framework, speech can be synthesized by driving the filter obtained from a spectral envelope with noise signals. In the current study, we examined an alternative method that generates speech sounds directly from the motion pattern of articulatory organs, based on the implicit relationships between articulatory movements and the source signal of speech. These implicit relationships were estimated by considering that articulatory movements are involved in phonological representations of speech, which are also related to sound source information such as the temporal pattern of pitch and the voiced/unvoiced flag. We developed a method for simultaneously estimating the spectral envelope and sound source parameters from articulatory data obtained with an electromagnetic articulography (EMA) sensor. Furthermore, objective evaluation of the estimated speech parameters and subjective evaluation of the word error rate were performed to examine the effectiveness of our method.
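
To illustrate why the sound source parameters matter in the source-filter framework described above, the following is a minimal sketch, not the authors' synthesizer. It assumes per-frame pitch (F0) values and voiced/unvoiced flags are already available, builds an excitation signal from them (a pulse train for voiced frames, noise for unvoiced frames), and passes it through a simple all-pole filter standing in for the spectral-envelope filter. The function names, parameters, and filter coefficients are all hypothetical and chosen only for illustration.

```python
import random


def make_excitation(f0_per_frame, voiced_flags, frame_len, sample_rate, seed=0):
    """Build an excitation signal frame by frame (illustrative only).

    Voiced frames get a unit pulse train at the frame's F0; unvoiced frames
    get zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    excitation = []
    next_pulse = 0.0  # (fractional) sample index of the next pulse
    for frame_idx, (f0, voiced) in enumerate(zip(f0_per_frame, voiced_flags)):
        start = frame_idx * frame_len
        for n in range(start, start + frame_len):
            if voiced and f0 > 0:
                if n >= next_pulse:
                    excitation.append(1.0)
                    next_pulse = n + sample_rate / f0  # one pitch period later
                else:
                    excitation.append(0.0)
            else:
                excitation.append(rng.gauss(0.0, 0.1))
                next_pulse = n + 1  # restart pulses right after the unvoiced run
    return excitation


def all_pole_filter(excitation, a_coeffs):
    """Apply y[n] = x[n] - sum_k a[k] * y[n-k] (a stand-in envelope filter)."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(a_coeffs, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


# Hypothetical input: three 10 ms frames at 8 kHz (voiced, unvoiced, voiced).
source = make_excitation(
    f0_per_frame=[100.0, 0.0, 200.0],
    voiced_flags=[True, False, True],
    frame_len=80,
    sample_rate=8000,
)
# Hypothetical single-pole filter; a real envelope filter would have more
# coefficients and would change from frame to frame.
speech_like = all_pole_filter(source, a_coeffs=[-0.9])
```

Driving the filter with noise alone corresponds to treating every frame as unvoiced; estimating the pitch pattern and the voiced/unvoiced flag as well is what allows a periodic excitation to be used where appropriate.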

    Original language: English
    Pages (from-to): 2499-2503
    Number of pages: 5
    Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publication status: Published - 2018
    Event: 19th Annual Conference of the International Speech Communication, INTERSPEECH 2018 - Hyderabad, India
    Duration: Sept 2 2018 to Sept 6 2018

    All Science Journal Classification (ASJC) codes

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation

