TY - JOUR
T1 - Generation of the vocal tract spectrum from the underlying articulatory mechanism
AU - Kaburagi, Tokihiko
AU - Kim, Jiji
N1 - Funding Information:
We thank Mr. Takemi Mochida for his support in recording the acoustical and articulatory data at NTT Communication Science Laboratories. This research was partly supported by Grants-in-Aid for Scientific Research from the JSPS (Grant Nos. 14101001 and 18500134).
PY - 2007
Y1 - 2007
N2 - A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.
AB - A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.
UR - http://www.scopus.com/inward/record.url?scp=33846190239&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33846190239&partnerID=8YFLogxK
U2 - 10.1121/1.2384847
DO - 10.1121/1.2384847
M3 - Article
C2 - 17297800
AN - SCOPUS:33846190239
SN - 0001-4966
VL - 121
SP - 456
EP - 468
JO - Journal of the Acoustical Society of America
JF - Journal of the Acoustical Society of America
IS - 1
ER -