From text to speech: The MITalk system
[Figure: frequency plotted against time (starting at 0), marked with the constants VTAR3, DTAR3, DTART, TCDIPH, TDMID, and INHDUR.]
Figure 11-7: Constants used to specify the inherent formant and durational characteristics of a sonorant

In addition to differences in source amplitudes, voiced and voiceless fricatives differ in that F1 is higher and B1 is larger when the glottis is open.

The affricate parameters in Table 11-2 refer to the fricative portion of the affricate. Similarly, the plosive parameters in Table 11-2 refer to the brief burst of frication noise generated at plosive release. Formant frequency values again serve as loci for predicting formant positions at voicing onset.

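The text does not spell out how a locus is turned into a formant value at voicing onset. One common formulation in rule-based formant synthesis, assumed here and not confirmed by this section, moves part of the way from the consonant locus toward the following vowel's target: F_onset = F_locus + k(F_vowel - F_locus), where k is a hypothetical coarticulation coefficient between 0 and 1. The function and parameter names below are illustrative only.

```python
def formant_at_voicing_onset(locus_hz: float, vowel_target_hz: float, coart: float) -> float:
    """Hypothetical locus rule: interpolate from the consonant locus toward the vowel target.

    coart = 0 -> onset equals the locus; coart = 1 -> onset equals the vowel target.
    The coefficient is an assumption, not a value given in the text.
    """
    if not 0.0 <= coart <= 1.0:
        raise ValueError("coart must lie in [0, 1]")
    return locus_hz + coart * (vowel_target_hz - locus_hz)


# Example: an F2 locus of 1800 Hz before a vowel with an F2 target of 1200 Hz.
print(formant_at_voicing_onset(1800.0, 1200.0, 0.5))
```

With k = 0.5 the onset value falls halfway between locus and target (1500 Hz in the example); a real system would likely use separate coefficients per formant and per consonant class.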
The parameters used to generate a nasal murmur include the nasal pole and zero frequencies FNP and FNZ. The nasal pole and zero are used primarily to approximate vowel nasalization at nasal release by splitting F1 into a pole-zero-pole complex. The details of nasal murmurs described by Fujimura (1962) are approximated by formant bandwidth adjustments rather than by the theoretically correct method of pole-zero insertion. The reason is that the higher-frequency pole-zero details of nasal murmurs and vowel nasalization cannot both be simulated without moving the nasal pole and zero very rapidly at release, which would generate an objectionable click in the output. Of the two, vowel nasalization has been found to be perceptually more important. A nasalized vowel is generated by increasing F1 by about 100 Hz and setting the frequency of the nasal zero to the average of this new F1 value and 270 Hz, the frequency of the fixed nasal pole.

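The nasalization rule above reduces to a little arithmetic. The following is a minimal Python sketch, assuming all values are in Hz; the function name `nasalize_vowel` and the constant names are hypothetical, and only the roughly 100 Hz F1 shift and the 270 Hz fixed nasal pole come from the text.

```python
NASAL_POLE_HZ = 270.0  # fixed nasal pole frequency (FNP), from the text
F1_SHIFT_HZ = 100.0    # "about 100 Hz" upward shift of F1, from the text


def nasalize_vowel(f1_hz: float) -> dict:
    """Hypothetical helper: F1, FNP and FNZ for a nasalized vowel given its oral F1."""
    new_f1 = f1_hz + F1_SHIFT_HZ
    # The nasal zero sits midway between the fixed nasal pole and the raised F1.
    fnz = (new_f1 + NASAL_POLE_HZ) / 2.0
    return {"F1": new_f1, "FNP": NASAL_POLE_HZ, "FNZ": fnz}


print(nasalize_vowel(500.0))
```

For an oral F1 of 500 Hz this gives F1 = 600 Hz, FNZ = 435 Hz, and FNP = 270 Hz. Because FNZ is the average of FNP and the raised F1, the zero always lies between the two poles whenever the raised F1 exceeds 270 Hz, which is the pole-zero-pole complex the text describes.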
Not included in Tables 11-1 and 11-2 are steady-state target values for unstressed allophones, postvocalic allophones, flaps, glottal stops, voicebars, and