From text to speech: The MITalk system

[Figure: four panels, (a) through (d), each plotting the transfer function |T(f)| (dB) against frequency (kHz, 0 to 5). Recoverable curve labels: "Uniform tube" in (a); BW2 = 50, 100, and 200 in (b); F1 = 250 and 500 in (c).]

Figure 12-11: Effect of parameter changes on the vocal tract transfer function

where formant frequencies are set to 500, 1500, 2500, 3500, and
4500 Hz and formant bandwidths are set to be equal at 100 Hz. This
corresponds to a vocal tract having a uniform cross-sectional area, a
closed glottis, open lips (and a nonrealistic set of bandwidth values),
as shown in part (a) of Figure 12-11.

2. The amplitude of a formant peak is inversely proportional to its
bandwidth. If a formant bandwidth is doubled, that formant peak is
reduced in amplitude by 6 dB. If the bandwidth is halved, the peak is
increased by 6 dB, as shown in part (b) of Figure 12-11.

3. The amplitude of a formant peak is proportional to formant
frequency. If a formant frequency is doubled, that formant peak is
increased by 6 dB, as shown in part (c) of Figure 12-11. (This is true
of T(f), but not of the resulting speech output spectrum, since the
glottal source spectrum falls off at about -12 dB/octave of frequency
increase and the radiation characteristic imposes a +6 dB/octave
spectral tilt, resulting in a net change in formant amplitude of
+6 - 12 + 6 = 0 dB.)
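
The bandwidth and frequency rules above can be checked numerically with a minimal sketch. It assumes each formant is modeled as an analog two-pole resonator with unity gain at 0 Hz, which is a simplification chosen here for illustration and not necessarily the resonator form used in the synthesizer itself. The function names (`resonator_gain_db`, `transfer_function_db`, `peak_change_db`) are hypothetical, and the peak is approximated by the gain at the formant frequency.

```python
import math

def resonator_gain_db(f, formant_hz, bandwidth_hz):
    """Gain in dB at frequency f (Hz) of one two-pole resonator.

    Assumed model: poles at -pi*B +/- j*2*pi*F, normalized to unity gain at 0 Hz.
    """
    pole = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * formant_hz)
    s = complex(0.0, 2.0 * math.pi * f)
    h = (pole * pole.conjugate()) / ((s - pole) * (s - pole.conjugate()))
    return 20.0 * math.log10(abs(h))

def transfer_function_db(f, formants_hz, bandwidths_hz):
    """|T(f)| in dB for a cascade of resonators (gains in dB add)."""
    return sum(resonator_gain_db(f, F, B)
               for F, B in zip(formants_hz, bandwidths_hz))

def peak_change_db(f_old, b_old, f_new, b_new):
    """Change in a single resonator's gain at its own formant frequency."""
    return (resonator_gain_db(f_new, f_new, b_new)
            - resonator_gain_db(f_old, f_old, b_old))

# Uniform-tube settings quoted in the text (part (a) of the figure).
UNIFORM_FORMANTS = [500.0, 1500.0, 2500.0, 3500.0, 4500.0]
UNIFORM_BANDWIDTHS = [100.0] * 5

# Rule 2: doubling or halving the bandwidth of the second formant.
bw_doubled = peak_change_db(1500.0, 100.0, 1500.0, 200.0)
bw_halved = peak_change_db(1500.0, 100.0, 1500.0, 50.0)

# Rule 3: doubling the first formant frequency from 250 to 500 Hz.
freq_doubled = peak_change_db(250.0, 100.0, 500.0, 100.0)

print(f"bandwidth doubled: {bw_doubled:+.2f} dB")
print(f"bandwidth halved:  {bw_halved:+.2f} dB")
print(f"frequency doubled: {freq_doubled:+.2f} dB")
print("T(f) at 500 Hz, uniform tube: "
      f"{transfer_function_db(500.0, UNIFORM_FORMANTS, UNIFORM_BANDWIDTHS):.2f} dB")
```

Under this model the gain of one resonator at its own formant frequency is roughly F/B, so each doubling of B or F shifts the peak by close to 6 dB. The frequency rule is only approximate at low formant frequencies, and in the full cascade the neighboring resonators also contribute to the level at the shifted peak.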

4. Changes to a formant frequency also affect the amplitudes of higher
formant peaks by a factor proportional to frequency squared. For ex-