From text to speech: The MITalk system

In theory, the noise source is an ideal pressure source. The volume velocity of the frication noise depends on the impedance seen by the noise source. Since the vocal tract transfer function T'(f) relates source volume velocity to lip volume velocity, one must estimate noise volume velocity to determine lip output. In the general case, this is a complex calculation, but we will assume that source volume velocity is proportional to the integral of source pressure (an excellent approximation for a frication source at the lips because the radiation impedance is largely inductive, but only an approximation for other source locations). The integral is approximated by a first-order low-pass digital filter LPF that is shown in Figure 12-6. Output samples from this filter y(nT) are related to the input sequence x(nT) by the equation:

    y(nT) = x(nT) + y(nT - T)
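
The recurrence is a running sum, the discrete counterpart of integration. A minimal Python sketch, assuming a zero initial state (the text does not state one); the function name `lpf` is hypothetical:

```python
def lpf(x):
    """Apply y(nT) = x(nT) + y(nT - T) to a list of input samples."""
    y = []
    prev = 0.0  # y(nT - T) before the first sample; assumed to be zero
    for sample in x:
        prev = sample + prev  # current output = current input + previous output
        y.append(prev)
    return y

# A unit impulse produces a unit step, as expected of a discrete integrator.
print(lpf([1.0, 0.0, 0.0, 0.0]))  # [1.0, 1.0, 1.0, 1.0]
```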

It will be seen later that the radiation characteristic is a digital high-pass filter that exactly cancels the effect of LPF. (For computational efficiency, the radiation characteristic can therefore be moved into the voicing source circuit, and the low-pass filter LPF can be removed from the noise source.)
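
The radiation filter's equation is not given at this point in the text; the sketch below assumes it is a first difference, y(nT) = x(nT) - x(nT - T), which is the exact inverse of the LPF recurrence above when both filters start from zero state. Passing noise through LPF and then through this assumed high-pass returns the original noise, which is why the pair can be dropped from the noise path. The function names `lpf` and `radiation` are hypothetical:

```python
import random

def lpf(x):
    """Noise-source filter from the text: y(nT) = x(nT) + y(nT - T), zero initial state."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + prev
        y.append(prev)
    return y

def radiation(x):
    """Assumed radiation characteristic: first difference y(nT) = x(nT) - x(nT - T)."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - prev)
        prev = sample
    return y

random.seed(1)
noise = [random.uniform(-1.0, 1.0) for _ in range(1000)]
recovered = radiation(lpf(noise))

# The cascade returns the original noise, up to floating-point rounding.
print(max(abs(a - b) for a, b in zip(noise, recovered)))
```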

An example of synthetic frication noise volume velocity generated in this way is shown in Figure 12-8. The spectrum of this noise sample fluctuates randomly about the expected long-term average noise spectrum (dashed curve, shifted up by 10 dB for clarity). Short noise samples vary in their spectral properties because of the random nature of the process.
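
The contrast between a single short sample and the long-term average can be reproduced numerically. This sketch uses white Gaussian noise rather than the synthesizer's filtered noise (an assumption for simplicity, so the expected long-term spectrum here is flat), and compares the raggedness of one 64-sample periodogram with the average of 100 such periodograms. All names are hypothetical:

```python
import cmath
import random

def periodogram(segment):
    """Squared DFT magnitude of one segment for bins 1..N/2-1 (DC and Nyquist skipped)."""
    n = len(segment)
    return [
        abs(sum(segment[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(1, n // 2)
    ]

def relative_spread(values):
    """Standard deviation divided by mean: how far the bins scatter about their average."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var ** 0.5 / mean

random.seed(0)
N, SEGMENTS = 64, 100
segments = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(SEGMENTS)]

single = periodogram(segments[0])           # one short sample: ragged spectrum
spectra = [periodogram(s) for s in segments]
averaged = [sum(col) / SEGMENTS for col in zip(*spectra)]  # approaches the flat average

print(f"one segment:     {relative_spread(single):.2f}")
print(f"{SEGMENTS} averaged: {relative_spread(averaged):.2f}")
```

The single-segment spread stays large no matter how long the segment is, while averaging many segments shrinks it, which is the sense in which a short sample fluctuates about the long-term average.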

The output of the random number generator is amplitude modulated by the component labeled “MOD” in Figure 12-6 whenever the fundamental frequency F0 and the amplitude of voicing AV are both greater than zero. Voiceless sounds (AV = 0) are not amplitude modulated because the vocal folds are spread and stiffened and do not vibrate to modulate the airflow. The degree of amplitude modulation is fixed at 50 percent in the synthesizer. The modulation envelope is a square wave with a period equal to the fundamental period. Experience has shown that it is not necessary to vary the degree of amplitude modulation over the course of a sentence, only to ensure that it is present in voiced fricatives and voiced aspirated sounds.
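
A minimal sketch of the modulation envelope follows. It assumes one reading of "50 percent": the gain alternates between 1.0 and 0.5 within each fundamental period, with the full-gain half first. The 10 kHz sampling rate and the function name `modulation_envelope` are assumptions, not details taken from the text:

```python
def modulation_envelope(num_samples, f0, av, sample_rate=10000):
    """Per-sample gain applied to the frication noise.

    Hypothetical reading of the text: a square wave with a period of one
    fundamental period, switching between full gain (1.0) and half gain (0.5).
    """
    if f0 <= 0 or av <= 0:
        # Voiceless sounds (or no F0): the noise is left unmodulated.
        return [1.0] * num_samples
    period = round(sample_rate / f0)  # fundamental period in samples
    half = period // 2
    return [1.0 if (n % period) < half else 0.5 for n in range(num_samples)]

env = modulation_envelope(200, f0=100, av=50)
print(env[0], env[49], env[50], env[99], env[100])  # 1.0 1.0 0.5 0.5 1.0
```

Multiplying each noise sample by the matching envelope value gives the modulated noise; because the degree is fixed, only F0 and the on/off condition on AV change the envelope.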

The amplitude of the frication noise is determined by AF, which is given in dB. A value of 60 generates strong frication noise, while a value of zero effectively turns off the frication source.
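
Converting AF from dB to a linear gain can be sketched as below. This assumes a plain 20·log10 amplitude scale with an arbitrary 0 dB reference, and maps AF <= 0 to exactly zero to match the "turns off" behavior; the synthesizer's actual calibration is not given here, and `af_to_gain` is a hypothetical name:

```python
def af_to_gain(af_db):
    """Convert the frication amplitude AF (dB) to a linear gain.

    Assumptions: 20*log10 amplitude scale, arbitrary reference, and
    AF <= 0 treated as a gain of exactly zero (source switched off).
    """
    if af_db <= 0:
        return 0.0
    return 10.0 ** (af_db / 20.0)

print(af_to_gain(60))  # 1000.0
print(af_to_gain(40))  # 100.0
print(af_to_gain(0))   # 0.0
```

On this scale every 20 dB step multiplies the noise amplitude by ten, and roughly every 6 dB doubles it.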

12.1.15 Aspiration source
Aspiration noise is essentially the same as frication noise, except that it is generated in the larynx. In a strictly parallel vocal tract model, AF can be used to