
From text to speech: The MITalk system

In theory, the noise source is an ideal pressure source. The volume velocity
of the frication noise depends on the impedance seen by the noise source. Since
the vocal tract transfer function T'(f) relates source volume velocity to lip volume
velocity, one must estimate noise volume velocity to determine lip output. In the
general case, this is a complex calculation, but we will assume that source volume
velocity is proportional to the integral of source pressure (an excellent approximation
for a frication source at the lips because the radiation impedance is largely inductive,
but only an approximation for other source locations). The integral is approximated
by a first-order low-pass digital filter LPF that is shown in Figure 12-6.
Output samples from this filter y(nT) are related to the input sequence x(nT) by
the equation:

y(nT)=x(nT)+y(nT-T)
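The difference equation above is simply a running sum, a discrete stand-in for integration. The following is a minimal Python sketch, assuming the filter state y(-T) starts at zero; the function name `lpf` is our own label, not taken from the synthesizer.

```python
def lpf(x):
    """First-order low-pass filter LPF: y(nT) = x(nT) + y(nT - T).

    Hypothetical helper for illustration; assumes y(-T) = 0.
    """
    y = []
    prev = 0.0  # holds y(nT - T); zero before the first sample
    for sample in x:
        prev = sample + prev  # y(nT) = x(nT) + y(nT - T)
        y.append(prev)
    return y
```

For example, `lpf([1, 0, 0, 2])` gives `[1.0, 1.0, 1.0, 3.0]`. The transfer function 1/(1 - z^-1) has its pole exactly at z = 1, so any DC component in the input grows without bound in the output.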

It will be seen later that the radiation characteristic is a digital high-pass filter
that exactly cancels the effects of LPF. (For computational efficiency, the
radiation characteristic can be moved into the voicing source circuit, and the
low-pass filter LPF can then be removed from the noise source.)
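The text states only that the radiation characteristic exactly cancels LPF. The causal filter that does this is the first difference r(nT) = y(nT) - y(nT - T), which is the inverse of the running sum. The sketch below checks the cancellation numerically. The function names, the zero initial states, and the use of Gaussian noise as a test signal are our assumptions.

```python
import random

def lpf(x):
    # Running sum y(nT) = x(nT) + y(nT - T), zero initial state.
    y, prev = [], 0.0
    for s in x:
        prev = s + prev
        y.append(prev)
    return y

def radiation(x):
    # First difference r(nT) = x(nT) - x(nT - T), zero initial state.
    # This is a high-pass filter and the exact inverse of lpf().
    r, prev = [], 0.0
    for s in x:
        r.append(s - prev)
        prev = s
    return r

random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
restored = radiation(lpf(noise))
# Only floating-point round-off separates the input from the output.
max_err = max(abs(a - b) for a, b in zip(noise, restored))
print(f"max |noise - radiation(lpf(noise))| = {max_err:.2e}")
```

Both filters are linear and time-invariant, so they commute with the vocal tract filters. That is why the radiation stage can be moved into the voicing path and LPF dropped from the noise path without changing the output, apart from round-off.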

An example of synthetic frication noise volume velocity that was generated in
this way is shown in Figure 12-8. The spectrum of this sample of noise fluctuates
randomly about the expected long-term average noise spectrum (dashed curve,
shifted up by 10 dB for clarity). Short samples of noise vary in their spectral
properties because of the inherent randomness of the process.
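The effect described above, where the spectrum of one short noise sample scatters about the long-term average, can be illustrated numerically. This sketch uses white Gaussian noise rather than the synthesizer's actual noise generator. The segment length, the segment count, and the coefficient-of-variation measure are arbitrary illustrative choices. It compares the spread of the periodogram bins for a single short segment with the spread after averaging many segments.

```python
import cmath
import math
import random
import statistics

N = 64          # samples per short noise segment (illustrative choice)
SEGMENTS = 100  # segments averaged for the long-term estimate

def periodogram(segment):
    """|DFT|^2 / N for bins 1..N/2-1, using a naive O(N^2) DFT."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment))) ** 2 / n
        for k in range(1, n // 2)
    ]

def coefficient_of_variation(values):
    """Standard deviation over mean: how ragged a spectrum is."""
    return statistics.pstdev(values) / statistics.mean(values)

random.seed(0)
spectra = [periodogram([random.gauss(0.0, 1.0) for _ in range(N)])
           for _ in range(SEGMENTS)]

# One short sample: bins scatter widely about the flat expected spectrum.
cv_single = coefficient_of_variation(spectra[0])
# Averaging many samples approaches the long-term average spectrum.
averaged = [statistics.mean(bins) for bins in zip(*spectra)]
cv_avg = coefficient_of_variation(averaged)
print(f"single segment CV = {cv_single:.2f}, averaged CV = {cv_avg:.2f}")
```

For white Gaussian noise, each bin of a single periodogram is roughly exponentially distributed. The single-segment coefficient of variation should therefore come out near 1, while averaging 100 segments should reduce it to roughly 1/sqrt(100) = 0.1.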

The output of the random number generator is amplitude modulated by the
component labeled “MOD” in Figure 12-6 whenever the fundamental frequency
F0 and the amplitude of voicing AV are both greater than zero. Voiceless sounds
(AV = 0) are not amplitude modulated because the vocal folds are spread and
stiffened and do not vibrate to modulate the airflow. The degree of amplitude
modulation is fixed at 50 percent in the synthesizer. The modulation envelope is a
square wave with a period equal to the fundamental period. Experience has shown
that it is not necessary to vary the degree of amplitude modulation over the course
of a sentence, but only to ensure that it is present in voiced fricatives and voiced
aspirated sounds.
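The “MOD” stage can be sketched as follows. The text fixes the modulation depth at 50 percent and uses a square-wave envelope with one fundamental period. We interpret this as passing the noise unchanged for the first half of each period and halving it for the second half. The split point, the phase, the use of Hz for F0, and the function name `modulate` are all assumptions, not details taken from the synthesizer.

```python
def modulate(noise, f0, av, sample_rate):
    """Hypothetical 50 percent square-wave amplitude modulation of noise.

    Applied only when f0 > 0 and av > 0, as the text specifies. The
    envelope is 1.0 for the first half of each fundamental period and
    0.5 for the second half (an assumed phase and duty cycle).
    """
    if f0 <= 0 or av <= 0:
        return list(noise)  # voiceless: vocal folds not vibrating
    period = sample_rate / f0  # fundamental period in samples
    out = []
    for n, x in enumerate(noise):
        in_first_half = (n % period) < period / 2
        out.append(x if in_first_half else 0.5 * x)
    return out
```

For a tiny worked case, take F0 = 2500 Hz at a 10 kHz sample rate, which gives a period of 4 samples. Then `modulate([1.0] * 4, 2500, 60, 10000)` returns `[1.0, 1.0, 0.5, 0.5]`. Setting `av` to 0 returns the noise untouched.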

The amplitude of the frication noise is determined by AF, which is given in
dB. A value of 60 will generate a strong frication noise, while a value of zero
effectively turns off the frication source.
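Converting AF from dB to a linear gain shows why a value of zero is effectively off. This sketch assumes plain 20 log10 amplitude scaling with 0 dB mapped to a gain of 1. The synthesizer's actual reference level is not given here, and `db_to_linear` is a hypothetical name.

```python
def db_to_linear(level_db):
    """Linear amplitude gain for a level in dB (assumed: 0 dB -> gain 1.0)."""
    return 10.0 ** (level_db / 20.0)

for af in (0, 20, 40, 60):
    print(f"AF = {af:2d} dB -> relative amplitude {db_to_linear(af):g}")
```

Under this assumption, AF = 60 yields 1000 times the amplitude of AF = 0. A zero setting therefore leaves the noise 60 dB below a strong fricative, which makes its contribution negligible.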

12.1.15 Aspiration source
Aspiration noise is essentially the same as frication noise, except that it is
generated in the larynx. In a strictly parallel vocal tract model, AF can be used to
