The Klatt formant synthesizer

sociated with the cavity in front of the oral constriction are strongly excited. The
theory is supported by the formant amplitude specifications for fricatives and
plosive bursts presented in Chapter 11. These amplitude control data were derived
from attempts to match natural frication spectra.

There are six formant resonators in the parallel configuration of Figure 12-6.
The sixth formant has been added to the parallel branch specifically for the
synthesis of very-high-frequency noise in SS and ZZ. The main energy
concentration in these alveolar fricatives is centered at about 6 kHz, which is
above the highest frequency (5 kHz) that can be synthesized in a
10,000 sample/second simulation. However, in SS, frication noise increases
gradually in the frequencies immediately below 5 kHz due to the low-frequency
skirt of the 6 kHz formant resonance, and this noise spectrum can be
approximated quite well by a resonator positioned at about 4900 Hz. We have
found it better to include an extra resonator to simulate high-frequency noise
than to move F5 up in frequency whenever a sibilant is to be synthesized,
because clicks and moving energy concentrations are thereby avoided.
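
As a rough illustration of the skirt argument above, the following Python sketch
evaluates the gain of a single two-pole digital resonator centered at 4900 Hz in
a 10,000 sample/second simulation. The resonator is assumed here to have the
usual second-order form y(n) = A x(n) + B y(n-1) + C y(n-2); the 1000 Hz
bandwidth and the function names are illustrative assumptions, not values taken
from the text.

```python
import cmath
import math

SAMPLE_RATE = 10_000            # samples/second, as in the text
NYQUIST_HZ = SAMPLE_RATE / 2    # highest frequency that can be synthesized


def resonator_coefficients(freq_hz, bw_hz, sample_rate=SAMPLE_RATE):
    """Coefficients of y(n) = A x(n) + B y(n-1) + C y(n-2), unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def gain_db(freq_hz, coeffs, sample_rate=SAMPLE_RATE):
    """Magnitude of the resonator transfer function at freq_hz, in dB."""
    a, b, c = coeffs
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


# Sixth parallel resonator near the top of the band (bandwidth is an assumption).
sixth = resonator_coefficients(4900.0, 1000.0)
gains_db = [gain_db(f, sixth) for f in (3000.0, 4000.0, 4900.0)]
for f, g in zip((3000.0, 4000.0, 4900.0), gains_db):
    print(f"{f:6.0f} Hz: {g:6.1f} dB")
```

The printed gains rise steadily toward 5 kHz, which is the qualitative shape
described for the low-frequency skirt of the 6 kHz resonance.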

Also included in the parallel vocal tract model is a bypass path with amplitude
control AB. It is present because the transfer function contains no prominent
resonant peaks during the production of FF, VV, TH, and DH, so the synthesizer
needs a means of bypassing all of the resonators to produce a flat transfer
function.
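
The role of the bypass path can be sketched in a few lines of Python. This is a
simplified model under stated assumptions: each parallel resonator is the usual
two-pole digital resonator, the branch output is a plain weighted sum, and other
details of Figure 12-6 are omitted. The names Resonator and parallel_branch are
hypothetical; only the amplitude control AB comes from the text.

```python
import math

SAMPLE_RATE = 10_000  # samples/second


class Resonator:
    """Two-pole digital resonator: y(n) = A x(n) + B y(n-1) + C y(n-2)."""

    def __init__(self, freq_hz, bw_hz, sample_rate=SAMPLE_RATE):
        t = 1.0 / sample_rate
        self.c = -math.exp(-2.0 * math.pi * bw_hz * t)
        self.b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
        self.a = 1.0 - self.b - self.c
        self.y1 = 0.0  # previous output sample
        self.y2 = 0.0  # output sample before that

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def parallel_branch(source, resonators, amplitudes, ab):
    """Weighted sum of resonator outputs plus the bypass path scaled by AB."""
    out = []
    for x in source:
        y = ab * x  # bypass path: the source itself, with no spectral shaping
        for res, amp in zip(resonators, amplitudes):
            y += amp * res.step(x)
        out.append(y)
    return out


# With every formant amplitude at zero, only the bypass path contributes,
# so the output is the source scaled by AB (a flat transfer function).
source = [1.0, -0.5, 0.25, 0.0]
flat = parallel_branch(source, [Resonator(2500.0, 200.0)], [0.0], ab=0.5)
```

Setting the formant amplitudes to zero and AB to a nonzero value leaves the
source spectrum unchanged apart from overall level, which is the behavior the
bypass path is meant to provide.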

During the production of a voiced fricative, there are two active sources of
sound, one located at the glottis (voicing) and one at a constriction in the vocal
tract (frication). The output of the quasi-sinusoidal voicing source is sent through
the cascade vocal tract model, while the frication source excites the parallel branch
to generate a voiced fricative.

12.2.7 Simulation of the cascade configuration by the parallel configuration

The transfer function of the laryngeally excited vocal tract can also be
approximated by five digital formant resonators connected in parallel. The same
resonators that form the parallel branch for frication excitation can be used if
suitable values are chosen for the formant amplitude controls.
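
To see what the parallel amplitude controls would have to reproduce, one can
evaluate the cascade transfer function at each formant frequency. The Python
sketch below does this for a cascade of five two-pole digital resonators. The
formant frequencies and bandwidths in FORMANTS are illustrative assumptions, as
are the function names; the sketch reports cascade peak levels only and does not
implement any particular rule for setting the parallel amplitudes.

```python
import cmath
import math

SAMPLE_RATE = 10_000  # samples/second


def resonator_response(freq_hz, f_hz, bw_hz, sample_rate=SAMPLE_RATE):
    """Complex response at freq_hz of a two-pole resonator centered at f_hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    return a / (1.0 - b * z_inv - c * z_inv ** 2)


def cascade_gain_db(freq_hz, formants):
    """Gain in dB of resonators in cascade: the product of their responses."""
    h = 1.0 + 0.0j
    for f_hz, bw_hz in formants:
        h *= resonator_response(freq_hz, f_hz, bw_hz)
    return 20.0 * math.log10(abs(h))


# (frequency, bandwidth) pairs in Hz; illustrative values only.
FORMANTS = [(500.0, 100.0), (1500.0, 100.0), (2500.0, 100.0),
            (3500.0, 100.0), (4500.0, 100.0)]

# Level of the cascade transfer function at each formant frequency.
peak_levels_db = [cascade_gain_db(f_hz, FORMANTS) for f_hz, _ in FORMANTS]
for (f_hz, _), level in zip(FORMANTS, peak_levels_db):
    print(f"F = {f_hz:6.0f} Hz: {level:6.1f} dB")
```

Changing a single frequency or bandwidth in FORMANTS and rerunning shows how
every peak level in the cascade shifts, which is the dependence the parallel
amplitude controls must track.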

The following rules summarize what happens to formant amplitudes in the
transfer function T(f) of a cascade model as the lowest five formant frequencies
and bandwidths are changed. These relationships follow directly from Equation 6,
under the assumption that each formant frequency F(n) is at least five to ten times
as large as the formant bandwidth BW(n):

1. The formant peaks in the transfer function are equal for the case
