
The Klatt formant synthesizer

ample, if a formant frequency is halved, amplitudes of all higher
formants are decreased by 12 dB, i.e. (.5)², as shown in part (c) of
Figure 12-11.

5. The frequencies of two adjacent formants cannot come any closer
than about 200 Hz because of coupling between the natural modes of
the vocal tract. However, if two formants approach each other by
about this amount, both formant peaks are increased by an additional
3 to 6 dB, as shown in part (d) of Figure 12-11.
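
The 12 dB figure quoted above for a halved formant frequency can be checked numerically. The following is a minimal Python sketch, assuming an idealized analog two-pole resonator with unity gain at zero frequency; the function names and the sample values (a 500 Hz formant with 50 Hz bandwidth, probed at 2500 Hz) are illustrative and do not come from the text.

```python
import math

def skirt_change_db(frequency_ratio):
    """Asymptotic level change of higher formants when a lower formant
    frequency is multiplied by frequency_ratio (skirt falls as (F/f)^2)."""
    return 20.0 * math.log10(frequency_ratio ** 2)

def resonator_gain(f, formant_hz, bandwidth_hz):
    """Magnitude at f (Hz) of an analog two-pole resonator, unity gain at 0 Hz."""
    pole = complex(-math.pi * bandwidth_hz, 2.0 * math.pi * formant_hz)
    s = complex(0.0, 2.0 * math.pi * f)
    return abs(pole) ** 2 / abs((s - pole) * (s - pole.conjugate()))

# Asymptotic value: 20*log10(0.5^2), close to -12 dB.
asymptotic_db = skirt_change_db(0.5)

# Hypothetical check: halve a 500 Hz formant and probe well above it.
before = resonator_gain(2500.0, 500.0, 50.0)
after = resonator_gain(2500.0, 250.0, 50.0)
measured_db = 20.0 * math.log10(after / before)

print(round(asymptotic_db, 2), round(measured_db, 2))
```

The measured change differs slightly from the asymptotic value because 2500 Hz is only a few multiples of the formant frequency, so the (F/f)² approximation is not exact there.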

The amplitudes of the formant peaks generated by the parallel vocal tract
model have been constrained so that, if A1 to A5 are all set to 60 dB, the transfer
function will approximate that found in the cascade model. This is accomplished
by: 1) adjusting the gain of the higher frequency formants to take into account
frequency changes in lower formants (since a higher formant rides on the skirts of
the transfer function of all lower formants in a cascade model (Fant, 1960)), 2) in-
corporating rules to cause formant amplitudes to increase whenever two formant
frequencies come into proximity, and 3) using a first difference calculation to
remove low-frequency energy from the higher formants; this energy would other-
wise distort the spectrum in the region of F1 during the synthesis of some vowels
(Holmes, 1973).
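
The first difference mentioned in step 3 can be sketched in a few lines of Python. This is only an illustration of the filter y[n] = x[n] - x[n-1] and its gain at several frequencies, not the synthesizer's own code; the 10 kHz sampling rate and the function names are assumptions made for the example.

```python
import math

SAMPLE_RATE_HZ = 10000.0  # assumed sampling rate for this sketch

def first_difference(samples):
    """Return y[n] = x[n] - x[n-1], taking x[-1] as 0."""
    output = []
    previous = 0.0
    for x in samples:
        output.append(x - previous)
        previous = x
    return output

def first_difference_gain(freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Magnitude of 1 - z^-1 on the unit circle: 2*|sin(pi*f/fs)|."""
    return 2.0 * abs(math.sin(math.pi * freq_hz / sample_rate_hz))

# The gain rises with frequency, so energy near F1 is attenuated most.
for freq in (100.0, 500.0, 2500.0):
    gain_db = 20.0 * math.log10(first_difference_gain(freq))
    print(f"{freq:6.0f} Hz  {gain_db:6.1f} dB")
```

Because the gain is zero at 0 Hz and grows toward half the sampling rate, applying this filter to a higher formant's signal reduces the low-frequency energy that would otherwise add to the spectrum around F1.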

The magnitudes of the vocal tract transfer functions of the cascade and parallel
vocal tract models are compared in Figure 12-12 for several vowels. The match is
quite good in the vicinity of formant peaks, but the parallel model introduces trans-
fer function zeros (notches) in the spectrum between formant peaks. The notches
are of relatively little perceptual importance because energy in the formant peak
adjacent to the notch on the low-frequency side tends to mask the notch
(Gauffin and Sundberg, 1974).

Many early parallel synthesizers were programmed to add together formant
outputs without filtering out the energy at low frequencies from resonators other
than F1. In other cases, formant outputs were combined with alternating signs. The
deleterious effects of these choices are illustrated in Figure 12-13. Some vowel
spectra are poorly modeled in both of these parallel methods of synthesis. The per-
ceptual degradation is less in the alternating sign case because spectral notches are
less perceptible than energy-fill in a spectral valley between two formants. Com-
parison of Figure 12-12 and Figure 12-13 indicates that our parallel configuration
is better than either of those shown in Figure 12-13.
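
The difference between multiplying resonator responses (cascade) and summing them (parallel) can be explored with a small Python sketch. It assumes a conventional two-pole digital resonator with unity gain at 0 Hz, a 10 kHz sampling rate, and three illustrative formants; it applies none of the gain adjustments or first-difference filtering described above, so it corresponds to the simple parallel schemes, not to the configuration of Figure 12-12. All names and values are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 10000.0  # assumed sampling rate
# Illustrative (formant frequency, bandwidth) pairs in Hz, not from the text.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]

def resonator_response(freq_hz, formant_hz, bandwidth_hz):
    """Complex response of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] at freq_hz."""
    t = 1.0 / SAMPLE_RATE_HZ
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(
        2.0 * math.pi * formant_hz * t)
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    return a / (1.0 - b * z_inv - c * z_inv ** 2)

def cascade_response(freq_hz):
    """Product of the resonator responses (resonators in series)."""
    response = complex(1.0, 0.0)
    for formant_hz, bandwidth_hz in FORMANTS:
        response *= resonator_response(freq_hz, formant_hz, bandwidth_hz)
    return response

def parallel_response(freq_hz, signs):
    """Signed sum of the resonator responses (resonators side by side)."""
    return sum(sign * resonator_response(freq_hz, formant_hz, bandwidth_hz)
               for sign, (formant_hz, bandwidth_hz) in zip(signs, FORMANTS))

def level_db(response):
    """Magnitude in dB, floored to avoid log of zero."""
    return 20.0 * math.log10(max(abs(response), 1e-12))

SAME_SIGNS = (1, 1, 1)
ALTERNATING_SIGNS = (1, -1, 1)

print("  Hz  cascade  same-sign  alternating")
for freq in range(0, 3001, 250):
    print(f"{freq:4d}  {level_db(cascade_response(freq)):7.1f}"
          f"  {level_db(parallel_response(freq, SAME_SIGNS)):9.1f}"
          f"  {level_db(parallel_response(freq, ALTERNATING_SIGNS)):11.1f}")
```

Printing the three columns side by side shows where each summing scheme departs from the cascade response between formant peaks; swapping in other formant values gives a rough feel for which vowels are affected most.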

A nasal formant resonator RNP appears in the parallel branch to assist in the
approximation of nasal murmurs and vowel nasalization when the cascade branch
