From text to speech: The MITalk system

nasalization of a vowel is the reduction in amplitude of the first formant, brought
on by the presence of a nearby low-frequency pole pair and zero pair. The first
formant frequency also tends to shift slightly toward about 500 Hz.

Nasal murmurs and vowel nasalization are approximated by the insertion of
an additional resonator RNP and antiresonator RNZ into the cascade vocal tract
model. The nasal pole frequency FNP and zero frequency FNZ should be set to a
fixed value of about 250 Hz, but the frequency of the nasal zero must be increased
during the production of nasals and nasalization. Strategies for controlling FNZ
are given in Chapter 11. The RNP-RNZ pair is effectively removed from the
cascade circuit during the synthesis of nonnasalized speech sounds if FNP=FNZ.
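
The cancellation of the RNP-RNZ pair when FNP=FNZ can be checked numerically. The sketch below assumes the usual second-order digital resonator, y(n) = A x(n) + B y(n-1) + C y(n-2), and an antiresonator defined as its exact inverse; neither difference equation is stated on this page, so both are assumptions here. The 10 kHz sampling rate, the 100 Hz bandwidths, and all function names are also illustrative assumptions. Exact cancellation further requires the two bandwidths to be equal.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs_hz=10000.0):
    """Return (A, B, C) for an assumed second-order digital resonator."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to unity
    return a, b, c

def resonate(x, f_hz, bw_hz, fs_hz=10000.0):
    """Pole pair: y(n) = A x(n) + B y(n-1) + C y(n-2)."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def antiresonate(x, f_hz, bw_hz, fs_hz=10000.0):
    """Zero pair, the inverse of the resonator: y(n) = (x(n) - B x(n-1) - C x(n-2)) / A."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    x1 = x2 = 0.0
    out = []
    for s in x:
        out.append((s - b * x1 - c * x2) / a)
        x2, x1 = x1, s
    return out

def nasal_branch(x, fnp, fnz, bw_np=100.0, bw_nz=100.0):
    """RNZ followed by RNP, as they would sit in series in a cascade model."""
    return resonate(antiresonate(x, fnz, bw_nz), fnp, bw_np)
```

Passing a unit impulse through nasal_branch with FNP = FNZ = 250 Hz should return the impulse unchanged to within rounding error, whereas raising FNZ alone (for example to 450 Hz, a hypothetical value) leaves a pole-zero pair that reshapes the low-frequency spectrum.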

12.2.6 Parallel vocal tract model for frication sources

During frication excitation, the vocal tract transfer function contains both poles
and zeros. The pole frequencies are temporally continuous with formant locations
of adjacent phonetic segments because, by definition, the poles are the natural
resonant frequencies of the entire vocal tract configuration, no matter where the
source is located. Thus, the use of vocalic formant frequency parameters to control
the locations of frication maxima is theoretically well-motivated (and helpful in
preventing the fricative noises from “dissociating” from the rest of the speech
signal).

The zeros in the transfer function for fricatives are the frequencies for which
the impedance (looking back toward the larynx from the position of the frication
source) is infinite, since the series-connected pressure source of turbulence noise
cannot produce any output volume velocity under these conditions. The effect of
transfer-function zeros is twofold: they introduce notches in the spectrum and they
modify the amplitudes of the formants. The perceptual importance of spectral
notches is not great because masking effects of adjacent harmonics limit the
detectability of a spectral notch (Gauffin and Sundberg, 1974). We have found that a
satisfactory approximation to the vocal tract transfer function for frication
excitation can be achieved with a parallel set of digital formant resonators having
amplitude controls, and no antiresonators.
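
A parallel arrangement of this kind can be sketched as follows. This is a minimal illustration, not the synthesizer's actual implementation: it assumes the same second-order resonator form as the cascade model, linear (not decibel) amplitude controls, a plain sum of the branch outputs, and a 10 kHz sampling rate. The function names, the bypass_amp parameter, and any formant values supplied by the caller are hypothetical.

```python
import math
import random

def resonate(x, f_hz, bw_hz, fs_hz=10000.0):
    """Assumed second-order resonator: y(n) = A x(n) + B y(n-1) + C y(n-2)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def white_noise(n, seed=0):
    """Gaussian noise standing in for the turbulence source."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def parallel_frication(noise, formants, amplitudes, bypass_amp=0.0):
    """Sum amplitude-scaled resonator outputs plus an unfiltered bypass path.

    formants   -- list of (frequency_hz, bandwidth_hz) pairs
    amplitudes -- one linear gain per formant; 0.0 leaves that formant unexcited
    """
    out = [bypass_amp * s for s in noise]
    for (f_hz, bw_hz), amp in zip(formants, amplitudes):
        if amp == 0.0:
            continue  # this formant receives no frication excitation
        branch = resonate([amp * s for s in noise], f_hz, bw_hz)
        out = [o + v for o, v in zip(out, branch)]
    return out
```

Because there are no antiresonators, the only way to shape the spectrum is through the per-formant gains and the bypass gain, which is the role the formant amplitude controls play in the text.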

Formant amplitudes are set to provide frication excitation for selected
formants, usually those associated with the cavity in front of the constriction
(Stevens, 1972). The presence of any transfer function zeros is accounted for by
appropriate settings of the formant amplitude controls. Relatively simple rules for
determination of the formant amplitude settings (and bypass path amplitude
values) as a function of place of articulation can be derived from a quantal theory
of speech production (Stevens, 1972). The theory states that only formants as-
