The Klatt formant synthesizer

12.2 Vocal tract transfer functions

The acoustic characteristics of the vocal tract are determined by its cross-sectional
area as a function of distance from the larynx to the lips. The vocal tract forms a
nonuniform transmission line whose behavior can be determined for frequencies
below about 5 kHz by solving a one-dimensional wave equation (Fant, 1960).
(Above 5 kHz, three-dimensional resonance modes would have to be considered.)
Solutions to the wave equation result in a transfer function that relates samples of
the glottal source volume velocity to output volume velocity at the lips.

The synthesizer configuration in Figure 12-6 includes components to realize
two different types of vocal tract transfer function. The first, a cascade configuration
of digital resonators, models the resonant properties of the vocal tract whenever
the source of sound is within the larynx. The second, a parallel configuration
of digital resonators and amplitude controls, models the resonant properties of the
vocal tract during the production of frication noise. The parallel configuration can
also be used to model vocal tract characteristics for laryngeal sound sources, although
the approximation is not quite as good as in the cascade model.

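As a rough illustration of the two configurations described above, the following
Python sketch chains (cascade) or sums (parallel) second-order digital resonators.
The difference equation and coefficient formulas are the standard two-pole resonator
form and are assumed here rather than taken from this section. The function names,
the 10 kHz sampling rate, and the formant values are hypothetical, and details of the
actual parallel branch (such as any per-branch sign or filtering conventions) are
left out.

```python
import math


def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Return (A, B, C) for y[n] = A*x[n] + B*y[n-1] + C*y[n-2].

    Assumed standard two-pole resonator with center frequency f_hz and
    bandwidth bw_hz, normalized to unity gain at 0 Hz.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, coeffs):
    """Filter the sample list x through one resonator."""
    a, b, c = coeffs
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, fs_hz):
    """Pass x through each resonator in series; no amplitude controls needed."""
    for f_hz, bw_hz in formants:
        x = resonate(x, resonator_coeffs(f_hz, bw_hz, fs_hz))
    return x


def parallel(x, formants, amplitudes, fs_hz):
    """Feed x to every resonator separately and sum the scaled outputs."""
    out = [0.0] * len(x)
    for (f_hz, bw_hz), amp in zip(formants, amplitudes):
        branch = resonate(x, resonator_coeffs(f_hz, bw_hz, fs_hz))
        out = [o + amp * v for o, v in zip(out, branch)]
    return out


# Hypothetical (frequency, bandwidth) pairs in Hz and an assumed sampling rate.
FS_HZ = 10000
FORMANTS = [(500, 60), (1500, 90), (2500, 150)]

impulse = [1.0] + [0.0] * 255
cascade_out = cascade(impulse, FORMANTS, FS_HZ)
parallel_out = parallel(impulse, FORMANTS, [1.0, 0.5, 0.25], FS_HZ)
```

In the cascade form the relative formant amplitudes follow from the formant
frequencies and bandwidths alone, whereas the parallel form needs an explicit
amplitude for every branch.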
12.2.1 Cascade vocal tract model

Assuming that the one-dimensional wave equation is a valid approximation below
5 kHz, the vocal tract transfer function can be represented in the frequency domain
by a product of poles and zeros. Furthermore, the transfer function contains only
about five complex pole pairs and no zeros in the frequency range of interest, as
long as the articulation is nonnasalized and the sound source is at the larynx (Fant,
1960). The transfer function conforms to an all-pole model because there are no
side-branch resonators or multiple sound paths. (The glottis is partially open
during the production of aspiration so that the poles and zeros of the subglottal system
are often seen in aspiration spectra; the only way to approximate their effects
in the synthesizer is to increase the first formant bandwidth to about 300 Hz. The
perceptual importance of the remaining spectral distortions caused by the poles and
zeros of the subglottal system is probably minimal.)

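The all-pole product form described above can be sketched numerically. The Python
example below evaluates the frequency response of five cascaded two-pole digital
resonators and compares the first-formant peak for a narrow bandwidth and for the
300 Hz bandwidth mentioned for aspiration. The resonator coefficient formulas, the
10 kHz sampling rate, the function names, and all formant values are assumptions
made for illustration, not values given in this section.

```python
import cmath
import math


def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Assumed standard two-pole resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # unity gain at 0 Hz
    return a, b, c


def cascade_response(freq_hz, formants, fs_hz):
    """Complex response of the cascade: a product of one pole pair per formant."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    h = 1.0 + 0.0j
    for f_hz, bw_hz in formants:
        a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
        h *= a / (1.0 - b * z_inv - c * z_inv * z_inv)  # no zeros, poles only
    return h


def gain_db(freq_hz, formants, fs_hz):
    """Magnitude of the cascade response in decibels."""
    return 20.0 * math.log10(abs(cascade_response(freq_hz, formants, fs_hz)))


FS_HZ = 10000  # assumed sampling rate, giving a 5 kHz usable band

# Hypothetical neutral-vowel formants as (frequency, bandwidth) pairs in Hz.
VOICED = [(500, 60), (1500, 90), (2500, 150), (3500, 200), (4500, 200)]

# Same formants with the first bandwidth widened to about 300 Hz.
ASPIRATED = [(500, 300)] + VOICED[1:]

for freq in (500, 1000, 1500):
    print(freq, round(gain_db(freq, VOICED, FS_HZ), 1),
          round(gain_db(freq, ASPIRATED, FS_HZ), 1))
```

Widening the first bandwidth lowers and broadens the first spectral peak, which is
the only handle this sketch has for imitating the subglottal coupling described in
the text.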
Five resonators are appropriate for simulating a vocal tract with a length of
about 17 cm, the length of a typical male vocal tract, because the average spacing
between formants is equal to the velocity of sound divided by twice the length of
the vocal tract, which works out to be about 1000 Hz. A typical female vocal tract
is 15 to 20 percent shorter, suggesting that only four formant resonators be used to
represent a female voice in a 5 kHz simulation (or that the simulation should be
extended to about 6 kHz). It is suggested that the voices of women and children be
approximated by setting the control parameter NFC to 4, thus removing the fifth
formant from the