The Klatt formant synthesizer

[Figure 12-2: Components of the output spectrum of a speech sound. Block diagram: a SOUND SOURCE (voicing, aspiration, frication) produces the source volume velocity S(f); the VOCAL TRACT TRANSFER FUNCTION T(f) converts it to the lip volume velocity U(f); the RADIATION CHARACTERISTIC R(f) converts that to the radiated sound pressure P(f), so that P(f) = S(f) * T(f) * R(f).]

domain. This is actually how a waveform is generated in the computer. The synthesizer
includes components to simulate the generation of several different kinds
of sound sources (described in Section 12.1.10), components to simulate the vocal
tract transfer function (Figure 12-3), and a component to simulate sound radiation
from the head (Figure 12-14).
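
The relation in Figure 12-2, P(f) = S(f) * T(f) * R(f), has a time-domain counterpart: passing the source samples through the vocal tract filter and then the radiation filter yields a waveform whose spectrum is the product of the three component spectra. The following is a minimal sketch of that equivalence, not the synthesizer's own code. The sample values in `s` and `t` are made up for illustration, the first-difference radiation model `r` is an assumption of this sketch, and the helper names `convolve` and `dft` are hypothetical.

```python
import cmath

def convolve(x, h):
    """Linear convolution: time-domain filtering of x by impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def dft(x, n):
    """n-point discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]

# Illustrative sequences only; none of these values come from the text.
s = [1.0, 0.5, 0.25]        # source volume velocity samples
t = [1.0, 0.9, 0.4, 0.1]    # vocal tract impulse response (truncated)
r = [1.0, -1.0]             # radiation, assumed here to be a first difference

# Time-domain generation: source -> vocal tract -> radiation.
p = convolve(convolve(s, t), r)

# Frequency-domain check: P(f) should equal S(f) * T(f) * R(f) at every bin.
n = len(p)
S, T, R, P = dft(s, n), dft(t, n), dft(r, n), dft(p, n)
max_error = max(abs(P[k] - S[k] * T[k] * R[k]) for k in range(n))
print("largest deviation from S*T*R:", max_error)
```

The deviation printed at the end should be at the level of floating-point rounding. Because the spectra multiply, their levels in decibels simply add, which is why the three components can be designed and adjusted separately.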

12.1.3 Cascade vs. parallel

A number of hardware and software speech synthesizers have been described
(Dudley et al., 1939; Cooper et al., 1951; Lawrence, 1953; Stevens et al., 1955;
Fant, 1959; Fant and Martony, 1962; Flanagan et al., 1962; Holmes et al., 1964;
Epstein, 1965; Tomlinson, 1966; Scott et al., 1966; Liljencrants, 1968; Rabiner et
al., 1971a; Klatt, 1972; Holmes, 1973). They employ different configurations to
achieve what is hopefully the same result: high-quality approximation to human
speech. A few of the synthesizers have stability and calibration problems, and a
few have design deficiencies that make it impossible to synthesize a good voiced
fricative, but many others have an excellent design. Of the best synthesizers that
have been proposed, two general configurations are common.

In one type of configuration, called a parallel formant synthesizer (see e.g.
Lawrence, 1953; Holmes, 1973), the formant resonators that simulate the transfer
function of the vocal tract are connected in parallel, as shown in the lower portion
of Figure 12-3. Each formant resonator is preceded by an amplitude control that
determines the relative amplitude of a spectral peak (formant) in the output
spectrum for both voiced and voiceless speech sounds. In the second type of configuration,
called a cascade formant synthesizer (see e.g. Fant, 1959; Klatt, 1972),
sonorants are synthesized using a set of formant resonators connected in cascade,
as shown in the upper part of Figure 12-3.
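
The two configurations can be contrasted in a short sketch. It assumes each formant is a two-pole digital resonator of the form y(n) = A x(n) + B y(n-1) + C y(n-2), with coefficients derived from a formant frequency and bandwidth; that resonator form, the function names, the 10 kHz sampling rate, and the formant values are assumptions of this example rather than details taken from the passage above. The parallel version sums its branches directly and omits any refinements a practical parallel synthesizer may need.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of a two-pole resonator with unity gain at 0 Hz (assumed form)."""
    period = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * period)
    b = 2.0 * math.exp(-math.pi * bw_hz * period) * math.cos(2.0 * math.pi * freq_hz * period)
    a = 1.0 - b - c
    return a, b, c

def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter x through one resonator: y(n) = a*x(n) + b*y(n-1) + c*y(n-2)."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def cascade(x, formants, fs_hz):
    """Cascade connection: the output of each resonator feeds the next one."""
    y = list(x)
    for freq_hz, bw_hz in formants:
        y = resonate(y, freq_hz, bw_hz, fs_hz)
    return y

def parallel(x, formants, amplitudes, fs_hz):
    """Parallel connection: each resonator has its own amplitude control,
    and the branch outputs are summed."""
    out = [0.0] * len(x)
    for (freq_hz, bw_hz), amp in zip(formants, amplitudes):
        branch = resonate([amp * xn for xn in x], freq_hz, bw_hz, fs_hz)
        out = [o + v for o, v in zip(out, branch)]
    return out

# Illustrative settings only: three (frequency, bandwidth) pairs in Hz.
fs = 10000
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
impulse = [1.0] + [0.0] * 255

cascade_out = cascade(impulse, formants, fs)
parallel_out = parallel(impulse, formants, [1.0, 0.5, 0.25], fs)
```

Note the difference in what must be specified: `cascade` needs only the formant frequencies and bandwidths, whereas `parallel` also needs one amplitude per formant.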

The advantage of the cascade connection is that the relative amplitudes of formant
peaks for vowels come out just right (Fant, 1956) without the need for in-