The Klatt formant synthesizer

[Figure 12-2: Components of the output spectrum of a speech sound. A sound source (voicing, aspiration, or frication) supplies the source volume velocity S(f). The vocal tract transfer function T(f) shapes it into the lip volume velocity U(f) = S(f)T(f), and the radiation characteristic R(f) converts that into the radiated sound pressure P(f), so that $P(f) = S(f)\,T(f)\,R(f)$.]

domain. This is, in fact, how the computer generates a waveform. The synthesizer includes components to simulate the generation of several different kinds of sound sources (described in Section 12.1.10), components to simulate the vocal tract transfer function (Figure 12-3), and a component to simulate sound radiation from the head (Figure 12-14).

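To make the time-domain route concrete, here is a minimal Python sketch of the chain in Figure 12-2: a source, a cascade of formant resonators standing in for T(f), and a radiation stage. It is not the synthesizer's own code. It assumes a bare impulse train as the voicing source, a two-pole digital resonator y[n] = A x[n] + B y[n-1] + C y[n-2] for each formant (the difference-equation form commonly associated with Klatt's design, normalized here to unity gain at 0 Hz), and a first difference as a rough stand-in for R(f). The function names, the 10 kHz sampling rate, and the formant frequencies and bandwidths are illustrative assumptions.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2],
    chosen so the resonator has unity gain at 0 Hz."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Filter the sequence x through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs_hz, n_samples):
    """Hypothetical voicing source: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def radiate(u):
    """First difference p[n] = u[n] - u[n-1], a rough stand-in for R(f)."""
    prev = 0.0
    out = []
    for sample in u:
        out.append(sample - prev)
        prev = sample
    return out


def synthesize_vowel(f0_hz, formants, fs_hz=10000, n_samples=2000):
    u = impulse_train(f0_hz, fs_hz, n_samples)   # source, S(f)
    for freq, bw in formants:                     # cascade of resonators, T(f)
        u = resonate(u, freq, bw, fs_hz)
    return radiate(u)                             # radiation, R(f) -> P(f)


# Illustrative (frequency, bandwidth) pairs in Hz; not values from the text.
FORMANTS = [(500, 60), (1500, 90), (2500, 150)]
p = synthesize_vowel(100, FORMANTS)
```

Because every stage is linear and time-invariant, reordering the resonators does not change the output (apart from rounding), and the spectrum of `p` is the product of the source, tract, and radiation spectra, which is the relation the figure expresses.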
12.1.3 Cascade vs. parallel

A number of hardware and software speech synthesizers have been described (Dudley et al., 1939; Cooper et al., 1951; Lawrence, 1953; Stevens et al., 1955; Fant, 1959; Fant and Martony, 1962; Flanagan et al., 1962; Holmes et al., 1964; Epstein, 1965; Tomlinson, 1966; Scott et al., 1966; Liljencrants, 1968; Rabiner et al., 1971a; Klatt, 1972; Holmes, 1973). They employ different configurations to achieve, ideally, the same result: a high-quality approximation to human speech. A few of these synthesizers have stability and calibration problems, and a few have design deficiencies that make it impossible to synthesize a good voiced fricative, but many others are excellently designed. Among the best synthesizers proposed, two general configurations are common.

In one type of configuration, called a parallel formant synthesizer (see, e.g., Lawrence, 1953; Holmes, 1973), the formant resonators that simulate the transfer function of the vocal tract are connected in parallel, as shown in the lower portion of Figure 12-3. Each formant resonator is preceded by an amplitude control that determines the relative amplitude of a spectral peak (formant) in the output spectrum for both voiced and voiceless speech sounds. In the second type of configuration, called a cascade formant synthesizer (see, e.g., Fant, 1959; Klatt, 1972), sonorants are synthesized using a set of formant resonators connected in cascade, as shown in the upper portion of Figure 12-3.

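The difference between the two wirings can be checked numerically. The sketch below assumes the same two-pole resonator form (unity gain at 0 Hz) and evaluates each configuration's frequency response directly: in the cascade the resonator responses multiply, while in the parallel version each resonator's output is scaled by its own amplitude control and the branches are summed. The function names, sampling rate, formant values, and equal gains are hypothetical choices for illustration; practical parallel synthesizers may add per-branch refinements that this sketch omits.

```python
import cmath
import math


def resonator_response(f_hz, freq_hz, bw_hz, fs_hz):
    """Complex response at f_hz of y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    z1 = cmath.exp(-2j * math.pi * f_hz * t)  # z^-1 evaluated on the unit circle
    return a / (1.0 - b * z1 - c * z1 * z1)


def cascade_response(f_hz, formants, fs_hz):
    # Series connection: the individual transfer functions multiply.
    h = 1.0 + 0j
    for freq, bw in formants:
        h *= resonator_response(f_hz, freq, bw, fs_hz)
    return h


def parallel_response(f_hz, formants, gains, fs_hz):
    # Parallel connection: each branch has its own amplitude control,
    # and the branch outputs are added.
    return sum(g * resonator_response(f_hz, freq, bw, fs_hz)
               for (freq, bw), g in zip(formants, gains))


def db(h):
    return 20.0 * math.log10(abs(h))


FS = 10000
FORMANTS = [(500, 60), (1500, 90), (2500, 150)]  # illustrative (Hz, Hz)

# Cascade: peak levels follow from formant frequencies and bandwidths alone.
cascade_levels = [db(cascade_response(f, FORMANTS, FS)) for f, _ in FORMANTS]

# Parallel: peak levels depend on the gains, here simply all set to 1.
parallel_levels = [db(parallel_response(f, FORMANTS, [1.0, 1.0, 1.0], FS))
                   for f, _ in FORMANTS]
```

With these values the cascade levels fall from F1 to F3 with no amplitude settings at all, while the parallel version with equal gains puts the F2 peak above the F1 peak, so its gains would have to be chosen by hand to produce a comparable vowel spectrum.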
The advantage of the cascade connection is that the relative amplitudes of formant peaks for vowels come out just right (Fant, 1956) without the need for in-