The Klatt formant synthesizer
[Figure 12-2: Components of the output spectrum of a speech sound. A sound source (voicing, aspiration, or frication) produces the source volume velocity S(f); the vocal tract transfer function T(f) converts it to the lip volume velocity U(f); the radiation characteristic R(f) converts that to the radiated sound pressure P(f). The output spectrum is the product P(f) = S(f) * T(f) * R(f).]
domain. This is actually how a waveform is generated in the computer. The syn-
thesizer includes components to simulate the generation of several different kinds
of sound sources (described in Section 12.1.10), components to simulate the vocal
tract transfer function (Figure 12-3), and a component to simulate sound radiation
from the head (Figure 12-14).
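As a rough illustration of the source, transfer function, and radiation components named above, the sketch below multiplies three magnitude spectra to obtain the output spectrum, and checks that the same relation becomes a sum on a decibel scale. The function names and the three "toy" spectral shapes are hypothetical placeholders chosen for this example, not the synthesizer's actual components.

```python
import math


def output_spectrum(f, source, transfer, radiation):
    """Return |P(f)| = |S(f)| * |T(f)| * |R(f)| at frequency f in Hz."""
    return source(f) * transfer(f) * radiation(f)


def to_db(magnitude):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(magnitude)


# Hypothetical placeholder shapes, assumed only for illustration.
def toy_source(f):
    # Falls at roughly 12 dB per octave well above 100 Hz.
    return 1.0 / (1.0 + (f / 100.0) ** 2)


def toy_transfer(f):
    # A flat (featureless) vocal tract, to keep the example minimal.
    return 1.0


def toy_radiation(f):
    # Rises at 6 dB per octave.
    return f / 100.0


if __name__ == "__main__":
    for f in (250.0, 500.0, 1000.0, 2000.0):
        p = output_spectrum(f, toy_source, toy_transfer, toy_radiation)
        print(f, round(to_db(p), 2))
```

Because the components multiply, each one can be designed and inspected separately and its contribution in decibels simply added to the others.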
12.1.3 Cascade vs. parallel
A number of hardware and software speech synthesizers have been described
(Dudley et al., 1939; Cooper et al., 1951; Lawrence, 1953; Stevens et al., 1955;
Fant, 1959; Fant and Martony, 1962; Flanagan et al., 1962; Holmes et al., 1964;
Epstein, 1965; Tomlinson, 1966; Scott et al., 1966; Liljencrants, 1968; Rabiner et
al., 1971a; Klatt, 1972; Holmes, 1973). They employ different configurations to
achieve what is hopefully the same result: a high-quality approximation to human
speech. A few of the synthesizers have stability and calibration problems, and a
few have design deficiencies that make it impossible to synthesize a good voiced
fricative, but many others have an excellent design. Of the best synthesizers that
have been proposed, two general configurations are common.
In one type of configuration, called a parallel formant synthesizer (see e.g.
Lawrence, 1953; Holmes, 1973), the formant resonators that simulate the transfer
function of the vocal tract are connected in parallel, as shown in the lower portion
of Figure 12-3. Each formant resonator is preceded by an amplitude control that
determines the relative amplitude of a spectral peak (formant) in the output
spectrum for both voiced and voiceless speech sounds. In the second type of con-
figuration, called a cascade formant synthesizer (see e.g. Fant, 1959; Klatt, 1972),
sonorants are synthesized using a set of formant resonators connected in cascade,
as shown in the upper part of Figure 12-3.
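The two configurations can be contrasted with a minimal sketch. It assumes each formant is a second-order digital resonator of the form y[n] = A x[n] + B y[n-1] + C y[n-2]; this resonator form, the 10 kHz sample rate, the formant frequencies and bandwidths, and all function names are assumptions made for illustration and are not taken from this passage. In the cascade case the resonator responses are multiplied; in the parallel case each response is scaled by its own amplitude control and the branches are summed (a plain sum is used here, and actual designs may differ in details such as branch polarity).

```python
import cmath
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients (A, B, C) of a two-pole resonator with unity gain at 0 Hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonator_response(f, freq_hz, bw_hz, sample_rate):
    """Complex frequency response of one resonator at frequency f in Hz."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    z1 = cmath.exp(-2j * math.pi * f / sample_rate)  # z^-1 on the unit circle
    return a / (1.0 - b * z1 - c * z1 * z1)


def cascade_response(f, formants, sample_rate):
    """Resonators in series: the individual responses multiply."""
    h = 1.0 + 0.0j
    for freq_hz, bw_hz in formants:
        h *= resonator_response(f, freq_hz, bw_hz, sample_rate)
    return h


def parallel_response(f, formants, gains, sample_rate):
    """Resonators side by side: each is scaled by its gain, then summed."""
    h = 0.0 + 0.0j
    for (freq_hz, bw_hz), gain in zip(formants, gains):
        h += gain * resonator_response(f, freq_hz, bw_hz, sample_rate)
    return h


if __name__ == "__main__":
    # Illustrative (frequency, bandwidth) pairs in Hz, assumed for the example.
    formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
    gains = [1.0, 1.0, 1.0]  # parallel amplitude controls, chosen arbitrarily
    for f in (500.0, 1500.0, 2500.0):
        cas = abs(cascade_response(f, formants, 10000.0))
        par = abs(parallel_response(f, formants, gains, 10000.0))
        print(f, round(20.0 * math.log10(cas), 1), round(20.0 * math.log10(par), 1))
```

Note that the cascade function needs only the formant frequencies and bandwidths, whereas the parallel function also needs one gain per formant; with the gains left at arbitrary values, the relative heights of the peaks generally differ from those of the cascade.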
The advantage of the cascade connection is that the relative amplitudes of for-
mant peaks for vowels come out just right (Fant, 1956) without the need for in-