
The Klatt formant synthesizer

purpose use as a strictly parallel synthesizer (Figure 12-4b). The experimenter
must decide beforehand which configuration is to be employed. The change in
configuration depends on the state of a single switch, and the program is smart
enough to avoid performing unnecessary computations for resonators that are not
used. To the extent possible, the synthesizer has been adjusted so as to generate
about the same output waveform whether the cascade/parallel configuration or the
all-parallel configuration is selected.
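
The effect of the configuration switch can be illustrated with a minimal
sketch. It is not the MITalk program. It assumes a standard two-pole digital
resonator recurrence, three formants with purely illustrative frequencies and
bandwidths, and unit gains in the parallel branch. The names Resonator,
synthesize, and make_bank are hypothetical. The point is only that when the
all-parallel configuration is selected, the cascade resonators are never
updated.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 10_000  # samples per second


@dataclass
class Resonator:
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (assumed form)."""
    freq_hz: float
    bw_hz: float
    y1: float = 0.0   # previous output sample
    y2: float = 0.0   # output sample before that
    updates: int = 0  # number of times step() has run

    def __post_init__(self):
        t = 1.0 / SAMPLE_RATE
        self.c = -math.exp(-2.0 * math.pi * self.bw_hz * t)
        self.b = (2.0 * math.exp(-math.pi * self.bw_hz * t)
                  * math.cos(2.0 * math.pi * self.freq_hz * t))
        self.a = 1.0 - self.b - self.c

    def step(self, x: float) -> float:
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        self.updates += 1
        return y


def synthesize(voicing, frication, cascade, parallel, gains, all_parallel):
    """Filter two source signals; one switch selects the configuration."""
    out = []
    for v, f in zip(voicing, frication):
        if all_parallel:
            # Both sources go through the parallel branch only;
            # the cascade resonators are skipped entirely.
            x = v + f
            y = sum(g * r.step(x) for g, r in zip(gains, parallel))
        else:
            # Voicing goes through the cascade chain,
            # frication through the parallel branch.
            y = v
            for r in cascade:
                y = r.step(y)
            y += sum(g * r.step(f) for g, r in zip(gains, parallel))
        out.append(y)
    return out


def make_bank():
    # Illustrative formant frequencies and bandwidths in Hz (assumptions).
    return [Resonator(500.0, 60.0), Resonator(1500.0, 90.0),
            Resonator(2500.0, 150.0)]


cascade, parallel = make_bank(), make_bank()
impulse = [1.0] + [0.0] * 99
silence = [0.0] * 100
out = synthesize(impulse, silence, cascade, parallel,
                 [1.0, 1.0, 1.0], all_parallel=True)
print("cascade updates:", sum(r.updates for r in cascade))
print("parallel updates:", sum(r.updates for r in parallel))
```

Checking the switch once per sample keeps the sketch short. A real program
would more likely choose the code path once, before the sample loop.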

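[Block diagrams not reproduced. Labels in (a), the cascade/parallel
configuration: voicing source, aspiration source, frication source, transfer
function (cascade), transfer function (parallel), radiation characteristic,
output speech. Labels in (b), the all-parallel configuration: voicing source,
frication source, transfer function (parallel), radiation characteristic,
output speech.]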
Figure 12-4: Cascade/parallel configurations supported by MITalk

12.1.4 Waveform sampling rate

Most of the sound energy of speech is contained in frequencies between about 80
and 8000 Hz (Dunn and White, 1940). However, intelligibility tests of band-pass
filtered speech indicate that intelligibility is not measurably changed if the energy
in frequencies above about 5000 Hz is removed (French and Steinberg, 1947).
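Speech low-pass filtered in this way sounds perfectly natural. Because a
sampled waveform can represent frequencies only up to half the sampling rate,
a bandwidth of 5000 Hz requires 10,000 samples per second, and we have
selected this value as the digital sampling rate of the synthesizer.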
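
The relation between the 5000 Hz cutoff and the 10,000 samples per second rate
can be checked numerically. The sketch below is an illustration only. The
function names and the 4000 Hz and 6000 Hz test tones are assumptions chosen
for the example. It shows that a tone above half the sampling rate produces
exactly the same samples as a tone below it, which is why energy above 5000 Hz
has to be removed rather than sampled.

```python
import math

SAMPLE_RATE = 10_000          # samples per second, the rate chosen in the text
NYQUIST_HZ = SAMPLE_RATE / 2  # highest frequency the samples can represent


def sampled_cosine(freq_hz, n_samples, rate=SAMPLE_RATE):
    """Samples of a unit-amplitude cosine at freq_hz, taken at `rate`."""
    return [math.cos(2.0 * math.pi * freq_hz * n / rate)
            for n in range(n_samples)]


def alias_frequency(freq_hz, rate=SAMPLE_RATE):
    """Frequency in [0, rate/2] that freq_hz folds onto after sampling."""
    f = freq_hz % rate
    return rate - f if f > rate / 2 else f


below = sampled_cosine(4000.0, 50)  # inside the 0 to 5000 Hz band
above = sampled_cosine(6000.0, 50)  # outside the band
max_diff = max(abs(a - b) for a, b in zip(below, above))

print("Nyquist frequency (Hz):", NYQUIST_HZ)
print("6000 Hz folds onto (Hz):", alias_frequency(6000.0))
print("largest sample difference:", max_diff)
```

The largest difference is at the level of floating-point rounding error: once
sampled at this rate, the 6000 Hz tone cannot be told apart from the 4000 Hz
tone.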
