From text to speech: The MITalk system

    y(nT) = A'x(nT) + B'x(nT-T) + C'x(nT-2T)    (3)

where x(nT-T) and x(nT-2T) are the previous two samples of the input x(nT), and the constants A', B', and C' are defined by the equations:

    A' = 1/A
    B' = -B/A    (4)
    C' = -C/A

where A, B, and C are obtained by inserting the antiresonance center frequency F and bandwidth BW into Equation 2.
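
The antiresonator relations above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the MITalk implementation. Equation 2 is not reproduced in this excerpt, so `resonator_coefficients` assumes the form usually given for this kind of resonator: C = -exp(-2 pi BW T), B = 2 exp(-pi BW T) cos(2 pi F T), A = 1 - B - C. The resonator recurrence y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T) is assumed as well. The function names, the 10 kHz sampling rate, and the F and BW values are illustrative choices only.

```python
import math


def resonator_coefficients(f_hz, bw_hz, sample_rate_hz):
    """Return (A, B, C) for a digital resonator.

    Assumed form of Equation 2, which is not reproduced in this excerpt.
    """
    t = 1.0 / sample_rate_hz  # sampling period T in seconds
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c


def antiresonator_coefficients(f_hz, bw_hz, sample_rate_hz):
    """Return (A', B', C') using A' = 1/A, B' = -B/A, C' = -C/A."""
    a, b, c = resonator_coefficients(f_hz, bw_hz, sample_rate_hz)
    return 1.0 / a, -b / a, -c / a


def resonator(x, f_hz, bw_hz, sample_rate_hz):
    """Assumed resonator recurrence: y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T)."""
    a, b, c = resonator_coefficients(f_hz, bw_hz, sample_rate_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def antiresonator(x, f_hz, bw_hz, sample_rate_hz):
    """Antiresonator: y(nT) = A' x(nT) + B' x(nT-T) + C' x(nT-2T)."""
    a1, b1, c1 = antiresonator_coefficients(f_hz, bw_hz, sample_rate_hz)
    x1 = x2 = 0.0  # previous two input samples
    out = []
    for xn in x:
        out.append(a1 * xn + b1 * x1 + c1 * x2)
        x2, x1 = x1, xn
    return out


# Illustrative values only; none of them come from Table 12-1.
SAMPLE_RATE_HZ = 10000.0
F_HZ, BW_HZ = 270.0, 100.0
signal = [1.0, 0.5, -0.25, 0.0, 2.0, -1.0]

shaped = resonator(signal, F_HZ, BW_HZ, SAMPLE_RATE_HZ)
restored = antiresonator(shaped, F_HZ, BW_HZ, SAMPLE_RATE_HZ)
max_error = max(abs(u - v) for u, v in zip(signal, restored))
print(max_error)  # should be at the level of floating-point rounding
```

Because A', B', and C' are derived from the same A, B, and C, the antiresonator undoes a resonator that has the same F and BW. The last lines check this on a short test signal.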

12.1.8 Low-pass resonator

As a special case, the frequency F of a digital resonator can be set to zero, producing, in effect, a low-pass filter which has a nominal attenuation skirt of -12 dB per octave of frequency increase and a 3-dB down break frequency equal to BW/2. The voicing source contains a digital resonator RGP used as a low-pass filter that transforms a glottal impulse into a pulse having a waveform and spectrum similar to normal voicing. A second digital resonator, RGS, is used to low-pass filter the normal voicing waveform to produce the quasi-sinusoidal glottal waveform seen during the closure interval for an intervocalic voiced plosive.
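
The low-pass behavior described above can be checked numerically with a short Python sketch. It is only an illustration under stated assumptions: the coefficient formulas are the assumed form of Equation 2 (not reproduced in this excerpt), the transfer function H(z) = A / (1 - B z^-1 - C z^-2) is the assumed resonator, and the 10 kHz sampling rate and 100 Hz bandwidth are hypothetical values rather than the RGP or RGS settings from Table 12-1.

```python
import cmath
import math


def resonator_coefficients(f_hz, bw_hz, sample_rate_hz):
    """Return (A, B, C); assumed form of Equation 2 (not shown in this excerpt)."""
    t = 1.0 / sample_rate_hz  # sampling period T in seconds
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c


def magnitude_db(probe_hz, a, b, c, sample_rate_hz):
    """Gain in dB of H(z) = A / (1 - B z^-1 - C z^-2) at probe_hz."""
    z_inv = cmath.exp(-2j * math.pi * probe_hz / sample_rate_hz)
    h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


def impulse_response(a, b, c, n_samples):
    """Response of the resonator to a single unit impulse at n = 0."""
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for n in range(n_samples):
        xn = 1.0 if n == 0 else 0.0
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


# Hypothetical settings for illustration only.
SAMPLE_RATE_HZ = 10000.0
BW_HZ = 100.0

# Setting F = 0 turns the resonator into a low-pass filter.
a, b, c = resonator_coefficients(0.0, BW_HZ, SAMPLE_RATE_HZ)

dc_gain_db = magnitude_db(0.0, a, b, c, SAMPLE_RATE_HZ)
octave_drop_db = (magnitude_db(400.0, a, b, c, SAMPLE_RATE_HZ)
                  - magnitude_db(800.0, a, b, c, SAMPLE_RATE_HZ))

# A single impulse is smoothed into a pulse that rises and then decays.
pulse = impulse_response(a, b, c, 200)
peak_index = pulse.index(max(pulse))

print(dc_gain_db, octave_drop_db, peak_index)
```

With these values the gain at zero frequency is 0 dB, and the drop across the octave from 400 Hz to 800 Hz is close to the nominal 12 dB. The impulse response peaks some samples after the impulse instead of at it, which is the smoothing that lets such a filter turn an impulse into a pulse.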

12.1.9 Synthesizer block diagram

A block diagram of the synthesizer is shown in Figure 12-6. There are 39 control parameters that determine the characteristics of the output. The name and range of values for each parameter are given in Table 12-1. As seen from the table, as many as 22 of the 39 parameters are varied to achieve optimum matches to an arbitrary English utterance. The constant parameters in Table 12-1 have been given values appropriate for a particular male voice, and would have to be adjusted slightly to approximate the speech of other male or female talkers. The list of variable control parameters is long, compared with some synthesizers, but the emphasis here is on defining strategies for the synthesis of high-quality speech. We are not concerned with searching for compromises that would minimize the information content in the control parameter specification.

12.1.10 Sources of sound

There are two kinds of sound sources that may be activated during speech production (Stevens and Klatt, 1974). One involves quasi-periodic vibrations of some structure, usually the vocal folds. Vibration of the vocal folds is called voicing. (Other structures such as the lips, tongue tip, or uvula may be caused to vibrate in sound types of some languages, but not in English.)

The second kind of sound source involves the generation of turbulence noise