From text to speech: The MITalk system

tal pulse, nor does it contain spectral zeros of the kind that often appear in
natural voicing, but neither of these differences is judged to be very important
perceptually.

The antiresonator RGZ is used to modify the detailed shape of the spectrum of
the voicing source for particular individuals with greater precision than would
be possible using only a single low-pass filter. The values chosen for FGZ and
BGZ in Table 12-1 are such as to tilt the general voicing spectrum up somewhat
to match the vocal characteristics of speaker DHK. The waveform and spectral
envelope of normal voicing that are produced by sending an impulse train through
RGP and RGZ are shown in Figure 12-7.

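The path from impulse train through RGP and RGZ can be sketched as follows. This
is a minimal illustration, not the MITalk code: it assumes the usual second-order
difference equations for a digital resonator and its antiresonator, a 10 kHz
sampling rate, and illustrative frequency and bandwidth settings that are not
taken from Table 12-1. All function names are hypothetical.

```python
import math

def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """Coefficients of y(n) = a*x(n) + b*y(n-1) + c*y(n-2) (assumed form)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to unity
    return a, b, c

def resonate(x, f_hz, bw_hz, fs_hz):
    """Two-pole digital resonator; f_hz = 0 gives a low-pass filter."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def antiresonate(x, f_hz, bw_hz, fs_hz):
    """Two-zero antiresonator, the exact inverse of the resonator above."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    a_z, b_z, c_z = 1.0 / a, -b / a, -c / a
    x1 = x2 = 0.0
    out = []
    for sample in x:
        out.append(a_z * sample + b_z * x1 + c_z * x2)
        x2, x1 = x1, sample
    return out

def impulse_train(n_samples, period):
    """Unit impulses every `period` samples, zeros elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

FS = 10000.0                       # assumed sampling rate in Hz
source = impulse_train(400, 100)   # 100 Hz fundamental at 10 kHz
# Assumed, illustrative settings: RGP as a low-pass resonator (F = 0),
# followed by RGZ as a broad antiresonator.
after_rgp = resonate(source, 0.0, 100.0, FS)
normal_voicing = antiresonate(after_rgp, 1500.0, 6000.0, FS)
```

Because the antiresonator is defined as the inverse of a resonator with the same
frequency and bandwidth, cascading the two with identical settings returns the
input, which is a convenient check on an implementation.
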
12.1.13 Quasi-sinusoidal voicing

The amplitude control parameter AVS determines the amount of smoothed voicing
generated during voiced fricatives, voiced aspirates, and the voicebars present
in intervocalic voiced plosives. An appropriate wave shape for quasi-sinusoidal
voicing is obtained by passing an impulse through the low-pass digital
resonators RGP and RGS. The frequency control of RGS is set to zero to produce a
low-pass filter, and BGS=200 determines the cutoff frequency beyond which
harmonics are strongly attenuated.

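The effect of setting the resonator frequency to zero can be examined by
evaluating the filter's gain at a few frequencies. The sketch below assumes the
standard two-pole resonator transfer function and a 10 kHz sampling rate; only
the bandwidth value 200 comes from the text, and the function name is
hypothetical.

```python
import cmath
import math

def resonator_gain(freq_hz, f_hz, bw_hz, fs_hz):
    """Magnitude of H(z) = a / (1 - b*z^-1 - c*z^-2) at freq_hz (assumed form)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))

FS = 10000.0  # assumed sampling rate in Hz
for freq in (0, 100, 200, 500, 1000, 2000):
    # RGS: frequency control 0, bandwidth BGS = 200
    gain_db = 20.0 * math.log10(resonator_gain(freq, 0.0, 200.0, FS))
    print(f"{freq:5d} Hz  {gain_db:7.1f} dB")
```

With the frequency control at zero the two poles coincide on the real axis, so
the gain is unity at 0 Hz and falls steadily as frequency rises.
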
The waveform and spectral envelope of quasi-sinusoidal voicing are shown in
Figure 12-7. After the effects of the vocal tract transfer function and
radiation characteristic are imposed on the source spectrum, the output waveform
of quasi-sinusoidal voicing contains significant energy only at the first and
second harmonics of the fundamental frequency. AVS ranges from about 60 dB in a
voicebar or strongly voiced fricative to 0 dB if no quasi-sinusoidal voicing is
present. Some degree of quasi-sinusoidal voicing can be added to the normal
voicing source (in combination with aspiration noise) to produce a breathy voice
quality (e.g., AH=AV-3, AVS=AV-6).

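The breathy-voice example amounts to deriving two amplitude controls from AV.
The sketch below applies those offsets and converts each level to a linear
scale factor. The conversion rule (20 dB per factor of ten, with 0 dB treated as
"source off") is an assumption made for illustration and may differ from the
synthesizer's own amplitude table; both function names are hypothetical.

```python
def breathy_settings(av_db):
    """Apply the offsets from the text: AH = AV - 3, AVS = AV - 6."""
    return {"AV": av_db, "AH": av_db - 3, "AVS": av_db - 6}

def db_to_linear(level_db):
    """Assumed conversion: 0 dB (or less) turns the source off entirely."""
    if level_db <= 0:
        return 0.0
    return 10.0 ** (level_db / 20.0)

settings = breathy_settings(60)
linear = {name: db_to_linear(level) for name, level in settings.items()}
```
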
12.1.14 Frication source

A turbulent noise source is simulated in the synthesizer by a pseudo-random
number generator, a modulator, an amplitude control AF, and a -6 dB/octave
low-pass digital filter LPF, as shown in Figure 12-6. Theoretically, the
spectrum of the frication source should be approximately flat (Stevens, 1971),
and the amplitude distribution should be Gaussian. Signals produced by the
random number generator have a flat spectrum, but they have a uniform amplitude
distribution between limits determined by the value of the amplitude control
parameter AF. A pseudo-Gaussian amplitude distribution is obtained in the
synthesizer by summing 16 of the numbers produced by the random number
generator.

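The summing step can be sketched as follows. This is an illustration only: it
uses Python's standard generator in place of the synthesizer's own, draws each
number uniformly from an assumed range of -1 to 1, and omits the modulator, the
AF scaling, and the filter LPF. The function names are hypothetical.

```python
import random

def pseudo_gaussian_sample(rng, n_terms=16):
    """Sum n_terms uniform random numbers to approximate a Gaussian sample."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_terms))

def frication_noise(n_samples, seed=0):
    """Generate a sequence of pseudo-Gaussian noise samples."""
    rng = random.Random(seed)
    return [pseudo_gaussian_sample(rng) for _ in range(n_samples)]

noise = frication_noise(20000)
mean = sum(noise) / len(noise)
variance = sum((s - mean) ** 2 for s in noise) / len(noise)
```

Each uniform term on the assumed range has variance 1/3, so the sum of 16
independent terms has variance 16/3, and every sample stays within the hard
limits of -16 and 16. The distribution is therefore bounded rather than truly
Gaussian, which is why it is described as pseudo-Gaussian.
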