The Klatt formant synthesizer

the vibrations of the vocal folds. In addition, the vocal folds may vibrate without
meeting in the midline. In this type of voicing, the amplitude of higher frequency
harmonics of the voicing source spectrum is significantly reduced and the
waveform looks nearly sinusoidal. Therefore, the synthesizer should be capable of
generating at least two types of voicing waveforms (normal voicing and
quasi-sinusoidal voicing), two types of frication waveforms (normal frication and
amplitude-modulated frication), and two types of aspiration (normal aspiration and
amplitude-modulated aspiration). These are the only kinds of sound sources
required for English, although trills and clicks of other languages may call for the
addition of other source controls to the synthesizer in the future.

12.1.11 Voicing source

The structure of the voicing source is shown at the top left in Figure 12-6.
Variable control parameters are used to specify the fundamental frequency of voicing
(F0), the amplitude of normal voicing (AV), and the amplitude of quasi-sinusoidal
voicing (AVS).

An impulse train corresponding to normal voicing is generated whenever F0
is greater than zero. The amplitude of each impulse is determined by AV, the
amplitude of normal voicing in dB. AV ranges from about 60 dB in a strong
vowel to 0 dB when the voicing source is turned off. Fundamental frequency is
specified in Hz; a value of F0=100 would produce a 100-Hz impulse train. The
number of samples between impulses, T0, is determined by SR/F0, e.g., for a
sampling rate of 10,000 and a fundamental frequency of 200 Hz, an impulse is
generated every 50th sample. Under some circumstances, the quantization of the
fundamental period to be an integral number of samples might be perceived in a
slow, prolonged fundamental frequency transition as a sort of staircase of
mechanical sounds (similar to the rather unnatural speech one gets by setting F0 to a
constant value in a synthetic utterance). But the problem is not sufficiently serious to
merit running the source model of the synthesizer at a higher sampling rate. If
desired, some aspiration noise can be added to the normal voicing waveform to
partially alleviate the problem and create a somewhat breathy voice quality.

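The impulse-train description above can be illustrated with a minimal Python sketch. The function names are hypothetical, and two details are assumptions not stated in the text: the period SR/F0 is rounded to the nearest whole sample, and AV is converted to a linear amplitude as 10^(AV/20), with 0 dB treated as "off". The synthesizer itself may use a different rounding rule or dB table.

```python
def period_in_samples(f0_hz, sample_rate=10_000):
    """Number of samples between impulses, T0 = SR / F0.

    Assumption: the ratio is rounded to the nearest integer (at least 1).
    """
    return max(1, round(sample_rate / f0_hz))


def db_to_linear(av_db):
    """Assumed dB-to-linear mapping; 0 dB or less means the source is off."""
    return 0.0 if av_db <= 0 else 10.0 ** (av_db / 20.0)


def impulse_train(f0_hz, av_db, sample_rate=10_000, n_samples=200):
    """Return n_samples of an impulse train; all zeros when F0 is not positive."""
    out = [0.0] * n_samples
    if f0_hz <= 0:
        return out
    t0 = period_in_samples(f0_hz, sample_rate)
    amplitude = db_to_linear(av_db)
    for n in range(0, n_samples, t0):
        out[n] = amplitude  # one impulse every T0 samples
    return out


# Period quantization: nearby F0 values can share the same integer period,
# so a slow F0 glide is realized as a staircase of discrete pitches.
for f0 in (198, 199, 200, 201, 202):
    t0 = period_in_samples(f0)
    print(f0, t0, 10_000 / t0)  # requested F0, T0, realized F0
```

Under these assumptions, requested values of 199 through 202 Hz all map to the same 50-sample period, which is the quantization effect the paragraph describes.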
12.1.12 Normal voicing

Ignoring for the moment the effects of RGZ, we see that the train of impulses is
sent through a low-pass filter, RGP, to produce a smooth waveform that resembles
a typical glottal volume velocity waveform (Flanagan, 1958). The resonator
frequency FGP is set to 0 Hz and BGP to 100 Hz. The filtered impulses thus have a
spectrum that falls off smoothly at approximately -12 dB per octave above 50 Hz.
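As a rough check of the -12 dB per octave figure, the sketch below models RGP as a two-pole digital resonator with FGP = 0 Hz and BGP = 100 Hz. The coefficient formulas are an assumption here (the usual second-order resonator form, y(n) = A x(n) + B y(n-1) + C y(n-2)); they are not given in this section, and the function names are hypothetical. The 10,000-Hz sampling rate is borrowed from the earlier example.

```python
import cmath
import math

SAMPLE_RATE = 10_000  # Hz, assumed from the sampling-rate example in the text


def resonator_coefficients(f_hz, bw_hz, sample_rate=SAMPLE_RATE):
    """Coefficients (A, B, C) of an assumed two-pole resonator with unity DC gain."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c


def apply_resonator(samples, coeffs):
    """Run y(n) = A*x(n) + B*y(n-1) + C*y(n-2) over a list of samples."""
    a, b, c = coeffs
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def gain_db(freq_hz, coeffs, sample_rate=SAMPLE_RATE):
    """Magnitude response of the resonator in dB at freq_hz."""
    a, b, c = coeffs
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    h = a / (1.0 - b * z_inv - c * z_inv * z_inv)
    return 20.0 * math.log10(abs(h))


rgp = resonator_coefficients(0.0, 100.0)  # FGP = 0 Hz, BGP = 100 Hz
slope = gain_db(1000.0, rgp) - gain_db(500.0, rgp)
print(f"gain change from 500 Hz to 1000 Hz: {slope:.2f} dB")  # close to -12
```

With a resonator frequency of 0 Hz the two poles coincide on the real axis, so the filter behaves like two cascaded one-pole low-pass sections, each with a corner near BGP/2 = 50 Hz. That is consistent with the stated roll-off above 50 Hz.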
The waveform generated does not have the same phase spectrum as a typical glot-