From text to speech: The MITalk system
12.1.16 Pitch-synchronous updating of voicing source amplitudes
The voicing source amplitude controls AV and AVS only have an effect on the
synthetic waveform when a glottal impulse is issued. The reason for adjusting
voicing amplitudes discontinuously at the onset of each glottal period is to prevent
the creation of pops and clicks due to waveform discontinuities introduced by the
sudden change in an amplitude control in the middle of a voicing period.
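
The latching behavior described above can be illustrated with a minimal sketch. The function name, the per-sample representation of AV, and the initial value of zero are assumptions made for the example; they are not taken from the synthesizer code.

```python
def latch_voicing_amplitude(requested_av, pulse_onsets, initial_av=0):
    """Return the AV value (dB) in effect at each sample.

    requested_av -- AV requested by the control parameters, one value per sample
    pulse_onsets -- sample indices at which a glottal impulse is issued
    initial_av   -- assumed value in effect before the first impulse
    """
    onsets = set(pulse_onsets)
    current = initial_av
    effective = []
    for n, av in enumerate(requested_av):
        if n in onsets:      # a new glottal period starts here
            current = av     # only now does the requested amplitude take hold
        effective.append(current)
    return effective


# AV is raised from 0 to 60 dB at sample 3, in the middle of a glottal period.
requested = [0] * 3 + [60] * 7
effective = latch_voicing_amplitude(requested, pulse_onsets=[0, 5])
print(effective)  # the change first appears at sample 5, the next pulse onset
```

Because the amplitude is held until the next impulse, the waveform within any one glottal period is scaled by a single value, which is what avoids the mid-period discontinuity.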
12.1.17 Generation of plosive bursts with a predictable spectrum
The noise amplitudes AF and AH are used to interpolate the intensity of the noise
sources linearly over the 5 msec (50 sample) interval. (Thus there is a 5 msec
delay in the attainment of a new amplitude value for a noise source.) Interpolation
permits a more gradual onset for a fricative or HH than would otherwise be
possible. There is, however, one exception to this internal control strategy. A plosive
burst involves a more rapid source onset than can be achieved by 5 msec linear
interpolation. Therefore, if AF increases by more than 50 dB from its value
specified in the previous 5 msec segment, AF is (automatically) changed
instantaneously to its new target value. The pseudo-random number generator is also
reset at the time of plosive burst onset so as to produce exactly the same source
waveform for each burst. The value to which it is set was chosen so as to produce
a burst spectrum that is as flat as possible.
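
The interpolation rule and its plosive exception might be sketched as follows. Several details are assumptions made for illustration: the ramp is computed on the dB values (the text does not say whether interpolation is done in dB or in linear amplitude), the seed value BURST_SEED is hypothetical because the text does not state the actual reset value, and Python's random module stands in for the synthesizer's own pseudo-random number generator.

```python
import random

NWS = 50             # samples per 5 msec update interval (from the text)
BURST_JUMP_DB = 50   # AF increase that marks a plosive burst (from the text)
BURST_SEED = 1       # hypothetical; the text does not give the reset value


def is_burst_onset(prev_db, target_db):
    """True if AF rises by more than 50 dB relative to the previous segment."""
    return target_db - prev_db > BURST_JUMP_DB


def af_ramp(prev_db, target_db, nws=NWS):
    """Per-sample AF values for one update interval."""
    if is_burst_onset(prev_db, target_db):
        return [float(target_db)] * nws          # burst: jump at once
    step = (target_db - prev_db) / nws
    return [prev_db + step * (i + 1) for i in range(nws)]  # linear ramp


def noise_samples(prev_db, target_db, rng, nws=NWS):
    """Unscaled noise for one interval; the generator is reset at burst onset."""
    if is_burst_onset(prev_db, target_db):
        rng.seed(BURST_SEED)
    return [rng.uniform(-1.0, 1.0) for _ in range(nws)]


rng = random.Random()
fricative = af_ramp(0, 40)               # reaches 40 dB only at the last sample
burst = af_ramp(0, 60)                   # 60 dB from the first sample onward
first_burst = noise_samples(0, 60, rng)
second_burst = noise_samples(0, 60, rng)
print(first_burst == second_burst)       # identical source waveform per burst
```

Resetting the generator is what makes the burst waveform, and therefore its spectrum, repeatable from one plosive to the next.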
12.1.18 Control of fundamental frequency
At times, it is desired to specify precisely the timing of the first glottal pulse
(voicing onset) relative to a plosive burst. For example, in the syllable pa, it might
be desired to produce a 5 msec burst of frication noise, 40 msec of aspiration noise,
and voicing onset exactly 45 msec from the onset of the burst. Usually, a glottal
pulse is issued in the synthesizer at a time specified by the reciprocal of the value
of the fundamental frequency control parameter extant when the last glottal pulse
was issued. However, if either AV or F0 is set to zero, no glottal pulse is issued
during this 5 msec time interval; in fact, no glottal pulses are issued until precisely
the moment that both the AV and F0 control parameters become nonzero. In the
case of the pa example above, both AV and F0 would normally be set to zero
during the closure interval, burst, and aspiration phase; and AV would be set to
about 60 dB and F0 to about 130 Hz at exactly 45 msec after the synthetic burst
onset.
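
The pulse-timing rule for the pa example can be sketched as below. This is an illustration under stated assumptions, not the synthesizer's code: the function name and the frame list are hypothetical, the 10 kHz sampling rate is inferred from 50 samples per 5 msec, and the pitch period is rounded to a whole number of samples.

```python
SAMPLE_RATE = 10_000   # Hz, inferred from 50 samples per 5 msec
NWS = 50               # samples per update interval


def glottal_pulse_times(frames, nws=NWS, fs=SAMPLE_RATE):
    """Return the sample indices at which glottal pulses are issued.

    frames -- list of (AV in dB, F0 in Hz), one pair per update interval
    """
    pulses = []
    next_pulse = None                  # None while no pulse is scheduled
    for k, (av, f0) in enumerate(frames):
        start = k * nws
        if av == 0 or f0 == 0:
            next_pulse = None          # no pulses during this interval
            continue
        if next_pulse is None:
            next_pulse = start         # both controls just became nonzero
        while next_pulse < start + nws:
            pulses.append(next_pulse)
            # The next period uses the F0 in force when this pulse is issued.
            next_pulse += round(fs / f0)
    return pulses


# pa: burst and aspiration fill 45 msec (9 intervals) with AV = F0 = 0,
# then AV = 60 dB and F0 = 130 Hz.
frames = [(0, 0)] * 9 + [(60, 130)] * 4
pulses = glottal_pulse_times(frames)
print(pulses)                                # [450, 527, 604]
print(1000 * pulses[0] / SAMPLE_RATE)        # 45.0 msec voice onset time
```

The first pulse falls on the boundary of the interval in which both controls become nonzero, which is why voicing onset lands exactly 45 msec after the burst.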
Since the update interval in the synthesizer is set to 5 msec, voice onset time
can be specified exactly in 5 msec steps. If greater precision is needed, it is
necessary to change the parameter update interval from 5 msec (NWS=50) to, for
example, 2 msec (NWS=20).
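
The relation between NWS and the attainable voice onset times amounts to simple arithmetic, shown here as a small sketch. The helper names are hypothetical, the 10 kHz sampling rate is inferred from the 50 samples per 5 msec stated above, and rounding to the nearest step is an assumption about how a desired value would be quantized.

```python
SAMPLE_RATE = 10_000   # Hz, inferred from 50 samples per 5 msec


def update_interval_ms(nws, fs=SAMPLE_RATE):
    """Duration in msec of an update interval of nws samples."""
    return 1000 * nws / fs


def realizable_vot_ms(desired_ms, nws, fs=SAMPLE_RATE):
    """Nearest voice onset time that falls on an update boundary."""
    step = update_interval_ms(nws, fs)
    return round(desired_ms / step) * step


print(update_interval_ms(50), update_interval_ms(20))  # 5.0 2.0
print(realizable_vot_ms(42, nws=50))                   # 40.0
print(realizable_vot_ms(42, nws=20))                   # 42.0
```

With NWS=50 a desired onset of 42 msec can only be approximated by 40 msec, whereas NWS=20 allows it to be specified exactly.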