
From text to speech: The MITalk system

word boundaries, but not across phrase boundaries) (Klatt, 1973), and in vowel-vowel
sequences.

Context                               PRCNT1
vowel followed by a vowel               120
vowel preceded by a vowel                70
consonant surrounded by consonants       50
consonant preceded by a consonant        70
consonant followed by a consonant        70
11. Lengthening due to plosive aspiration:

A 1-stressed or 2-stressed vowel or sonorant preceded by a voiceless plosive is
lengthened by 25 msec (Peterson and Lehiste, 1960).
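
The last two rules can be sketched in a few lines. This is a minimal illustration, not the MITalk implementation: the context labels and function names are hypothetical, and the sketch assumes that PRCNT1 scales the running PRCNT multiplicatively (PRCNT = PRCNT * PRCNT1 / 100). Only the percentages and the 25 msec increment come from the text above.

```python
# PRCNT1 values copied from the cluster table above.
# The dictionary keys are hypothetical labels invented for this sketch.
PRCNT1_BY_CONTEXT = {
    "vowel_followed_by_vowel": 120,
    "vowel_preceded_by_vowel": 70,
    "consonant_surrounded_by_consonants": 50,
    "consonant_preceded_by_consonant": 70,
    "consonant_followed_by_consonant": 70,
}

ASPIRATION_LENGTHENING_MSEC = 25  # rule 11


def apply_cluster_rule(prcnt, context):
    """Scale PRCNT by PRCNT1 (assumed multiplicative); unknown contexts leave it unchanged."""
    prcnt1 = PRCNT1_BY_CONTEXT.get(context, 100)
    return prcnt * prcnt1 / 100


def apply_aspiration_rule(duration_msec, stress, is_vowel_or_sonorant,
                          preceded_by_voiceless_plosive):
    """Add 25 msec to a 1- or 2-stressed vowel or sonorant after a voiceless plosive."""
    if stress in (1, 2) and is_vowel_or_sonorant and preceded_by_voiceless_plosive:
        return duration_msec + ASPIRATION_LENGTHENING_MSEC
    return duration_msec
```

For example, `apply_cluster_rule(100, "consonant_surrounded_by_consonants")` halves PRCNT, and `apply_aspiration_rule(100, 1, True, True)` returns a duration 25 msec longer than its input.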

When the rules are applied to the RR of “rocker” in Figure 9-1, the second
rule sets PRCNT to 140, the fifth rule reduces PRCNT to 112, the seventh rule
reduces MINDUR to 30 msec and PRCNT to 78.4, and the ninth rule increases
PRCNT to 94. Then INHDUR, MINDUR, and PRCNT are inserted in Equation 1,
and the resulting duration is rounded up to the nearest 5 msec to obtain the value of
175 msec shown in the lower part of Figure 9-1.
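
The arithmetic of this example can be retraced in a short sketch. Several things here are assumptions rather than content of this page: the scaling factors 0.8, 0.7, and 1.2 are inferred from the successive PRCNT values quoted above, Equation 1 is assumed to have the form DUR = (INHDUR - MINDUR) * PRCNT / 100 + MINDUR, and INHDUR = 180 msec is an illustrative value chosen for the sketch. Exact fractions are used so that the intermediate percentages are not blurred by floating-point rounding.

```python
import math
from fractions import Fraction


def round_up_to_5_msec(duration_msec):
    """Round a duration up to the next multiple of 5 msec."""
    return 5 * math.ceil(Fraction(duration_msec) / 5)


def equation_1(inhdur, mindur, prcnt):
    """Assumed form of Equation 1: only the part above MINDUR is scaled."""
    return (Fraction(inhdur) - mindur) * prcnt / 100 + mindur


prcnt = Fraction(140)           # second rule sets PRCNT to 140
prcnt *= Fraction(80, 100)      # fifth rule: 140 -> 112 (factor inferred)
prcnt *= Fraction(70, 100)      # seventh rule: 112 -> 78.4 (factor inferred)
mindur = 30                     # seventh rule also reduces MINDUR to 30 msec
prcnt *= Fraction(120, 100)     # ninth rule: 78.4 -> 94.08, quoted as 94

INHDUR_ASSUMED = 180            # hypothetical inherent duration in msec
duration = round_up_to_5_msec(equation_1(INHDUR_ASSUMED, mindur, prcnt))
print(float(prcnt), duration)
```

With the assumed inherent duration, the unrounded result is 171.12 msec, which rounds up to 175 msec and is consistent with the value quoted above.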

The resulting durations are determined in part by SPRATE, a variable that controls the
nominal speaking rate and can be set to any number between 60 and 300 words per
minute. The default value is 180 words per minute. At rates slower
than 150 wpm, a short pause is inserted between a content word and a following
function word. (At a normal speaking rate, brief pauses are inserted only at the
ends of clauses.) Individual segments are lengthened or shortened slightly depending
on speaking rate, but most of the rate change is realized by manipulating pause
durations (Goldman-Eisler, 1968).
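
The speaking-rate constraints can be captured in a small helper. The function names are hypothetical, and rejecting out-of-range values with an error is a design choice of this sketch. The text states only the allowed range, the default, and the 150 wpm threshold for the extra pause.

```python
SPRATE_MIN_WPM = 60
SPRATE_MAX_WPM = 300
SPRATE_DEFAULT_WPM = 180
SLOW_RATE_THRESHOLD_WPM = 150


def validate_sprate(sprate=SPRATE_DEFAULT_WPM):
    """Return sprate if it lies in the allowed range, otherwise raise ValueError."""
    if not SPRATE_MIN_WPM <= sprate <= SPRATE_MAX_WPM:
        raise ValueError(
            f"SPRATE must be between {SPRATE_MIN_WPM} and {SPRATE_MAX_WPM} wpm"
        )
    return sprate


def pause_after_content_word(sprate, next_word_is_function_word):
    """True when a short pause falls between a content word and a following function word."""
    return sprate < SLOW_RATE_THRESHOLD_WPM and next_word_is_function_word
```

At the default rate of 180 wpm, `pause_after_content_word` is false for every word pair, which matches the statement that pauses then occur only at the ends of clauses.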

The present rules are only a crude approximation to many of the durational
phenomena seen in sentences (e.g. consonant interactions in clusters) and the rules
completely ignore other factors. Nevertheless, to a first approximation, the rules
capture a great deal of the systematic variation in segmental durations for speaker
DHK. When compared with spectrograms of new paragraphs read by this speaker,
the rule system produces segmental durations that differ from measured durations
by a standard deviation of 17 msec (excluding the prediction of pause durations).
The rules account for 84 percent of the observed total variance in segmental
durations. Seventeen msec is generally less than the just noticeable difference for a
single change to segmental duration in sentence materials (Klatt, 1976a).
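
The two figures quoted above can be related with a line of arithmetic. If the 17 msec standard deviation is read as the residual spread left unexplained by the rules, and the rules account for 84 percent of the variance, then the remaining 16 percent implies a total standard deviation of about 42.5 msec. This reading is an inference from the quoted numbers, not a figure reported in the text, and the function name is hypothetical.

```python
import math


def total_sd_from_residual(residual_sd_msec, fraction_explained):
    """Total standard deviation implied by a residual SD and a fraction of variance explained."""
    residual_variance = residual_sd_msec ** 2
    total_variance = residual_variance / (1.0 - fraction_explained)
    return math.sqrt(total_variance)


total_sd = total_sd_from_residual(17.0, 0.84)
print(round(total_sd, 1))
```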

A perceptual evaluation of the performance of the rule system is discussed by
Carlson et al. (1979). The perceptual results are encouraging in that both natural-
