From text to speech: The MITalk system

word boundaries, but not across phrase boundaries) (Klatt, 1973), and in
vowel-vowel sequences.

Context                               PRCNT1
vowel followed by a vowel                120
vowel preceded by a vowel                 70
consonant surrounded by consonants        50
consonant preceded by a consonant         70
consonant followed by a consonant         70

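The table above can be read as a lookup from context to PRCNT1. The sketch below assumes that each rule scales the running PRCNT by PRCNT1 percent, which is consistent with the worked example later in this section (140 becomes 112, then 78.4, then roughly 94). The names `PRCNT1_BY_CONTEXT`, `apply_rule`, and `cluster_prcnt` are illustrative and do not come from the MITalk source.

```python
# PRCNT1 values (in percent) copied from the table above.
PRCNT1_BY_CONTEXT = {
    "vowel followed by a vowel": 120,
    "vowel preceded by a vowel": 70,
    "consonant surrounded by consonants": 50,
    "consonant preceded by a consonant": 70,
    "consonant followed by a consonant": 70,
}


def apply_rule(prcnt: float, prcnt1: float) -> float:
    """Assumed update: scale the running PRCNT by PRCNT1 percent."""
    return prcnt * prcnt1 / 100.0


def cluster_prcnt(prcnt: float, context: str) -> float:
    """Look up PRCNT1 for a context and apply it to the running PRCNT."""
    return apply_rule(prcnt, PRCNT1_BY_CONTEXT[context])


if __name__ == "__main__":
    # A consonant between two consonants, starting from PRCNT = 100.
    print(cluster_prcnt(100.0, "consonant surrounded by consonants"))
```

Keeping the table as data rather than as a chain of conditionals makes it easy to compare against the printed values.
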
11. Lengthening due to plosive aspiration:

    A 1-stressed or 2-stressed vowel or sonorant preceded by a voiceless
    plosive is lengthened by 25 msec (Peterson and Lehiste, 1960).

When the rules are applied to the RR of “rocker” in Figure 9-1, the second
rule sets PRCNT to 140, the fifth rule reduces PRCNT to 112, the seventh rule
reduces MINDUR to 30 msec and PRCNT to 78.4, and the ninth rule increases
PRCNT to 94. Then INHDUR, MINDUR, and PRCNT are inserted in Equation 1,
and the resulting duration is rounded up to the nearest 5 msec to obtain the
value of 175 msec shown in the lower part of Figure 9-1.

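The final step of the "rocker" example can be sketched as follows. Equation 1 is not reproduced in this passage, so the code assumes the usual form of Klatt's duration model, DUR = ((INHDUR - MINDUR) * PRCNT) / 100 + MINDUR. MINDUR = 30 and PRCNT = 94 come from the text. INHDUR = 180 is an assumed value that the passage does not state; it is used here only because it reproduces the 175 msec result.

```python
import math


def segment_duration(inhdur: float, mindur: float, prcnt: float) -> float:
    """Assumed form of Equation 1: stretch only the part above MINDUR."""
    return (inhdur - mindur) * prcnt / 100.0 + mindur


def round_up_to_5(msec: float) -> int:
    """Round a duration up to the next multiple of 5 msec."""
    return 5 * math.ceil(msec / 5.0)


INHDUR = 180.0  # ASSUMPTION: not given in the passage
MINDUR = 30.0   # from the text (after the seventh rule)
PRCNT = 94.0    # from the text (after the ninth rule)

raw = segment_duration(INHDUR, MINDUR, PRCNT)
final = round_up_to_5(raw)

if __name__ == "__main__":
    print(raw, final)
```

Under these assumptions the unrounded duration is 171 msec, which rounds up to the 175 msec quoted for Figure 9-1.
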
The resulting durations are determined in part by a variable that controls the
nominal speaking rate, SPRATE, which can be set to any number between 60 and
300 words per minute. The default value is 180 words per minute. At rates slower
than 150 wpm, a short pause is inserted between a content word and a following
function word. (At a normal speaking rate, brief pauses are inserted only at the
ends of clauses.) Individual segments are lengthened or shortened slightly
depending on speaking rate, but most of the rate change is realized by
manipulating pause durations (Goldman-Eisler, 1968).

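The speaking-rate behavior described above could be sketched roughly as below. This is only an illustration: the function names, the `<pause>` marker, and the (word, is_content_word) input format are hypothetical. Clamping out-of-range values is also an assumption, since the text says only that SPRATE can be set between 60 and 300. Clause-final pauses are not modeled.

```python
SPRATE_MIN = 60        # words per minute, from the text
SPRATE_MAX = 300       # words per minute, from the text
SPRATE_DEFAULT = 180   # words per minute, from the text
SLOW_THRESHOLD = 150   # below this rate, extra pauses are inserted
PAUSE = "<pause>"      # hypothetical pause marker


def clamp_sprate(wpm: int = SPRATE_DEFAULT) -> int:
    """ASSUMPTION: out-of-range rates are clamped to the allowed range."""
    return max(SPRATE_MIN, min(SPRATE_MAX, wpm))


def insert_slow_rate_pauses(words, sprate):
    """Insert a pause between a content word and a following function word.

    `words` is a list of (text, is_content_word) pairs. Pauses are added
    only when the speaking rate is slower than SLOW_THRESHOLD.
    """
    out = []
    for i, (text, is_content) in enumerate(words):
        out.append(text)
        has_next = i + 1 < len(words)
        if sprate < SLOW_THRESHOLD and is_content and has_next and not words[i + 1][1]:
            out.append(PAUSE)
    return out


if __name__ == "__main__":
    phrase = [("dogs", True), ("in", False), ("the", False), ("park", True)]
    print(insert_slow_rate_pauses(phrase, clamp_sprate(120)))
```
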
The present rules are only a crude approximation to many of the durational
phenomena seen in sentences (e.g. consonant interactions in clusters) and the rules
completely ignore other factors. Nevertheless, to a first approximation, the rules
capture a great deal of the systematic variation in segmental durations for speaker
DHK. When compared with spectrograms of new paragraphs read by this speaker,
the rule system produces segmental durations that differ from measured durations
by a standard deviation of 17 msec (excluding the prediction of pause durations).
The rules account for 84 percent of the observed total variance in segmental
durations. Seventeen msec is generally less than the just noticeable difference
for a single change to segmental duration in sentence materials (Klatt, 1976a).

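The two figures quoted above can be related by a short calculation. It assumes that "accounts for 84 percent of the variance" means one minus the ratio of residual variance to total variance, and that the 17 msec residual standard deviation refers to the same data. Neither assumption is stated in the text, so the result is only a rough implied value.

```python
import math

RESIDUAL_SD_MSEC = 17.0     # from the text
VARIANCE_EXPLAINED = 0.84   # from the text

# ASSUMPTION: explained = 1 - residual_variance / total_variance
residual_variance = RESIDUAL_SD_MSEC ** 2
total_variance = residual_variance / (1.0 - VARIANCE_EXPLAINED)
implied_total_sd = math.sqrt(total_variance)

if __name__ == "__main__":
    print(round(implied_total_sd, 1))
```

Under these assumptions the measured segment durations would have a standard deviation of about 42.5 msec.
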
A perceptual evaluation of the performance of the rule system is discussed by
Carlson et al. (1979). The perceptual results are encouraging in that both natural-