From text to speech: The MITalk system
were given the label “function” are elevated to “content” importance in the F0 algorithm. These are:
• Demonstrative pronouns (this, those)
• Contractions (we’ll, boys’ll)
• Modals (should, might, will, can)
• Quantifiers (several, many)
• Interrogative adjectives (which, whose)
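The elevation rule can be sketched as a small lookup. This is a minimal sketch, not the MITalk code. It assumes that each word already carries a “function”/“content” label and a part-of-speech category. The category names in `ELEVATED` and the function `f0_importance` are hypothetical.

```python
# Hypothetical category names for the five word classes listed above.
ELEVATED = {
    "demonstrative_pronoun",
    "contraction",
    "modal",
    "quantifier",
    "interrogative_adjective",
}

def f0_importance(label, category):
    """Return the importance a word receives in the F0 algorithm.

    label: "function" or "content", as assigned by earlier analysis.
    category: a hypothetical part-of-speech tag for the word.
    """
    # Function words in one of the listed classes are treated as content words.
    if label == "function" and category in ELEVATED:
        return "content"
    return label
```

For example, `f0_importance("function", "modal")` yields `"content"`, while a function word outside these classes keeps its original label.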
The F0 algorithm requires a specification of the number of syllables in each word, the location of the stressed syllable within the word, and information concerning syllable boundaries. This information is found in the PROSOD output file. The phonemic information in this file is also used to specify a structure for each syllable. This structure is an allowable ordering of voiced or unvoiced obstruents, sonorants, and a single vowel.
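One way to picture the per-word input is shown below. This is a sketch under stated assumptions, not the PROSOD file format. The ARPAbet-like phoneme symbols, the class labels ("O+" for voiced obstruent, "O-" for unvoiced obstruent, "S" for sonorant, "V" for vowel), and the names `WordProsody` and `phoneme_class` are all hypothetical. The sketch enforces only the single-vowel constraint, because the full set of allowable orderings is not given here.

```python
from dataclasses import dataclass

# Hypothetical phoneme inventory, grouped into the four classes named in the text.
VOWELS = {"AA", "AE", "AH", "EH", "IY", "UW"}
SONORANTS = {"L", "M", "N", "R", "W", "Y"}
VOICED_OBSTRUENTS = {"B", "D", "G", "V", "Z"}
UNVOICED_OBSTRUENTS = {"F", "K", "P", "S", "T"}

def phoneme_class(p):
    """Map a phoneme symbol to a hypothetical class label."""
    if p in VOWELS:
        return "V"
    if p in SONORANTS:
        return "S"
    if p in VOICED_OBSTRUENTS:
        return "O+"
    if p in UNVOICED_OBSTRUENTS:
        return "O-"
    raise ValueError(f"unknown phoneme {p!r}")

@dataclass
class WordProsody:
    syllables: list   # phonemes grouped by syllable boundary
    stressed: int     # index of the stressed syllable

    @property
    def n_syllables(self):
        return len(self.syllables)

    def structures(self):
        """Return one class pattern per syllable, requiring exactly one vowel."""
        patterns = []
        for syl in self.syllables:
            classes = [phoneme_class(p) for p in syl]
            if classes.count("V") != 1:
                raise ValueError(f"syllable {syl} must contain one vowel")
            patterns.append(" ".join(classes))
        return patterns
```

For a word split as `[["K", "AE", "N"], ["D", "AH", "L"]]` with `stressed=0`, `structures()` gives `["O- V S", "O+ V S"]`.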
10.3 Output
There are two possible output files. One file is a stream of fundamental frequency (F0) values, one value for each 5 msec of the utterance. This file can be merged with the output of PHONET (discussed in Chapter 11), which gives values of the 20 variable parameters every 5 msec. The F0 values are calculated by determining the changes in F0 during a syllable and using the durations of the segments within the syllable to describe a contour of constant slope (in absolute value).
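A constant-slope contour of this kind can be sketched as follows. This is a minimal sketch, and it interprets "constant slope (absolute value)" in one particular way. The total absolute F0 change in the syllable is spread over the syllable duration, which is the sum of the segment durations. Every rise and fall is then traversed at that same rate. The function name and its arguments are hypothetical.

```python
FRAME_MS = 5  # one F0 value per 5 msec, as stated in the text

def constant_slope_contour(f0_start_hz, changes_hz, segment_ms):
    """Sample an F0 contour over one syllable every FRAME_MS.

    changes_hz: ordered signed F0 changes in the syllable (+ rise, - fall).
    segment_ms: durations of the segments making up the syllable.
    The contour moves at one constant |slope| for the whole syllable.
    """
    syllable_ms = sum(segment_ms)
    n_frames = int(syllable_ms // FRAME_MS)
    total = sum(abs(c) for c in changes_hz)
    if total == 0:
        return [f0_start_hz] * n_frames
    rate = total / syllable_ms  # Hz per msec, identical for every rise and fall
    values = []
    for k in range(n_frames):
        budget = rate * k * FRAME_MS  # total |change| covered by time k*FRAME_MS
        f0 = f0_start_hz
        for c in changes_hz:
            step = min(abs(c), budget)  # how much of this change is done so far
            f0 += step if c > 0 else -step
            budget -= step
            if budget <= 0:
                break
        values.append(f0)
    return values
```

Take a 40 msec syllable with a 20 Hz rise followed by a 20 Hz fall, starting at 100 Hz. Under these assumptions the contour climbs 5 Hz per frame to 120 Hz and then descends at the same rate.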
A second method, the one currently in use, is to calculate rises and falls on each segment (an intermediate stage in the former method) and to use this information to specify F0 target values for the midpoint of each segment and for the peak point at either the left or right boundary of stressed vowels in content words. Unspecified onset values for segments are determined by linear interpolation between their midpoint target value and the midpoint target value of the preceding segment. This method allows F0 values to be calculated every 5 msec using the same linear smoothing procedure that is used for some of the other parameters, modified slightly to accept the possible extra (peak) target value as input.
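The target-and-interpolate scheme can be sketched in a few lines. This sketch assumes that segments are given as `(start_ms, duration_ms, midpoint_target_hz)` tuples and that peaks are optional `(time_ms, hz)` targets. It uses plain piecewise-linear interpolation, sampled every 5 msec, as a stand-in for the linear smoothing procedure. That procedure itself is not described here. The function name `f0_track` is hypothetical.

```python
from bisect import bisect_right

FRAME_MS = 5  # F0 values are produced every 5 msec

def f0_track(segments, peaks=()):
    """Linearly interpolate F0 targets and sample them every FRAME_MS.

    segments: list of (start_ms, duration_ms, midpoint_target_hz).
    peaks: optional (time_ms, hz) peak targets placed at the left or
           right boundary of a stressed vowel.
    """
    # Breakpoints: one target at each segment midpoint, plus any peak targets.
    points = sorted([(s + d / 2, hz) for s, d, hz in segments] + list(peaks))
    times = [t for t, _ in points]
    values = [v for _, v in points]
    n_frames = int((times[-1] - times[0]) // FRAME_MS) + 1
    out = []
    for k in range(n_frames):
        t = times[0] + k * FRAME_MS
        i = bisect_right(times, t)  # index of first breakpoint strictly after t
        if i == len(times):
            out.append(values[-1])  # at the last target
            continue
        t0, t1 = times[i - 1], times[i]
        v0, v1 = values[i - 1], values[i]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

For two 20 msec segments with midpoint targets of 100 Hz and 140 Hz, the value at 20 msec (the onset of the second segment) comes out as 120 Hz. That is the onset obtained by interpolating between the two midpoints. Adding a peak target such as `(20, 150)` replaces that onset with the peak value.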
Most peaks are assigned to the right boundary of the stressed vowel in a content word. A fall (and possible continuation rise) following the rise that forms the peak is then assigned to the midpoint or right boundary of the following segment, absorbing any fall or rise that might previously have been assigned to that segment. A peak is instead assigned to the left boundary of a “nuclear-stressed” syllable, i.e., the stressed syllable in the final content word of a phrase preceding a silence. Preceding unassigned rises or falls are absorbed in the assignment of that peak.
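The choice between the left and right boundary can be sketched as follows. This sketch assumes that each word in a phrase carries a content/function flag and the boundaries of its stressed vowel. The `Word` class and the `assign_peaks` function are hypothetical. The sketch covers only peak placement. It does not model the absorption of neighbouring rises and falls.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    is_content: bool
    vowel_start_ms: float  # left boundary of the stressed vowel
    vowel_end_ms: float    # right boundary of the stressed vowel

def assign_peaks(phrase, before_silence=True):
    """Return {word index: peak time in msec} for the content words of a phrase."""
    content_idx = [i for i, w in enumerate(phrase) if w.is_content]
    # The nuclear-stressed word is the final content word of a phrase before a silence.
    nuclear = content_idx[-1] if (content_idx and before_silence) else None
    peaks = {}
    for i in content_idx:
        w = phrase[i]
        # Left boundary for the nuclear-stressed syllable, right boundary otherwise.
        peaks[i] = w.vowel_start_ms if i == nuclear else w.vowel_end_ms
    return peaks
```

In a phrase such as "the big dog" followed by a silence, "big" would receive its peak at the end of its stressed vowel. "dog", the nuclear-stressed word, would receive its peak at the start of its vowel.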