From text to speech: The MITalk system

less predictable. The amount of F0 movement on each word depends upon its rank
in the order of parts of speech of content words (see Table 10-1) and also upon the
number of syllables in the word. Words of higher rank contain larger F0 excursion.
Function words and unstressed syllables of content words are given a slight
(5 Hz) excursion to produce a more natural-sounding contour.

10.4.6 Prosodic indicators

A set of “prosodic indicators” is passed from the High Level System to the Low
Level System. An accent number gives the relative importance of a word. This
number ranges from “0” for one-syllable articles to “11+n” for a sentential adverb
containing n syllables. An integer representing the position of a word in a phrase
and the importance of that phrase is also assigned. Higher absolute values are
given to words at boundaries marked by punctuation and to words at the
boundaries of large or major phrases. Another value assigned to each word is a number
indicating the amount of continuation rise. Most words are assigned the value “0”,
but those words ending a nonfinal phrase are usually given a value which reflects
the importance of the syntactic boundary which the word immediately precedes. A
level number applies to words in noun phrases not containing conjunctions. This
number either signifies that the F0 level is to rise, or that the F0 level should drop
on that word. Other words are given level “0”. This indicates a mid-phrase word.
Additionally, the tune value is defined on each word, and is nonzero on the word
ending a clause. The number of phrases is also a necessary input value to the next
level.

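The indicators listed above can be pictured as a small per-word record. The following Python sketch is only an illustration: the class and field names are hypothetical, the text does not say how a rise or a drop is encoded in the level number, and the original system is not written in Python. Only the meaning of each field and the "11+n" rule are taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProsodicIndicators:
    """Per-word values passed from the High Level System to the Low Level System.

    All field names are hypothetical; only their meanings follow the text.
    """
    accent: int             # relative importance: 0 for a one-syllable article, up to 11 + n
    phrase_position: int    # position in phrase and phrase importance; larger |value| at major boundaries
    continuation_rise: int  # 0 for most words; nonzero on a word ending a nonfinal phrase
    level: int = 0          # rise/drop marker in conjunction-free noun phrases; 0 = mid-phrase word
    tune: int = 0           # nonzero only on the word ending a clause


@dataclass
class HighLevelOutput:
    """Everything the next level needs: one record per word plus the phrase count."""
    words: List[ProsodicIndicators]
    n_phrases: int


def sentential_adverb_accent(n_syllables: int) -> int:
    """Accent number of a sentential adverb with n syllables, per the "11+n" rule."""
    if n_syllables < 1:
        raise ValueError("a word has at least one syllable")
    return 11 + n_syllables
```

Under these assumptions, a one-syllable article would be built as ProsodicIndicators(accent=0, phrase_position=..., continuation_rise=0), with level and tune left at their default of 0, while a three-syllable sentential adverb would take its accent from sentential_adverb_accent(3).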
10.4.7 The Low Level System
This level reflects the effects of phonemics, lexical stress, and the number of
syllables of the words in the utterance. The number of syllables is used in determining
the height of the peak on lexically stressed syllables. Although the first and
highest peak in a sentence is constrained to a maximum of about 190 Hz, longer
sentences, i.e., sentences with more syllables, begin with higher peaks. This initial
height allows more freedom of excursion for following peaks. Higher peaks are
also placed on two lexically stressed syllables if they are separated by unstressed
syllables, the height of the peaks being dependent upon the number of intervening
unstressed syllables.

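The constraint on the first peak can be sketched as a capped function of sentence length. In the Python sketch below, only the ceiling of about 190 Hz comes from the text. The function name, the linear growth, the 150 Hz starting height, and the 2 Hz per syllable slope are assumptions chosen purely for illustration, not values from the MITalk system.

```python
MAX_FIRST_PEAK_HZ = 190.0  # ceiling of "about 190 Hz" stated in the text


def first_peak_hz(n_syllables: int,
                  base_hz: float = 150.0,          # hypothetical starting height
                  hz_per_syllable: float = 2.0     # hypothetical growth per syllable
                  ) -> float:
    """Height of the first (and highest) F0 peak of a sentence.

    Longer sentences start higher, but never above MAX_FIRST_PEAK_HZ.
    The linear form is an assumption; the text only states the trend and the cap.
    """
    if n_syllables < 1:
        raise ValueError("a sentence has at least one syllable")
    return min(MAX_FIRST_PEAK_HZ, base_hz + hz_per_syllable * n_syllables)
```

With these placeholder parameters a longer sentence starts higher than a shorter one until the ceiling is reached, which leaves later peaks more room to vary below the first.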
The F0 pattern is also affected by the phonemics. For example, unvoiced
consonants at the beginning of a stressed syllable also cause the contour to fall,
rather than rise, into the contour of the stressed vowel. (The rise is added to the
peak of the vowel.) See Figure 10-1 for an example of this contour.

The algorithm first sets the peaks on the lexically stressed syllables. Falls and