From text to speech: The MITalk system

We have argued in Chapters 2-6 that in order to transform English text to
speech, one must first try to derive an underlying abstract linguistic representation
for the text. There are at least two reasons why a direct approach is suboptimal: 1)
rules for pronouncing words must take into consideration morphemic structure
(e.g. consider the pronunciation of the th of outhouse) and syntactic structure (e.g.
there exist many noun-verb ambiguities in English such as perm’it (verb) and
p’ermit (noun)), and 2) the sentence duration pattern and fundamental frequency
contour depend, to a major extent, on the syntactic structure of the sentence.

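The two examples named above (outhouse and permit) can be illustrated with a
minimal sketch. It assumes a toy morph lexicon and a toy stress table; the names
MORPHS, STRESS_BY_POS, split_morphs, th_is_digraph, and stressed_form are
hypothetical and are not taken from MITalk.

```python
# Toy morph lexicon (an illustrative assumption, not MITalk's lexicon).
MORPHS = {"out", "house", "hot", "bath"}

# Toy stress table: capitals mark the stressed syllable for each part of speech.
STRESS_BY_POS = {"permit": {"noun": "PER-mit", "verb": "per-MIT"}}


def split_morphs(word):
    """Return a two-morph split if both halves are known morphs, else [word]."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in MORPHS and right in MORPHS:
            return [left, right]
    return [word]


def th_is_digraph(word):
    """True if 't' and 'h' fall inside one morph and may be read as one sound."""
    return any("th" in morph for morph in split_morphs(word))


def stressed_form(word, pos):
    """Pick the stress pattern for a part of speech; fall back to the bare word."""
    return STRESS_BY_POS.get(word, {}).get(pos, word)


print(split_morphs("outhouse"), th_is_digraph("outhouse"))
print(stressed_form("permit", "noun"), stressed_form("permit", "verb"))
```

In this sketch the t and h of outhouse fall in different morphs, so a rule that
reads th as a single sound would not apply, whereas in bath it would.
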
There are currently several text-to-speech systems under development in the
United States (Nye et al., 1973; Kurzweil, 1976; Caldwell, 1979; Morris, 1979)
and elsewhere (Carlson and Granstrom, 1976). The simplest approach is to devise
a set of heuristic letter-to-sound rules and then create an exceptions dictionary for
frequently occurring words that are processed incorrectly by the letter-to-sound
rules (Kurzweil, 1976). The exceptions dictionary is then augmented with function
words that are useful for parsing strategies. The phonetic representation for a
sentence that is derived in this way serves as the input to a synthesis-by-rule device
such as Votrax (Gagnon, 1978) or a software synthesis-by-rule program.

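The simplest approach described above, an exceptions dictionary consulted before
heuristic letter-to-sound rules, might be sketched as follows. The rule set, the
dictionary entries, and the ARPAbet-like phone symbols are illustrative
assumptions, far smaller than any practical system and not drawn from the
systems cited.

```python
# Hypothetical exceptions dictionary: frequent words the toy rules get wrong.
EXCEPTIONS = {"of": "AH V", "the": "DH AH", "one": "W AH N"}

# Toy letter-to-sound rules, digraphs listed before single letters.
RULES = [
    ("th", "TH"), ("sh", "SH"), ("ch", "CH"),
    ("a", "AE"), ("e", "EH"), ("i", "IH"), ("o", "AA"), ("u", "AH"),
]


def letter_to_sound(word):
    """Scan left to right, applying the first rule that matches at each position."""
    phones = []
    i = 0
    while i < len(word):
        for pattern, phone in RULES:
            if word.startswith(pattern, i):
                phones.append(phone)
                i += len(pattern)
                break
        else:
            # No rule matched: pass the letter through as its own symbol.
            phones.append(word[i].upper())
            i += 1
    return " ".join(phones)


def transcribe(word):
    """Use the exceptions dictionary first, then fall back to the rules."""
    word = word.lower()
    return EXCEPTIONS.get(word) or letter_to_sound(word)


def transcribe_sentence(text):
    """Transcribe each whitespace-separated word of a sentence."""
    return [transcribe(w) for w in text.split()]


print(transcribe_sentence("The ship"))
```

The dictionary lookup comes first so that a frequent irregular word such as one
never reaches the rules, which would otherwise spell it out letter by letter.
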
The MITalk system represents a more ambitious approach of generalized
morphemic analysis, intended to determine the pronunciation of words more
reliably and to assign a part of speech to each word more accurately, and thereby
to compute phrase and clause boundaries with greater accuracy. The real question
is whether current algorithms are good enough to make automatic text-to-speech
output acceptable to the user. There is clear indication that motivated users (such
as the blind) benefit from these devices after a period of acclimation, but
considerable concentration is required.