From text to speech: The MITalk system
From the above discussion, it is clear that some form of exceptions dictionary
is necessary. Given that all systems will provide such a lexicon, there are two
ways to deal with the nonexceptional words. At one extreme, system designers
could attempt to provide a “complete” word dictionary. Unfortunately, while
the number of words is bounded, new words are constantly invented by productive
processes of compounding (e.g. earthrise and cranapple) and by filling
“accidental gaps” (in the phonological sense) as in brillig. Furthermore, a
comprehensive word lexicon would have to store all regularly inflected forms,
which places a heavy burden on storage. So a “complete” word lexicon will
not do. This fact has led investigators to consider the other extreme, namely the
provision of a set of letter-to-sound rules that would convert input letter strings to
phonetic segment labels through some sort of scanning and transformation process.
Such rule sets have indeed been constructed (MITalk has an extensive set), and
they are very productive. But difficulties remain. It has been difficult to provide a
high degree of accuracy from these rule sets, leading to increases in the size of the
“exceptions” dictionary. These problems arise in part because words have
internal structure that must be recognized in order to derive the correct
pronunciation.
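The two-part arrangement described above, an exceptions dictionary consulted first and letter-to-sound rules applied otherwise, can be sketched roughly as follows. This is a minimal illustration, not MITalk's rule set: the tables EXCEPTIONS and RULES, the phone labels, and the function names are all assumptions made for the example, and real rules are context-sensitive rather than a plain longest-match scan.

```python
# Toy tables, assumed for illustration only; they are not taken from MITalk.
EXCEPTIONS = {"of": ["AH", "V"]}  # the letter "f" is pronounced /v/ in "of"
RULES = {"sh": ["SH"], "i": ["IH"], "p": ["P"], "o": ["AA"], "f": ["F"]}


def letter_to_sound(word):
    """Scan left to right, applying the longest matching rule at each position."""
    phones = []
    i = 0
    max_len = max(len(key) for key in RULES)
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in RULES:
                phones.extend(RULES[chunk])
                i += n
                break
        else:
            # No rule of any length matched at this position.
            raise ValueError(f"no rule for {word[i]!r}")
    return phones


def pronounce(word):
    """Consult the exceptions dictionary first, then fall back to the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return letter_to_sound(word)
```

With these toy tables, pronounce("ship") is handled entirely by the rules, while pronounce("of") must come from the exceptions dictionary, because the rules alone would yield an /f/.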
Letter-to-sound rules recognize small structures within words in the form of
consonant and vowel clusters. Syllables provide additional structure, but it has not
been possible to reliably and consistently find syllable boundaries in the letter
string. The minimum syntactic unit of a language, however, is the morpheme, and
it has an important role to play in the determination of pronunciations. It will also
be seen that when morphemes are represented by letter string segments called
“morphs”, they can be effectively used as the basis for determining word
pronunciation. MITalk uses a morph lexicon that can be viewed as a bridge between
the two extreme approaches cited above. Together with an effective analysis
procedure, this lexicon provides for accurate pronunciations, including exceptions, and
also provides a natural role for letter-to-sound rules which must be present in order
to convert unrestricted English text to speech.
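As a rough sketch of how a morph lexicon and an analysis procedure might fit together with letter-to-sound rules, the code below tries to cover a word with lexicon entries and hands the word to the rules only when no covering exists. The lexicon MORPHS, the ordering constraint (prefixes, then one or more roots, then suffixes), and the names analyze and route are assumptions for illustration; this is not the MITalk analysis procedure.

```python
# Hypothetical toy morph lexicon (morph -> kind); not MITalk's actual lexicon.
MORPHS = {
    "re": "prefix",
    "snow": "root",
    "plow": "root",
    "learn": "root",
    "s": "suffix",
}
# Assumed ordering: prefixes first, then one or more roots, then suffixes.
ORDER = {"prefix": 0, "root": 1, "suffix": 2}


def analyze(word, start=0, stage=0, seen_root=False):
    """Return a list of morphs covering word[start:], or None if none exists."""
    if start == len(word):
        return [] if seen_root else None  # a word needs at least one root
    for end in range(len(word), start, -1):  # try the longest morph first
        piece = word[start:end]
        kind = MORPHS.get(piece)
        if kind is None or ORDER[kind] < stage:
            continue
        rest = analyze(word, end, ORDER[kind], seen_root or kind == "root")
        if rest is not None:
            return [piece] + rest
    return None


def route(word):
    """Pick the morph lexicon when a covering exists, else the rules."""
    morphs = analyze(word)
    if morphs is not None:
        return ("morph lexicon", morphs)
    return ("letter-to-sound rules", [word])
```

Under these assumptions, a compound of known roots is resolved by the lexicon, and a novel word with no covering falls through to the rules, which is the "bridge" role described above.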
Roughly speaking, morphs consist of prefixes, roots, and suffixes. An English
word always has at least one root, but may have additional roots as well as prefixes
and suffixes. Thus snow is a single morph, but snowplow is a compound of two
morphs, and snowplows has two roots and an inflectional suffix providing the
plural marker; relearn has a prefix as well as a root, and
antidisestablishmentarianism has no fewer than seven recognizable morphs.
These morphs are the atomic constituents of words, and they are relatively stable