From text to speech: The MITalk system

From the above discussion, it is clear that some form of exceptions dictionary
is necessary. Given that all systems will provide such a lexicon, there are two
choices that deal with the nonexceptional words. On one extreme, system
designers could attempt to provide a “complete” word dictionary. Unfortunately,
while the number of words is bounded, new words are constantly invented by
productive processes of compounding (e.g. earthrise and cranapple) and by
filling “accidental gaps” (in the phonological sense) as in brillig.
Furthermore, a comprehensive word lexicon would have to store all regularly
inflected forms, which places a large burden on the storage required. So a
“complete” word lexicon will not do. This fact has led investigators to consider
the other extreme, namely the provision of a set of letter-to-sound rules that
would convert input letter strings to phonetic segment labels through some sort
of scanning and transformation process. Such rule sets have indeed been
constructed (MITalk has an extensive set), and they are very productive. But
difficulties remain. It has been difficult to provide a high degree of accuracy
from these rule sets, leading to increases in the size of the “exceptions”
dictionary. These problems arise in part due to the fact that there is internal
structure in words that must be recognized in order to derive the correct
pronunciation.
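
The interplay between an exceptions dictionary and letter-to-sound rules can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the entries in EXCEPTIONS, the toy RULES table, and the
ARPAbet-style phonetic labels are all hypothetical and are not taken from
MITalk, whose rule set is far more extensive and context-sensitive.

```python
# Hypothetical exceptions dictionary: words the toy rules below get wrong.
EXCEPTIONS = {
    "of": "AH V",
    "one": "W AH N",
}

# Toy letter-to-sound rules as (letter string, phonetic label) pairs.
# Longer letter strings are listed first so that "sh" wins over "s".
RULES = [
    ("sh", "SH"),
    ("ee", "IY"),
    ("p", "P"),
    ("s", "S"),
    ("t", "T"),
    ("n", "N"),
    ("f", "F"),
    ("i", "IH"),
    ("o", "AA"),
    ("e", "EH"),
]


def letter_to_sound(word):
    """Scan left to right, applying the first rule that matches at each position."""
    labels = []
    i = 0
    while i < len(word):
        for letters, sound in RULES:
            if word.startswith(letters, i):
                labels.append(sound)
                i += len(letters)
                break
        else:
            # No rule covers this letter; this toy version simply skips it.
            i += 1
    return " ".join(labels)


def pronounce(word):
    """Consult the exceptions dictionary first, then fall back to the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return letter_to_sound(word)
```

With these tables, pronounce("sheep") is handled entirely by the rules, while
"of" and "one" need dictionary entries because the rules alone would mispronounce
them. Every word the rules get wrong becomes another entry in the dictionary.
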

Letter-to-sound rules recognize small structures within words in the form of
consonant and vowel clusters. Syllables provide additional structure, but it has
not been possible to reliably and consistently find syllable boundaries in the
letter string. The minimum syntactic unit of a language, however, is the
morpheme, and it has an important role to play in the determination of
pronunciations. It will also be seen that when morphemes are represented by
letter string segments called “morphs”, they can be effectively used as the
basis for determining word pronunciation. MITalk uses a morph lexicon that can
be viewed as a bridge between the two extreme approaches cited above. Together
with an effective analysis procedure, this lexicon provides for accurate
pronunciations, including exceptions, and also provides a natural role for
letter-to-sound rules which must be present in order to convert unrestricted
English text to speech.

Roughly speaking, morphs consist of prefixes, roots, and suffixes. An English
word always has at least one root, but may have additional roots as well as
prefixes and suffixes. Thus snow is a single morph, but snowplow is a compound
of two morphs, and snowplows has two roots and an inflectional suffix providing
the plural marker; relearn has a prefix as well as a root, and
antidisestablishmentarianism has no fewer than seven recognizable morphs.
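
The decomposition of the examples above can be sketched as a small Python
search over a morph lexicon. Everything here is an assumption made for
illustration: the MORPHS table is a hypothetical five-entry lexicon, and
preferring the covering with the fewest morphs is a stand-in heuristic, not
MITalk's actual analysis procedure.

```python
from functools import lru_cache

# Hypothetical morph lexicon mapping each morph to its type.
MORPHS = {
    "snow": "root",
    "plow": "root",
    "learn": "root",
    "re": "prefix",
    "s": "suffix",
}


def segment(word):
    """Return the shortest list of lexicon morphs that spells `word`, or None.

    A valid analysis must contain at least one root. The order of prefixes,
    roots, and suffixes is not checked in this simplified sketch.
    """

    @lru_cache(maxsize=None)
    def best(start):
        # Shortest tuple of morphs covering word[start:], or None if none exists.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in MORPHS:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    if result is None or not any(MORPHS[m] == "root" for m in result):
        return None
    return list(result)
```

Here segment("snowplows") yields the two roots and the plural suffix, while a
word with no covering in the lexicon, such as brillig, returns None and would be
left to the letter-to-sound rules.
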
These morphs are the atomic constituents of words, and they are relatively stable