From text to speech: The MITalk system
analysis algorithm, some were considered to be either rare or difficult to implement. Spelling changes that are particularly difficult to recognize are those in which a letter is either added or omitted. These changes frequently appear to have arisen from simplified pronunciations. In some cases a vowel is dropped, as in administer/administration. In other cases, repeated consecutive sounds are omitted, as in quietude (quiet+itude). Words in which letters are inserted may contain an extra sound, as in fixture (fix+ure) and armament (arm+ment), or simply an extra letter, as in picnicked (picnic+ed) and stabilize (stable+ize), where the spelling change preserves the original pronunciation.
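The difficulty can be illustrated with a minimal sketch that strips one suffix and tries to recover the root using two simple spelling rules: restoring a dropped final e and undoubling a final consonant. The toy lexicon, the two rules, and the function names `candidate_stems` and `recover_stem` are hypothetical, chosen for illustration rather than taken from MITalk.

```python
from __future__ import annotations

# Toy morph lexicon (hypothetical): a few roots and suffixes.
ROOTS = {"bake", "stop", "picnic", "stable", "fix", "arm", "quiet", "administer"}
SUFFIXES = {"ed", "ize", "ure", "ment", "itude", "ation"}


def candidate_stems(remainder: str) -> list[str]:
    """Spellings a root might have before a suffix, under two assumed rules."""
    candidates = [remainder, remainder + "e"]  # as-is, or restore a dropped final e
    if len(remainder) >= 2 and remainder[-1] == remainder[-2]:
        candidates.append(remainder[:-1])  # undouble a final consonant
    return candidates


def recover_stem(word: str) -> tuple[str, str] | None:
    """Return (root, suffix) if the word is a known root plus one known
    suffix under the two rules above; otherwise None."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            remainder = word[: -len(suffix)]
            for stem in candidate_stems(remainder):
                if stem in ROOTS:
                    return stem, suffix
    return None


# Regular changes are undone; inserted or omitted letters are not.
for w in ["baked", "stopped", "picnicked", "stabilize", "fixture",
          "armament", "quietude", "administration"]:
    print(w, recover_stem(w))
```

Only baked and stopped are recovered. Every word cited in the paragraph above returns None, which is the kind of failure that leads to listing such words in the lexicon whole.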
There are about 250 words in the morph lexicon which, if they were not lexical entries, would be analyzed by the algorithm into morphs other than those from which they are derived. These are the words in the second category mentioned above. The word colonize, for example, is not derived from colon; cobweb is not derived from cob; bargain is not derived from bar and gain.
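The effect of such entries can be sketched as a lookup order: the whole word is checked before any decomposition is tried, so a listed word such as colonize never reaches the decomposer. The lexicon contents and the names `split_two` and `analyze` are hypothetical, and the two-morph splitter merely stands in for the real decomposition algorithm.

```python
# Hypothetical toy lexicon: morphs plus a few whole-word entries.
MORPHS = {"colon", "cob", "web", "bar", "gain", "ize"}
WHOLE_WORDS = frozenset({"colonize", "cobweb", "bargain"})


def split_two(word):
    """Split a word into two known morphs, scanning left to right; None if impossible."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in MORPHS and right in MORPHS:
            return [left, right]
    return None


def analyze(word, whole_words=WHOLE_WORDS):
    """Look the whole word up first; decompose only if it is not listed."""
    if word in whole_words:
        return [word]
    return split_two(word)


print(analyze("colonize"))                           # ['colonize']
print(analyze("colonize", whole_words=frozenset()))  # ['colon', 'ize']
```

Removing the whole-word entries exposes the false analyses the lexical entries are there to block.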
In some cases of multiple coverings, the selectional rules do not choose the correct analysis. For example, the word coppery may be analyzed either as cop+ery (with the final p of cop doubled before the suffix) or as copper+y. In both cases, the morph types are the same: cop and copper are free roots, and ery and y are vocalic derivational suffixes. That is, the number of morphs and their types are exactly the same in the two possible analyses. When this situation arises, the selectional rules are constrained to choose the first analysis found. Because the algorithm searches for the longest morph at the right end of the word first, it finds ery before y, and cop+ery is chosen. This analysis is etymologically incorrect, and the polymorphemic word coppery is, therefore, a lexical entry.
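The tie-breaking behavior can be sketched as a recursive search that tries the longest morph at the right end first, so the order in which coverings are found is also the order used to break ties. The toy lexicon, the undoubling rule that lets cop+ery spell coppery, and the scoring function `score` are assumptions made for illustration; MITalk's actual selectional rules are not reproduced here.

```python
from __future__ import annotations

# Hypothetical toy lexicon: morph -> morph type.
LEXICON = {"cop": "root", "copper": "root", "ery": "suffix", "y": "suffix"}
# Hypothetical cost per morph type used by the selection step.
TYPE_COST = {"root": 0, "suffix": 1}


def stem_variants(remainder: str) -> list[str]:
    """Spellings a morph may take before a suffix. Undoubling a final
    consonant (copp -> cop) is an assumed rule for this sketch."""
    variants = [remainder]
    if len(remainder) >= 2 and remainder[-1] == remainder[-2]:
        variants.append(remainder[:-1])
    return variants


def coverings(word: str) -> list[list[str]]:
    """Enumerate analyses, matching the longest morph at the right end first."""
    results = []
    for start in range(len(word)):  # start=0 tries the longest right-end piece
        piece = word[start:]
        if piece not in LEXICON:
            continue
        remainder = word[:start]
        if not remainder:
            results.append([piece])
        elif LEXICON[piece] == "suffix":  # only suffixes follow another morph here
            for stem in stem_variants(remainder):
                for analysis in coverings(stem):
                    results.append(analysis + [piece])
    return results


def score(analysis: list[str]) -> tuple[int, int]:
    """Hypothetical score: fewer morphs first, then lower total type cost."""
    return len(analysis), sum(TYPE_COST[LEXICON[m]] for m in analysis)


def select(word: str) -> list[str] | None:
    candidates = coverings(word)
    # min() returns the first of several equally scored items, so a tie
    # goes to the analysis found first by the right-to-left search.
    return min(candidates, key=score) if candidates else None


print(coverings("coppery"))  # [['cop', 'ery'], ['copper', 'y']]
print(select("coppery"))     # ['cop', 'ery']
```

Both coverings score (2, 1), so the etymologically incorrect cop+ery wins simply because it was found first; only a whole-word entry for coppery can override it.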
There are many polymorphemic words in English whose pronunciation differs from that of their constituent morphs. For this reason, the third category above is rather large; it includes about 8 percent of the lexical entries. Some polymorphemic words differ in both pronunciation and stress, the two categories being highly interrelated.
The part of speech of a word is very important in text-to-speech processing. It is used to determine a parse of the sentence, which is in turn used by the algorithms that determine fundamental frequency and duration. DECOMP includes a part-of-speech processor that determines the part of speech of a word from information associated with its component morphs in the lexicon. The procedure will be described in detail in the next chapter. If the part of speech of a word is not correctly predicted by its constituent morphs, the entire word must be placed in the lexicon. For example, the suffix er is marked as forming adjectives, adverbs