You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
46 lines
2.8 KiB
46 lines
2.8 KiB
From text to speech: The MITalk system
|
|
|
|
analysis algorithm, some were considered to be either rare or difficult to imple-
|
|
ment. Spelling changes which are particularly difficult to recognize are those in
|
|
which a letter is either added or omitted. These changes frequently appear to have
|
|
been made because of simplified pronunciations. In some cases, a vowel is
|
|
dropped, as in administer/administration. In other cases, repeated consecutive
|
|
sounds are omitted as in quietude (quiet+itude). Words in which letters are in-
|
|
serted may contain an extra sound as in fixture (fix+ure) and armament
|
|
(arm+ment), or simply an extra letter as in picnicked (picnic+ed) and stabilize
|
|
(stable+ize) in which the spelling change allows retention of the original pronun-
|
|
ciation.
|
|
|
|
There are about 250 words in the morph lexicon which, if they were not lex-
|
|
ical entries, would be analyzed by the algorithm into morphs other than those from
|
|
which they are derived. These are the words mentioned in the second category
|
|
above. The word colonize, for example, is not derived from colon; cobweb is not
|
|
derived from cob; bargain is not derived from bar and gain.
|
|
|
|
In some cases of multiple coverings, the selectional rules do not choose the
|
|
correct analysis. For example, the word coppery may be analyzed as either
|
|
cop+ery or as copper+y. In both cases, the morph types are the same: cop and
|
|
copper are free roots, and ery and y are vocalic derivational suffixes. That is, the
|
|
number of morphs and their types are exactly the same in the two possible
|
|
analyses. When this situation arises, the selectional rules are constrained to choose
|
|
the first analysis. Because the algorithm first searches for the longest morph from
|
|
the right end of the word, cop+ery is chosen. This analysis is etymologically in-
|
|
correct, and the polymorphemic word coppery is, therefore, a lexical entry.
|
|
|
|
There are many polymorphemic words in English which differ in pronuncia-
|
|
tion from that of their constituent morphs. For this reason, the third category
|
|
above is rather large; it includes about 8 percent of the lexical entries. Some
|
|
polymorphemic words differ in both pronunciation and stress, the two categories
|
|
being highly interrelated.
|
|
|
|
The part of speech of a word is very important in text-to-speech processing.
|
|
It is used in determining a parse for a sentence which is then used in algorithms
|
|
determining fundamental frequency and duration. DECOMP includes a part-of-
|
|
speech processor which determines the part of speech of a word based on infor-
|
|
mation associated with the component morphs in the lexicon. The procedure will
|
|
be described in detail in the next chapter. If the part of speech of a word is not
|
|
correctly predicted by its constituent morphs, then the entire word must be placed
|
|
in the lexicon. For example, the suffix er is marked as forming adjectives, adverbs
|
|
|
|
38
|