From text to speech: The MITalk system

“Preprocessor” (Allen, 1968). The current algorithm goes right-to-left across the morphs and uses the part of speech of the rightmost morph for a compound, as well as for cases where there is a suffix. This is justified by two facts:

1. suffixes (especially the rightmost suffix since it is outermost in the “nesting” of affixes) determine the part of speech of a word with regularity (e.g. ...ness is a NOUN);

2. the part of speech of compounds is very idiosyncratic (in fact, it is usually determined by semantic rather than syntactic information) and the best heuristic is to use the part-of-speech set of the rightmost root.

A complete description of the part-of-speech processor is given in Appendix A. First, the processor checks whether there is a decomposition. If there is none, then it calls a routine which assigns the part-of-speech set (NOUN (NUM SING), VERB (PL TR) (INF TR), ADJ) unless the word ends in ’S, in which case the part-of-speech set is (NOUN (POSS TR), NOUN (NUM SING) (CONTR TR)). Next, the program determines whether the last morph in the decomposition is a suffix. If it is not, then the program checks for the part-of-speech determining prefixes. The prefixes EM, EN, and BE indicate that a word is a VERB, while A gives the part-of-speech set (ADJ, ADV). (Suffixes have priority over these, as in befuddlement.) If none of these are present, then the processor assigns the part-of-speech set of the last morph in the decomposition.
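
The first stage described above can be sketched in Python as follows. This is a minimal illustration, not the Appendix A implementation: the `Morph` record, its `kind` labels, the function name, and the example lexicon entries are all hypothetical. It also assumes that a part-of-speech determining prefix may appear anywhere in the decomposition, which the text does not state.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Morph(NamedTuple):
    """Hypothetical morph record; the field names are assumptions."""
    spelling: str                 # e.g. "EN", "RICH"
    kind: str                     # "prefix", "root", or "suffix"
    pos: Tuple[str, ...] = ()     # part-of-speech set from the lexicon


# Part-of-speech sets quoted from the text.
DEFAULT_SET = ("NOUN (NUM SING)", "VERB (PL TR) (INF TR)", "ADJ")
APOSTROPHE_S_SET = ("NOUN (POSS TR)", "NOUN (NUM SING) (CONTR TR)")
VERB_PREFIXES = {"EM", "EN", "BE"}


def pos_without_final_suffix(
    word: str, morphs: Sequence[Morph]
) -> Optional[Tuple[str, ...]]:
    """Handle words with no decomposition or with no final suffix.

    Returns None when the last morph is a suffix, because suffixes take
    priority and are handled by a separate dispatch.
    """
    if not morphs:  # no decomposition was found
        if word.upper().endswith(("'S", "’S")):
            return APOSTROPHE_S_SET
        return DEFAULT_SET

    last = morphs[-1]
    if last.kind == "suffix":
        return None

    # Assumption: any prefix in the decomposition counts.
    prefixes = {m.spelling for m in morphs if m.kind == "prefix"}
    if prefixes & VERB_PREFIXES:
        return ("VERB",)
    if "A" in prefixes:
        return ("ADJ", "ADV")

    # Otherwise use the part-of-speech set of the last (rightmost) morph.
    return last.pos
```

For a compound such as a hypothetical HOUSE + BOAT decomposition, the function simply returns the lexicon set of BOAT, which matches the rightmost-root heuristic.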

The rest of the processor is essentially a dispatch on the last suffix. In many cases, the next to last morph’s part of speech must also be examined. If the last morph is the suffix ING, the part of speech is specified as VERBING, while ED indicates that the part-of-speech set is (VERBEN, VERB (SING TR) (PL TR)). If the last morph is S or ES, a number of checks must be made. If the next to last morph is not a suffix and there is a verb-producing prefix, then the part of speech is VERB (SING TR), as in entitles. If the penultimate morph has the part of speech VERB, then the same part of speech is assigned. If the previous morph is a NOUN, ADJ, or INTG or is ER or ING, then the part of speech NOUN (NUM PL) is added to the set. If the next to last morph is an ORD, then the part of speech is also ORD (NUM PL). Finally, if there is still no part of speech, the processor assigns NOUN (NUM PL), as in the whys and wherefores.
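
The ING, ED, and S/ES cases can be sketched as a small dispatch. This is one possible reading of the paragraph, not the original code: the verb-prefix check is treated as an early exit, while the remaining checks accumulate into one set. The `Morph` record, the function names, and the lexicon entries in the usage lines are hypothetical, and lexicon sets are simplified to bare category names.

```python
from typing import FrozenSet, List, NamedTuple, Sequence


class Morph(NamedTuple):
    """Hypothetical morph record; the field names are assumptions."""
    spelling: str
    kind: str                                # "prefix", "root", or "suffix"
    pos: FrozenSet[str] = frozenset()        # bare categories, e.g. {"NOUN"}


VERB_PREFIXES = {"EM", "EN", "BE"}


def pos_for_s_suffix(morphs: Sequence[Morph]) -> List[str]:
    """Checks made when the last morph is S or ES."""
    if len(morphs) < 2:
        raise ValueError("a suffix needs a preceding morph")
    prev = morphs[-2]

    has_verb_prefix = any(
        m.kind == "prefix" and m.spelling in VERB_PREFIXES for m in morphs
    )
    # Assumption: this case decides the result on its own (as in "entitles").
    if prev.kind != "suffix" and has_verb_prefix:
        return ["VERB (SING TR)"]

    result: List[str] = []
    if "VERB" in prev.pos:
        result.append("VERB (SING TR)")
    if prev.pos & {"NOUN", "ADJ", "INTG"} or prev.spelling in {"ER", "ING"}:
        result.append("NOUN (NUM PL)")
    if "ORD" in prev.pos:
        result.append("ORD (NUM PL)")
    if not result:  # still no part of speech
        result.append("NOUN (NUM PL)")
    return result


def dispatch_on_last_suffix(morphs: Sequence[Morph]) -> List[str]:
    """Dispatch on the spelling of the final suffix (partial sketch)."""
    last = morphs[-1].spelling
    if last == "ING":
        return ["VERBING"]
    if last == "ED":
        return ["VERBEN", "VERB (SING TR) (PL TR)"]
    if last in {"S", "ES"}:
        return pos_for_s_suffix(morphs)
    raise NotImplementedError(f"suffix {last!r} is not covered by this sketch")


# Usage with made-up lexicon entries:
walks = [Morph("WALK", "root", frozenset({"NOUN", "VERB"})), Morph("S", "suffix")]
print(dispatch_on_last_suffix(walks))
```

Returning a list keeps the order in which the checks fired, which makes the accumulation easy to inspect; a set would serve equally well if order is irrelevant.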

If the last suffix is ER, then three checks are made. If the next to last morph has the part of speech ADV, then the word is a comparative adverb; if it is an ADJ, then the word is a comparative adjective. If it is a NOUN or a VERB, then the word is a singular NOUN, as in worker. If the last morph is S’, then the word’s part of speech is NOUN with the property (POSS TR).
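
The ER and S’ cases might be sketched in the same style. The text does not give the tag notation for comparatives, so the strings `"ADV (comparative)"` and `"ADJ (comparative)"` below are placeholders, as are the `Morph` record and the function name. The sketch also assumes that the three ER checks accumulate, and it returns an empty list when none of them fires, a case the text does not address.

```python
from typing import FrozenSet, List, NamedTuple, Sequence


class Morph(NamedTuple):
    """Hypothetical morph record; the field names are assumptions."""
    spelling: str
    kind: str                                # "prefix", "root", or "suffix"
    pos: FrozenSet[str] = frozenset()        # bare categories, e.g. {"ADJ"}


def pos_for_er_or_possessive(morphs: Sequence[Morph]) -> List[str]:
    """Part-of-speech set when the last morph is ER or S’."""
    last = morphs[-1].spelling
    if last in {"S'", "S’"}:
        return ["NOUN (POSS TR)"]
    if last != "ER":
        raise NotImplementedError(f"suffix {last!r} is not covered by this sketch")
    if len(morphs) < 2:
        raise ValueError("a suffix needs a preceding morph")

    prev = morphs[-2]
    result: List[str] = []
    if "ADV" in prev.pos:
        result.append("ADV (comparative)")   # placeholder tag name
    if "ADJ" in prev.pos:
        result.append("ADJ (comparative)")   # placeholder tag name
    if prev.pos & {"NOUN", "VERB"}:
        result.append("NOUN (NUM SING)")     # singular noun, as in "worker"
    return result


# Usage with a made-up lexicon entry:
worker = [Morph("WORK", "root", frozenset({"NOUN", "VERB"})), Morph("ER", "suffix")]
print(pos_for_er_or_possessive(worker))
```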