From text to speech: The MITalk system
“Preprocessor” (Allen, 1968). The current algorithm goes right-to-left across the
morphs and uses the part of speech of the rightmost morph for a compound, as
well as for cases where there is a suffix. This is justified by two facts:
1. suffixes (especially the rightmost suffix since it is outermost in the
“nesting” of affixes) determine the part of speech of a word with
regularity (e.g. ...ness is a NOUN);
2. the part of speech of compounds is very idiosyncratic (in fact, it is
usually determined by semantic rather than syntactic information)
and the best heuristic is to use the part-of-speech set of the rightmost
root.
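The rightmost-morph heuristic described above can be sketched in a few lines.
This is a minimal illustration, not the MITalk implementation: the Morph type,
its field names, and the lexicon entries are all hypothetical.

```python
from typing import NamedTuple

class Morph(NamedTuple):
    text: str        # spelling of the morph, e.g. "NESS"
    kind: str        # "prefix", "root", or "suffix" (hypothetical labels)
    pos: frozenset   # part-of-speech set (illustrative values only)

def rightmost_pos(morphs):
    """Return the part-of-speech set of the rightmost morph."""
    return morphs[-1].pos

# Illustrative entries; the real MITalk lexicon is not reproduced here.
happiness = [Morph("HAPPY", "root", frozenset({"ADJ"})),
             Morph("NESS", "suffix", frozenset({"NOUN"}))]
blackboard = [Morph("BLACK", "root", frozenset({"ADJ"})),
              Morph("BOARD", "root", frozenset({"NOUN", "VERB"}))]

print(sorted(rightmost_pos(happiness)))   # suffix decides the word class
print(sorted(rightmost_pos(blackboard)))  # compound takes the rightmost root's set
```

The same lookup serves both cases, which is why a single right-to-left pass
covers suffixed words and compounds alike.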
A complete description of the part-of-speech processor is given in Appendix
A. First, the processor checks to see if there was a decomposition. If there is
none, then it calls a routine which assigns the part-of-speech set (NOUN (NUM
SING), VERB (PL TR) (INF TR), ADJ) unless the word ends in 'S, in which case
the part-of-speech set is (NOUN (POSS TR), NOUN (NUM SING) (CONTR
TR)). Next, the program determines whether the last morph in the decomposition
is a suffix. If it is not, then the program checks for the part-of-speech determining
prefixes. The prefixes EM, EN, and BE indicate that a word is a VERB, while A
gives the part-of-speech set (ADJ, ADV). (Suffixes have priority over these, as in
befuddlement.) If none of these are present, then the processor assigns the part-
of-speech set of the last morph in the decomposition.
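The opening steps of the processor, up to the point where the suffix dispatch
takes over, might look roughly like the sketch below. The function name, the
(text, kind) morph pairs, and the lexicon_pos mapping are assumptions made for
illustration. The sketch also assumes that the word-final S test refers to a
final 'S, since the resulting set carries the POSS and CONTR features, and
that any prefix in the decomposition is examined.

```python
DEFAULT_POS = ["NOUN (NUM SING)", "VERB (PL TR) (INF TR)", "ADJ"]
APOSTROPHE_S_POS = ["NOUN (POSS TR)", "NOUN (NUM SING) (CONTR TR)"]
VERB_PREFIXES = {"EM", "EN", "BE"}

def initial_pos(word, morphs, lexicon_pos):
    """morphs: list of (text, kind) pairs, or [] if the word did not decompose.

    Returns a part-of-speech set, or None when the last morph is a suffix
    and the suffix dispatch (not shown here) must decide.
    """
    if not morphs:
        # No decomposition: fall back to a default set.
        # Assumption: the final S in the text is read as 'S.
        if word.upper().endswith("'S"):
            return list(APOSTROPHE_S_POS)
        return list(DEFAULT_POS)

    last_text, last_kind = morphs[-1]
    if last_kind == "suffix":
        return None  # suffixes have priority over prefixes

    prefixes = [text for text, kind in morphs if kind == "prefix"]
    if any(p in VERB_PREFIXES for p in prefixes):
        return ["VERB"]
    if "A" in prefixes:
        return ["ADJ", "ADV"]

    # Otherwise use the set of the last morph in the decomposition.
    return list(lexicon_pos[last_text])
```

For befuddlement, decomposed here as BE + FUDDLE + MENT, the function returns
None, so the suffix MENT rather than the prefix BE determines the result.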
The rest of the processor is essentially a dispatch on the last suffix. In many
cases, the next-to-last morph's part of speech must also be examined. If the last
morph is the suffix ING, the part of speech is specified as VERBING, while ED
indicates that the part-of-speech set is (VERBEN, VERB (SING TR) (PL TR)). If
the last morph is S or ES, a number of checks must be made. If the next to last
morph is not a suffix and there is a verb-producing prefix, then the part of speech
is VERB (SING TR), as in entitles. If the penultimate morph has the part of
speech VERB, then the same part of speech is assigned. If the previous morph is a
NOUN, ADJ, or INTG or is ER or ING, then the part of speech NOUN (NUM
PL) is added to the set. If the next to last morph is an ORD, then the part of speech
is also ORD (NUM PL). Finally, if there is still no part of speech, the processor
assigns NOUN (NUM PL), as in the whys and wherefores.
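The dispatch for ING, ED, and S/ES can be sketched as follows. The function
names and argument layout are hypothetical. Two readings of the text are
assumed: the verb-prefix rule is treated as decisive on its own (as the
entitles example suggests), and the remaining S/ES rules accumulate into one
set.

```python
VERB_PREFIXES = {"EM", "EN", "BE"}

def pos_for_s(prev_text, prev_kind, prev_pos, prefixes=()):
    """Part-of-speech set when the last morph is S or ES.

    prev_text, prev_kind, prev_pos describe the next-to-last morph.
    """
    # Assumption: a verb-producing prefix settles the matter outright.
    if prev_kind != "suffix" and VERB_PREFIXES & set(prefixes):
        return ["VERB (SING TR)"]

    result = []
    prev = set(prev_pos)
    if "VERB" in prev:
        result.append("VERB (SING TR)")
    if prev & {"NOUN", "ADJ", "INTG"} or prev_text in {"ER", "ING"}:
        result.append("NOUN (NUM PL)")
    if "ORD" in prev:
        result.append("ORD (NUM PL)")
    if not result:
        # Fallback, as in "the whys and wherefores".
        result.append("NOUN (NUM PL)")
    return result

def pos_for_last_suffix(suffix, prev_text, prev_kind, prev_pos, prefixes=()):
    """Dispatch on the last suffix; None means 'not covered by this sketch'."""
    if suffix == "ING":
        return ["VERBING"]
    if suffix == "ED":
        return ["VERBEN", "VERB (SING TR) (PL TR)"]
    if suffix in ("S", "ES"):
        return pos_for_s(prev_text, prev_kind, prev_pos, prefixes)
    return None
```

The part-of-speech sets passed in for the next-to-last morph would come from
the lexicon; the values used to exercise the sketch are illustrative only.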
If the last suffix is ER, then three checks are made. If the next to last morph
has the part of speech ADV, then the word is a comparative adverb; if it is an ADJ,
then the word is a comparative adjective. If it is a NOUN or a VERB, then the
word is a singular NOUN, as in worker. If the last morph is 'S, then the word's
part of speech is NOUN with the property (POSS TR).
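A sketch of the ER checks and the possessive case follows. The labels for the
comparative forms are hypothetical, since the text does not give the tags
used, and the three ER checks are assumed to accumulate when the next-to-last
morph belongs to more than one class. The possessive morph is written 'S here,
which is itself an assumption about the intended form.

```python
def pos_for_er(prev_pos):
    """Part-of-speech set when the last suffix is ER."""
    prev = set(prev_pos)
    result = []
    if "ADV" in prev:
        result.append("ADV (COMPARATIVE)")   # hypothetical label
    if "ADJ" in prev:
        result.append("ADJ (COMPARATIVE)")   # hypothetical label
    if prev & {"NOUN", "VERB"}:
        result.append("NOUN (NUM SING)")     # as in "worker"
    return result

def pos_for_possessive():
    """Part-of-speech set when the last morph is the possessive 'S."""
    return ["NOUN (POSS TR)"]

# WORK is taken to be both NOUN and VERB for the sake of the example.
print(pos_for_er(["NOUN", "VERB"]))
```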