Morphological analysis

        find a set of possible spelling changes¹ at the right end of the remainder,
        attempt a recursive decomposition for each spelling variation,
        save the results of the best-scoring of these variations,
        restore the remainder string, state, and score to their original values.
    ENDIF,
    find the next longest morph which matches the right end of the string.
END WHILE.

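The step that proposes spelling changes at the right end of the remainder can be illustrated with a small sketch. The three rules used here (undoing consonant doubling, undoing y-to-i, and restoring a deleted final e) are common English spelling changes chosen for illustration only; this passage does not list the system's actual rules, and the name `spelling_variants` is hypothetical. The unchanged spelling is always returned as the first candidate.

```python
VOWELS = set("aeiou")  # "y" is treated as a consonant in this toy version


def spelling_variants(remainder: str) -> list[str]:
    """Return candidate spellings for the remainder left after stripping a morph.

    Illustrative rules only (an assumption, not the system's rule set).
    """
    variants = [remainder]  # the unchanged spelling is always a candidate
    if not remainder:
        return variants
    last = remainder[-1]
    # Undo consonant doubling: "chopp" -> "chop"
    if len(remainder) >= 2 and last == remainder[-2] and last not in VOWELS:
        variants.append(remainder[:-1])
    # Undo y -> i: "embodi" -> "embody"
    if last == "i":
        variants.append(remainder[:-1] + "y")
    # Restore a deleted final e: "scar" -> "scare"
    if last not in VOWELS:
        variants.append(remainder + "e")
    return variants
```

Each candidate would then be handed to the recursive decomposition, and only the best-scoring result kept.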
The decision to search from the right end of the word was made early in the development of the system, before the selectional rules were implemented. It was observed that the best decomposition was found first by stripping off suffixes before searching for roots and prefixes. When a later algorithm was developed in which all decompositions were found and a choice made, the strategy was retained. Since only the decomposition with the best score is kept while searching for other possible morph coverings, finding the best decomposition early in the search is still more efficient; potential coverings with worse scores can be discarded as early as possible.

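The effect of keeping only the best-scoring covering can be sketched as a small recursive search. This is a minimal illustration, not the system's implementation: the lexicon, the cost values, and the convention that a lower cost is a better score are all assumptions, and spelling changes and morph-type states are left out. Morphs are matched at the right end of the remainder, longest first, and any branch whose partial cost already equals or exceeds the best complete covering is abandoned.

```python
from typing import Optional

# Hypothetical toy lexicon: morph -> cost (lower is better). Values are illustrative.
LEXICON = {"s": 1, "ing": 1, "er": 1, "cover": 2, "over": 2, "dis": 1, "c": 5}


def decompose(word: str) -> Optional[tuple[int, list[str]]]:
    """Return (cost, morphs) for the cheapest covering of `word`, or None."""
    best: Optional[tuple[int, list[str]]] = None

    def search(remainder: str, cost: int, morphs: list[str]) -> None:
        nonlocal best
        # Prune: costs are positive, so this branch cannot beat the best so far.
        if best is not None and cost >= best[0]:
            return
        if not remainder:
            best = (cost, morphs)  # complete covering, better than any seen
            return
        # Try the longest morph matching the right end first, then shorter ones.
        for length in range(len(remainder), 0, -1):
            morph = remainder[-length:]
            if morph in LEXICON:
                search(remainder[:-length], cost + LEXICON[morph], [morph] + morphs)

    search(word, 0, [])
    return best
```

With this toy lexicon, `decompose("discovers")` finds dis + cover + s first, and the competing branch through over is cut off as soon as its running cost reaches that of the covering already found.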
3.4.2 Morph types

Not all sequences of morphs are legal in the English language. For this reason (and later, for scoring multiple coverings) each morph in the lexicon has a type code. These morph type codes refine the coarse categories of “prefix”, “suffix”, and “root” to obtain better performance in finding the correct covering.

The morph type “FREE ROOT” (or simply “ROOT”) denotes a word which can appear alone or with suffixes, prefixes, and/or other ROOTs. Typical ROOTs are: side, cover, and spell. The type “ABSOLUTE” is assigned to words which do not allow most affixes (suffixes or prefixes). These are words such as the, into, of, and proper names. (The few affixes permitted are the inflectional suffixes such as plural and possessive forms.) This type is essential in preventing DECOMP from attempting to match the morphs a and I in many words.

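A rough sketch of how the ABSOLUTE type could block spurious coverings follows. The names `INFLECTION` and `AFFIX` are hypothetical placeholders for suffix and prefix types not defined at this point in the text, and the check also assumes that an ABSOLUTE morph does not compound with ROOTs, which the text implies (it is what stops a covering such as a + bout) but does not state outright.

```python
from enum import Enum, auto


class MorphType(Enum):
    ROOT = auto()
    ABSOLUTE = auto()
    INFLECTION = auto()  # hypothetical: inflectional suffix (plural, possessive)
    AFFIX = auto()       # hypothetical catch-all for every other prefix or suffix


def absolute_ok(types: list[MorphType]) -> bool:
    """Accept a covering unless it attaches non-inflectional material to an ABSOLUTE."""
    if MorphType.ABSOLUTE not in types:
        return True  # this rule only constrains coverings containing an ABSOLUTE
    # Assumed rule: the ABSOLUTE morph comes first and only inflections follow it.
    return types[0] is MorphType.ABSOLUTE and all(
        t is MorphType.INFLECTION for t in types[1:]
    )
```

Under these assumptions a covering typed ABSOLUTE + ROOT is rejected, while ABSOLUTE + INFLECTION (a proper name with a possessive, for example) is accepted.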
Most prefixes have the type “PREFIX”, which denotes a prefix that can combine with roots and other prefixes. Examples are: pre, dis, and mis. The remaining prefixes can only occur at the beginning of a word and are classified as “INITIAL”. Examples are meta and centi.

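The positional difference between the two prefix types can be shown with a short check. This is only a sketch: the table holds just the examples named above, the function name `prefixes_legal` is hypothetical, and it assumes that an INITIAL prefix may be followed by ordinary prefixes, which the text does not rule out.

```python
# Prefix examples from the text, with their type codes.
PREFIX_TYPES = {
    "pre": "PREFIX",
    "dis": "PREFIX",
    "mis": "PREFIX",
    "meta": "INITIAL",
    "centi": "INITIAL",
}


def prefixes_legal(prefixes: list[str]) -> bool:
    """Check a left-to-right run of prefixes at the start of a word."""
    for position, morph in enumerate(prefixes):
        kind = PREFIX_TYPES.get(morph)
        if kind is None:
            return False  # not a prefix in this toy table
        if kind == "INITIAL" and position != 0:
            return False  # INITIAL prefixes may only begin the word
    return True
```

For example, the run pre + dis passes, while pre + meta fails because meta is not in first position.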
Suffixes are classified using two different criteria yielding a total of four

¹ Note that unchanged spelling is always one of these possibilities.