Morphological analysis
In Figure 3-1 the transition arcs are labeled with the associated incremental
cost as well as morph type. The specific cost values are not significant; only their
relative values matter. The values were chosen to cause the FSM to implement the rules
above, then fine-tuned to get the best overall performance. The cost of a standard-form
covering is the sum of the following:
• 34 units for each PREFIX,
• 101 units for the first effective-root and 133 units for each additional
  effective-root (if the rightmost effective-root is STRONG, add an extra
  64 units to account for the “hidden” inflectional morpheme),
• 35 units for each DERIV, and
• 64 units for each INFL.
The only other notable feature of the scoring is that any transition not part of the
standard form incurs a 512-unit penalty. In order to allow a single ABSOLUTE
root to match a word, the penalty is suppressed for this case and the cost is taken to
be the same as for a single ROOT covering.
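The cost sum above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the list-of-type-names input, and in particular the set of morph types counted as effective-roots (here just ROOT and STRONG) are hypothetical and are not taken from DECOMP itself. The 512-unit penalty for non-standard transitions is left out because it depends on the FSM of Figure 3-1.

```python
# Cost constants quoted from the text.
PREFIX_COST = 34
FIRST_ROOT_COST = 101
EXTRA_ROOT_COST = 133
STRONG_EXTRA = 64      # "hidden" inflectional morpheme after a STRONG root
DERIV_COST = 35
INFL_COST = 64

# Assumption: which morph types count as effective-roots is defined
# elsewhere; this two-element set is only a stand-in for illustration.
EFFECTIVE_ROOTS = {"ROOT", "STRONG"}


def standard_form_cost(morph_types):
    """Return the cost of a standard-form covering.

    morph_types is a left-to-right list of morph type names,
    e.g. ["PREFIX", "ROOT", "INFL"].
    """
    morph_types = list(morph_types)

    # A single ABSOLUTE root is scored like a single ROOT covering.
    if morph_types == ["ABSOLUTE"]:
        return FIRST_ROOT_COST

    cost = 0
    roots_seen = 0
    rightmost_root = None
    for morph_type in morph_types:
        if morph_type == "PREFIX":
            cost += PREFIX_COST
        elif morph_type in EFFECTIVE_ROOTS:
            cost += FIRST_ROOT_COST if roots_seen == 0 else EXTRA_ROOT_COST
            roots_seen += 1
            rightmost_root = morph_type
        elif morph_type == "DERIV":
            cost += DERIV_COST
        elif morph_type == "INFL":
            cost += INFL_COST
        else:
            raise ValueError(f"not part of the standard form: {morph_type}")

    if rightmost_root == "STRONG":
        cost += STRONG_EXTRA
    return cost
```

For example, a PREFIX + ROOT + INFL covering costs 34 + 101 + 64 = 199 units, while a two-root compound costs 101 + 133 = 234 units, so the scoring prefers affixation over compounding when both coverings exist.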
The recursive procedure takes advantage of the cost information to reduce the
number of matching operations. The cost of the best complete covering found before
the current step in the recursion is recorded. As a new morph is matched, the
cost of its associated transition in the FSM is added to the running score. In
addition, the minimum possible cost for the decomposition of the remainder is
computed. If the sum of this minimum and the running score is not less than the best
cost so far, then the new morph is immediately rejected as being too expensive.
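The pruning rule described above is a form of branch and bound, and can be sketched as follows. This is a simplified illustration, not the DECOMP procedure: the lexicon is assumed to be a plain dictionary from morph spelling to a fixed cost (standing in for the FSM transition cost), the scan runs left to right purely for convenience, and the `lower_bound` function for the remainder is supplied by the caller because the text does not say how that minimum is computed. All names are hypothetical.

```python
import math


def best_covering(word, lexicon, lower_bound):
    """Find the cheapest way to cover `word` with morphs from `lexicon`.

    lexicon:     dict mapping morph spelling -> cost (assumed stand-in
                 for the cost of the FSM transition taken by that morph)
    lower_bound: function giving a minimum possible cost for covering
                 a remainder string (must return 0 for the empty string)
    Returns (cost, morphs), or (inf, None) if no covering exists.
    """
    best = {"cost": math.inf, "morphs": None}

    def recurse(rest, running_score, morphs):
        if not rest:
            # Complete covering: record it if it beats the best so far.
            if running_score < best["cost"]:
                best["cost"] = running_score
                best["morphs"] = list(morphs)
            return
        # Try the longest candidate morph first.
        for end in range(len(rest), 0, -1):
            piece = rest[:end]
            if piece not in lexicon:
                continue
            new_score = running_score + lexicon[piece]
            remainder = rest[end:]
            # Reject the morph if even the cheapest possible completion
            # cannot beat the best covering found so far.
            if new_score + lower_bound(remainder) >= best["cost"]:
                continue
            morphs.append(piece)
            recurse(remainder, new_score, morphs)
            morphs.pop()

    recurse(word, 0, [])
    return best["cost"], best["morphs"]


# Toy lexicon with made-up entries; costs reuse the unit values in the text.
toy_lexicon = {"un": 34, "do": 101, "undo": 101, "ing": 64, "in": 34}
cheapest = min(toy_lexicon.values())
result = best_covering("undoing", toy_lexicon,
                       lambda rest: cheapest if rest else 0)
```

In this toy run the covering undo+ing (165 units) is found first, after which un+do is rejected as soon as "do" is matched, since 135 units plus the 34-unit minimum for the remainder already exceeds 165.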
3.4.5 Recognizing morphological mutations
After a suffix morph has been removed from a word, it is necessary to investigate
possible spelling changes which may have taken place during composition. Typical
spelling changes (during morph composition) are:
• y → i (embody+ment → embodiment),
• consonant doubling before a vocalic suffix (pad+ing → padding), and
• dropping of “silent e” before a vocalic suffix (fire+ing → firing).
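One way to picture the investigation of these three changes is a function that, given the letters left over after a suffix is stripped, proposes the spellings the preceding morph might have had before composition. The sketch below is an assumption-laden illustration rather than DECOMP's actual logic: the function name is hypothetical, a "vocalic suffix" is taken to mean one beginning with a, e, i, o, or u, and no lexicon lookup or spelling change code is consulted.

```python
VOWELS = set("aeiou")  # assumption: y is not treated as a vowel here


def candidate_stems(stem, suffix):
    """Return possible pre-composition spellings of `stem`.

    stem:   what remains of the word after `suffix` was removed
    suffix: the suffix morph that was removed
    The unchanged stem is always the first candidate.
    """
    candidates = [stem]

    # Undo y -> i (embodi + ment -> embody).
    if stem.endswith("i"):
        candidates.append(stem[:-1] + "y")

    if suffix[:1] in VOWELS:  # vocalic suffix
        # Undo consonant doubling (padd + ing -> pad).
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
            candidates.append(stem[:-1])
        # Restore a dropped silent e (fir + ing -> fire).
        if stem and stem[-1] not in VOWELS:
            candidates.append(stem + "e")

    return candidates
```

The function deliberately over-generates (for "padd" it also proposes "padde"); in a full system each candidate would still have to be found in the lexicon before it could be accepted.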
Different morphs have differing behavior in the presence of change-causing
suffixes. Three general categories of morph behavior are provided for in
DECOMP. In the lexicon, each morph has a spelling change code which indicates
whether spelling changes are forbidden, required, or optional when the morph is
combined with a suffix. The “required” category is currently used exclusively for
morphs with consonant endings which are always doubled in the presence of a