Morphological analysis

In Figure 3-1 the transition arcs are labeled with the associated incremental
cost as well as morph type. The specific cost values are not significant, only their
relative values. The values were chosen to cause the FSM to implement the rules
above, then fine-tuned to get the best overall performance. The cost of a
standard-form covering is easily computed and is the sum of the following:

• 34 units for each PREFIX,

• 101 units for the first effective-root and 133 units for each additional
  effective-root (if the rightmost effective-root is STRONG, add an extra
  64 units to account for the “hidden” inflectional morpheme),

• 35 units for each DERIV, and

• 64 units for each INFL.

The only other notable feature of the scoring is that any transition not part of the
standard form incurs a 512-unit penalty. In order to allow a single ABSOLUTE
root to match a word, the penalty is suppressed for this case and the cost is taken to
be the same as for a single ROOT covering.
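
The cost sum described above can be sketched in a few lines of Python. This is only an illustration, not the DECOMP implementation: the function name, its parameters, and the requirement of at least one effective-root are assumptions made for the example, and the covering is assumed to have already been reduced to counts of each morph type.

```python
# Cost constants quoted from the text; only their relative sizes matter.
PREFIX_COST = 34
FIRST_ROOT_COST = 101
EXTRA_ROOT_COST = 133
DERIV_COST = 35
INFL_COST = 64
STRONG_EXTRA_COST = 64      # "hidden" inflectional morpheme of a STRONG root
NONSTANDARD_PENALTY = 512   # per transition outside the standard form


def covering_cost(prefixes, roots, derivs, infls,
                  last_root_strong=False, nonstandard=0):
    """Cost of a covering given counts of each morph type (hypothetical API)."""
    if roots < 1:
        # Assumption for this sketch: a covering has at least one effective-root.
        raise ValueError("a covering needs at least one effective-root")
    cost = (prefixes * PREFIX_COST
            + FIRST_ROOT_COST
            + (roots - 1) * EXTRA_ROOT_COST
            + derivs * DERIV_COST
            + infls * INFL_COST)
    if last_root_strong:
        cost += STRONG_EXTRA_COST
    # Transitions that leave the standard form are penalized heavily.
    return cost + nonstandard * NONSTANDARD_PENALTY


print(covering_cost(1, 1, 1, 1))                       # PREFIX + root + DERIV + INFL
print(covering_cost(0, 1, 0, 0, last_root_strong=True))  # single STRONG root
```

With these values a single STRONG root costs the same as a root followed by one INFL (165 units), and a single ABSOLUTE root would be scored as `covering_cost(0, 1, 0, 0)`, that is, like a single ROOT covering.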

The recursive procedure takes advantage of the cost information to reduce the
number of matching operations. The cost of the best complete covering found
before the current step in the recursion is recorded. As a new morph is matched, the
cost of its associated transition in the FSM is added to the running score. In
addition, the minimum possible cost for the decomposition of the remainder is also
computed. If the sum of this cost and the current cost is not less than the best cost
so far, then the new morph is immediately rejected as being too expensive.
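
The pruning rule can be illustrated with a small branch-and-bound sketch in Python. It is deliberately simplified and is not the DECOMP procedure: each morph carries one flat cost instead of a cost that depends on the FSM transition, the toy lexicon is invented, and the lower bound on the remainder (the cheapest single step if anything is left, otherwise zero) is an assumption, since the text does not say how the minimum is computed.

```python
def best_covering(word, lexicon, min_step_cost):
    """Return (cost, morphs) of the cheapest covering, stripping morphs from the right.

    lexicon maps a morph spelling to a flat cost (a simplification of the
    per-transition FSM costs); min_step_cost is an assumed lower bound on
    the cost of covering any non-empty remainder.
    """
    best = {"cost": float("inf"), "morphs": None}

    def lower_bound(rest):
        return 0 if not rest else min_step_cost

    def recurse(rest, morphs, cost):
        if not rest:
            # Complete covering; pruning guarantees it beats the best so far.
            best["cost"], best["morphs"] = cost, list(reversed(morphs))
            return
        for start in range(len(rest)):      # longest candidate suffix first
            morph = rest[start:]
            if morph not in lexicon:
                continue
            new_cost = cost + lexicon[morph]
            remainder = rest[:start]
            # Reject the morph if even the cheapest completion cannot win.
            if new_cost + lower_bound(remainder) >= best["cost"]:
                continue
            recurse(remainder, morphs + [morph], new_cost)

    recurse(word, [], 0)
    return best["cost"], best["morphs"]


# Hypothetical toy lexicon with flat costs borrowed from the list above.
toy_lexicon = {"un": 34, "kind": 101, "ness": 35, "kindness": 101}
print(best_covering("unkindness", toy_lexicon, min_step_cost=34))
```

In this toy run the covering un+kindness (135 units) is found first, so the later attempt to match "kind" after stripping "ness" is rejected before "un" is ever tried, because 136 plus the lower bound of 34 is not less than 135.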

3.4.5 Recognizing morphological mutations

After a suffix morph has been removed from a word, it is necessary to investigate
possible spelling changes which may have taken place during composition.
Typical spelling changes (during morph composition) are:

• y → i (embody+ment → embodiment),

• consonant doubling before a vocalic suffix (pad+ing → padding), and

• dropping of “silent e” before a vocalic suffix (fire+ing → firing).
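
As a rough illustration of how these three changes might be undone once a suffix has been stripped, the Python sketch below proposes candidate spellings for the remaining letters. The function name is hypothetical, a "vocalic" suffix is assumed to mean one that begins with a vowel letter, and a real system would still have to check each candidate against the lexicon; none of this is taken from DECOMP itself.

```python
VOWELS = set("aeiou")


def candidate_stems(residue, suffix):
    """List possible pre-composition spellings of residue (hypothetical helper).

    residue is what is left of the word after suffix has been removed.
    The unchanged residue always comes first.
    """
    candidates = [residue]
    # Undo y -> i (embodi + ment -> embody).
    if residue.endswith("i"):
        candidates.append(residue[:-1] + "y")
    # Assumption: a vocalic suffix is one that starts with a vowel letter.
    if suffix[:1] in VOWELS:
        # Undo consonant doubling (padd + ing -> pad).
        if (len(residue) >= 2 and residue[-1] == residue[-2]
                and residue[-1] not in VOWELS):
            candidates.append(residue[:-1])
        # Restore a dropped silent e (fir + ing -> fire).
        candidates.append(residue + "e")
    return candidates


for residue, suffix in [("embodi", "ment"), ("padd", "ing"), ("fir", "ing")]:
    print(residue, suffix, candidate_stems(residue, suffix))
```

The sketch over-generates on purpose (for example it also proposes "padde"); the intent is only to show that each change is cheap to reverse, leaving the choice among candidates to a later lookup.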

Different morphs have differing behavior in the presence of change-causing
suffixes. Three general categories of morph behavior are provided for in
DECOMP. In the lexicon, each morph has a spelling change code which indicates
whether spelling changes are forbidden, required, or optional when the morph is
combined with a suffix. The “required” category is currently used exclusively for
morphs with consonant endings which are always doubled in the presence of a