Morphological analysis
Decomp: "SCARCITY" [state = word <0> inflectional suffix] =>
Decomp: Matched "CITY" (root) -- decompose remainder
Decomp: "SCAR" [state = <101> root] =>
Decomp: Matched "SCAR" (root) -- decompose remainder
Decomp: "" [state = <234> root] =>
Decomp: Matched start of word, final score = 234
Decomp: Matched "CAR" (root) min. score = 268 -- too expensive!
Decomp: Matched "AR" (derivational suffix) min. score = 234 -- too expensive!
Decomp: Matched "ITY" (derivational suffix) -- decompose remainder
Decomp: "SCARCE" [state = root <35> derivational suffix] =>
Decomp: Matched "SCARCE" (root) -- decompose remainder
Decomp: "" [state = <136> root] =>
Decomp: Matched start of word, final score = 136
Decomp: "SCARC" [state = root <35> derivational suffix] =>
Decomp: Matched "ARC" (root) min. score = 170 -- too expensive!
Decomp: "SCARCY" [state = root <35> derivational suffix] =>
Decomp: Matched "Y" (derivational suffix) min. score = 136 -- too expensive!
Decomp: Matched "Y" (derivational suffix) -- decompose remainder
Decomp: "SCARCITE" [state = root <35> derivational suffix] =>
Decomp: Matched "CITE" (root) min. score = 170 -- too expensive!
Decomp: "SCARCIT" [state = root <35> derivational suffix] =>
Decomp: Matched "IT" (absolute) -- illegal!
DECOMP: SCARCITY word spelling
DECOMP : NOUN (NUMBER = SINGULAR) part of speech and features
DECOMP: => decomposition follows
DECOMP : SCARCE [ROOT] : first morph spelling and type
DECOMP : 1SKE*RS (ADJECTIVE) pronunciation and part of speech
DECOMP : ITY [DERIVATIONAL VOCALIC SUFFIX] : second morph
DECOMP : *T-E~ (NOUN)
Figure 3-2: Decomposition of “scarcity”
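The search traced in Figure 3-2 can be approximated by a small recursive routine that strips a morph from the right end of the word, recurses on the remainder, and abandons any path whose running score cannot beat the best complete decomposition found so far. The sketch below is a simplified illustration, not the analyzer's actual algorithm. The toy lexicon, the function names, the single "restore final E" spelling change, and the three cost constants are all assumptions; the constants were chosen only so that the two complete paths come out at 234 and 136 as in the trace, whereas the real system derives its costs from state transitions.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon (morph spelling -> morph type); not the real morph lexicon.
LEXICON = {
    "SCARCE": "root",
    "SCAR": "root",
    "CITY": "root",
    "CAR": "root",
    "ITY": "suffix",
}

# Assumed costs, picked to reproduce the totals 136 and 234 seen in the trace.
ROOT_COST = 101
SUFFIX_COST = 35
COMPOUND_PENALTY = 32  # extra cost for a root standing directly left of another root


def stem_variants(stem: str) -> List[str]:
    """Spellings to try for the remainder after a suffix is removed.

    Only one spelling change is modeled here: restoring a dropped final E.
    """
    return [stem, stem + "E"]


def decompose(word: str, right: Optional[str] = None, score: int = 0,
              bound: float = float("inf")) -> Optional[Tuple[int, List[str]]]:
    """Return (score, morphs) for the cheapest decomposition, or None.

    `right` is the type of the morph just matched to the right of `word`,
    `score` is the cost accumulated so far, and `bound` is the best final
    score found so far (paths that reach it are pruned).
    """
    if word == "":
        # Start of word reached: accept only if the leftmost morph is a root.
        return (score, []) if right == "root" else None
    best = None
    for i in range(len(word)):
        tail = word[i:]
        kind = LEXICON.get(tail)
        if kind is None:
            continue
        if kind == "suffix":
            cost = SUFFIX_COST
        else:
            cost = ROOT_COST + (COMPOUND_PENALTY if right == "root" else 0)
        new_score = score + cost
        if new_score >= bound:
            continue  # "too expensive": cannot improve on the best so far
        stems = stem_variants(word[:i]) if kind == "suffix" else [word[:i]]
        for stem in stems:
            result = decompose(stem, kind, new_score, bound)
            if result is not None and result[0] < bound:
                bound = result[0]
                best = (result[0], result[1] + [tail])
    return best


print(decompose("SCARCITY"))
```

Under these assumptions the compound reading SCAR + CITY is found first with score 234, and is then displaced by SCARCE + ITY with score 136, which is the order of events in the trace.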
The morph lexicon was obtained by decomposing into constituent morphs the 50,406
distinct words found in a corpus of 1,014,232 words of running text (Kucera and
Francis, 1967). Beginning with a base of one-, two-, and three-letter words and a
decomposition (analysis) algorithm, the lexicon was built up by successively adding
to the base all n-letter words (starting with n=4) which either:
1. did not decompose into words of fewer than n letters,
2. decomposed into incorrect constituent morphs,
3. had a pronunciation other than that obtained by concatenation of the
pronunciations of the individual morphs, or
4. had a part of speech which was not derivable from the part-of-speech
sets of its constituent morphs.
The first category includes n-letter words consisting of a single morph, words
whose constituent morphs did not appear in the lexicon, and words in which an
unrecognized spelling change prevented correct analysis.
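The incremental construction described above can be sketched as a loop over word lengths. This is a minimal illustration under stated assumptions, not the procedure actually used: `decomposes` checks only plain concatenation of entries already in the lexicon (criterion 1, with no spelling changes), and criteria 2 through 4, which require judging morph correctness, pronunciation, and part of speech, are stood in for by a hypothetical caller-supplied predicate `needs_own_entry`. All names and the sample word list are invented for the example.

```python
from typing import Callable, Iterable, Set


def decomposes(word: str, lexicon: Set[str]) -> bool:
    """True if `word` is a concatenation of entries currently in `lexicon`."""
    n = len(word)
    reachable = [False] * (n + 1)  # reachable[i]: word[:i] is fully covered
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in lexicon:
                reachable[end] = True
                break
    return reachable[n]


def build_lexicon(
    words: Iterable[str],
    needs_own_entry: Callable[[str, Set[str]], bool] = lambda w, lex: False,
) -> Set[str]:
    """Grow a lexicon from short words upward, one word length at a time."""
    words = list(words)
    # Base: all one-, two-, and three-letter words.
    lexicon = {w for w in words if len(w) <= 3}
    longest = max((len(w) for w in words), default=0)
    for n in range(4, longest + 1):
        batch = [w for w in words if len(w) == n]
        # Every n-letter word is judged against the lexicon of shorter entries.
        added = {
            w for w in batch
            if not decomposes(w, lexicon) or needs_own_entry(w, lexicon)
        }
        lexicon |= added
    return lexicon


sample = ["SUN", "SET", "SUNSET", "CAR", "PET", "CARPET", "TABLE"]
print(sorted(build_lexicon(sample)))
print(sorted(build_lexicon(sample, lambda w, lex: w == "CARPET")))
```

In the sample, TABLE is added because it does not decompose, SUNSET is left out because SUN + SET covers it, and CARPET is added only when the predicate flags its CAR + PET analysis as incorrect, which corresponds to the second criterion.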
Although many spelling changes are recognized by the morphological