Morphological analysis

Decomp: "SCARCITY" [state = word <0> inflectional suffix] =>
Decomp: Matched "CITY" (root) -- decompose remainder
Decomp: "SCAR" [state = <101> root] =>
Decomp: Matched "SCAR" (root) -- decompose remainder
Decomp: "" [state = <234> root] =>
Decomp: Matched start of word, final score = 234
Decomp: Matched "CAR" (root) min. score = 268 -- too expensive!
Decomp: Matched "AR" (derivational suffix) min. score = 234 -- too expensive!
Decomp: Matched "ITY" (derivational suffix) -- decompose remainder
Decomp: "SCARCE" [state = root <35> derivational suffix] =>
Decomp: Matched "SCARCE" (root) -- decompose remainder
Decomp: "" [state = <136> root] =>
Decomp: Matched start of word, final score = 136
Decomp: "SCARC" [state = root <35> derivational suffix] =>
Decomp: Matched "ARC" (root) min. score = 170 -- too expensive!
Decomp: "SCARCY" [state = root <35> derivational suffix] =>
Decomp: Matched "Y" (derivational suffix) min. score = 136 -- too expensive!
Decomp: Matched "Y" (derivational suffix) -- decompose remainder
Decomp: "SCARCITE" [state = root <35> derivational suffix] =>
Decomp: Matched "CITE" (root) min. score = 170 -- too expensive!
Decomp: "SCARCIT" [state = root <35> derivational suffix] =>
Decomp: Matched "IT" (absolute) -- illegal!
DECOMP: SCARCITY                                   word spelling
DECOMP: NOUN (NUMBER = SINGULAR)                   part of speech and features
DECOMP: =>                                         decomposition follows
DECOMP: SCARCE [ROOT] :                            first morph spelling and type
DECOMP: 1SKE*RS (ADJECTIVE)                        pronunciation and part of speech
DECOMP: ITY [DERIVATIONAL VOCALIC SUFFIX] :        second morph
DECOMP: *T-E~ (NOUN)

Figure 3-2: Decomposition of “scarcity”
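
The search traced in Figure 3-2 can be approximated by a small right-to-left, cost-bounded recursion. The following is only a minimal sketch, not the system's actual algorithm: the lexicon `LEXICON`, the morph costs, the single "restore a dropped final E" spelling rule, and the function name `decompose` are all assumptions made for illustration, and the resulting scores do not reproduce those in the figure. What it is meant to show is the general shape of the trace: match a morph at the right end of the word, recurse on the remainder, and abandon any branch whose running cost already equals or exceeds the best complete analysis ("too expensive!").

```python
from typing import Optional

# Hypothetical toy lexicon: spelling -> (morph type, cost). The costs are
# invented for this sketch and are not the scores used in Figure 3-2.
LEXICON = {
    "SCARCE": ("root", 100),
    "SCAR": ("root", 100),
    "CITY": ("root", 100),
    "CAR": ("root", 100),
    "ITY": ("suffix", 35),
}


def decompose(word: str) -> Optional[tuple[int, list[str]]]:
    """Return (cost, morphs) for the cheapest analysis found, or None."""
    best: Optional[tuple[int, list[str]]] = None

    def search(remainder: str, cost: int, morphs: list[str]) -> None:
        nonlocal best
        if remainder == "":
            # Reached the start of the word; accept only analyses whose
            # leftmost morph is a root.
            if morphs and LEXICON[morphs[0]][0] == "root":
                if best is None or cost < best[0]:
                    best = (cost, morphs)
            return
        # Try the longest morph at the right end first.
        for start in range(len(remainder)):
            piece = remainder[start:]
            entry = LEXICON.get(piece)
            if entry is None:
                continue
            kind, morph_cost = entry
            new_cost = cost + morph_cost
            if best is not None and new_cost >= best[0]:
                continue  # "too expensive": cannot beat the best so far
            rest = remainder[:start]
            candidates = [rest]
            if kind == "suffix" and rest:
                # Assumed spelling-change rule: a final E may have been
                # dropped before the suffix (SCARC + E -> SCARCE).
                candidates.append(rest + "E")
            for candidate in candidates:
                search(candidate, new_cost, [piece] + morphs)

    search(word, 0, [])
    return best


print(decompose("SCARCITY"))
```

With this toy lexicon the compound analysis SCAR + CITY is found first, and the cheaper SCARCE + ITY then replaces it, which mirrors the order of events in the figure even though the numbers differ.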

The morph lexicon was obtained by decomposing 50,406 distinct words found in a corpus of 1,014,232 words of running text into constituent morphs (Kucera and Francis, 1967). Beginning with a base of one-, two-, and three-letter words and a decomposition (analysis) algorithm, the lexicon was built up by successively adding to the base all n-letter words (starting with n=4) which either:

1. did not decompose into words of less than n letters,
2. decomposed into incorrect constituent morphs,
3. had a pronunciation other than that obtained by concatenation of the pronunciations of the individual morphs, or
4. had a part of speech which was not derivable from the part-of-speech sets of its constituent morphs.

The first category includes n-letter words consisting of a single morph, words whose constituent morphs did not appear in the lexicon, and words in which an unrecognized spelling change prevented correct analysis.

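The build-up procedure described above can be sketched as a loop over increasing word length. This is a rough illustration under stated assumptions: the helper names `decomposes` and `build_lexicon` are hypothetical, decomposition is reduced to plain concatenation of existing entries with no spelling changes, and only the first criterion (the word does not decompose into shorter entries) is automated. Criteria 2 through 4 depend on judging whether morphs, pronunciation, and part of speech are correct, which this sketch does not attempt.

```python
def decomposes(word: str, lexicon: set[str]) -> bool:
    """True if `word` splits into two or more lexicon entries by plain
    concatenation (no spelling changes are modelled in this sketch)."""

    def split(rest: str, parts: int) -> bool:
        if rest == "":
            return parts >= 2
        return any(
            rest[:i] in lexicon and split(rest[i:], parts + 1)
            for i in range(1, len(rest) + 1)
        )

    return split(word, 0)


def build_lexicon(words: set[str], base_len: int = 3) -> set[str]:
    """Start from all words of up to `base_len` letters, then add each
    n-letter word (n = base_len + 1, ...) that fails to decompose."""
    lexicon = {w for w in words if len(w) <= base_len}
    longest = max((len(w) for w in words), default=0)
    for n in range(base_len + 1, longest + 1):
        # Decide against the lexicon as it stood before length n, so
        # every constituent is shorter than n letters.
        additions = {
            w for w in words if len(w) == n and not decomposes(w, lexicon)
        }
        lexicon |= additions
    return lexicon


corpus_words = {"IN", "TO", "CAR", "PET", "INTO", "CARPET", "SCARCE"}
print(sorted(build_lexicon(corpus_words)))
```

In this toy run SCARCE is added because it has no decomposition, while INTO and CARPET are not. CARPET shows why the remaining criteria are needed: the split CAR + PET succeeds mechanically, yet it would presumably count as a decomposition into incorrect constituent morphs, so such a word would have to be added under the second criterion.
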
Although many spelling changes are recognized by the morphological