Morphological analysis

Decomp: "SCARCITY" [state = word <0> inflectional suffix] =>
Decomp: Matched "CITY" (root) -- decompose remainder
Decomp: "SCAR" [state = <101> root] =>
Decomp: Matched "SCAR" (root) -- decompose remainder
Decomp: "" [state = <234> root] =>
Decomp: Matched start of word, final score = 234
Decomp: Matched "CAR" (root) min. score = 268 -- too expensive!
Decomp: Matched "AR" (derivational suffix) min. score = 234 -- too expensive!
Decomp: Matched "ITY" (derivational suffix) -- decompose remainder
Decomp: "SCARCE" [state = root <35> derivational suffix] =>
Decomp: Matched "SCARCE" (root) -- decompose remainder
Decomp: "" [state = <136> root] =>
Decomp: Matched start of word, final score = 136
Decomp: "SCARC" [state = root <35> derivational suffix] =>
Decomp: Matched "ARC" (root) min. score = 170 -- too expensive!
Decomp: "SCARCY" [state = root <35> derivational suffix] =>
Decomp: Matched "Y" (derivational suffix) min. score = 136 -- too expensive!
Decomp: Matched "Y" (derivational suffix) -- decompose remainder
Decomp: "SCARCITE" [state = root <35> derivational suffix] =>
Decomp: Matched "CITE" (root) min. score = 170 -- too expensive!
Decomp: "SCARCIT" [state = root <35> derivational suffix] =>
Decomp: Matched "IT" (absolute) -- illegal!
DECOMP: SCARCITY   word spelling
DECOMP: NOUN (NUMBER = SINGULAR)   part of speech and features
DECOMP: =>   decomposition follows
DECOMP: SCARCE [ROOT] :   first morph spelling and type
DECOMP: 1SKE*RS (ADJECTIVE)   pronunciation and part of speech
DECOMP: ITY [DERIVATIONAL VOCALIC SUFFIX] :   second morph
DECOMP: *T-E~ (NOUN)

Figure 3-2: Decomposition of “scarcity”
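
The search recorded in Figure 3-2 can be approximated by a short Python sketch.
This is an illustration under stated assumptions, not the MITalk code: the
mini-lexicon holds only the morphs that appear in the trace, the three cost
constants are inferred from the state scores shown there (101, 234, 35, 136)
rather than taken from the system's cost table, and a single simplified
spelling change (restoring a dropped "E") stands in for the full rule set.
Pruning here compares the accumulated score with the best complete analysis,
whereas the trace reports a projected minimum score.

```python
import math

# Hypothetical mini-lexicon: only the morphs that appear in the trace.
LEXICON = {
    "SCARCE": "root", "SCAR": "root", "CITY": "root",
    "CAR": "root", "ARC": "root", "CITE": "root",
    "ITY": "suffix", "Y": "suffix", "AR": "suffix",
}

# Assumed costs, inferred from the state scores in the trace.
FIRST_ROOT_COST = 101   # rightmost root of the word
EXTRA_ROOT_COST = 133   # each further root (compound)
SUFFIX_COST = 35        # derivational suffix


def decompose(word):
    """Return (score, morphs) for the cheapest analysis, or (inf, None)."""
    best = (math.inf, None)

    def search(rest, morphs, score, seen_root):
        nonlocal best
        if score >= best[0]:
            return                      # cannot beat the best analysis so far
        if rest == "":
            if seen_root:               # reached the start of the word
                best = (score, morphs)
            return
        for morph, kind in LEXICON.items():
            if not rest.endswith(morph):
                continue
            remainder = rest[: len(rest) - len(morph)]
            if kind == "root":
                cost = EXTRA_ROOT_COST if seen_root else FIRST_ROOT_COST
                search(remainder, [morph] + morphs, score + cost, True)
            elif not seen_root:
                # Suffixes are stripped only to the right of every root.
                # Simplified spelling change: try restoring a dropped "E".
                for candidate in (remainder + "E", remainder):
                    search(candidate, [morph] + morphs,
                           score + SUFFIX_COST, False)

    search(word.upper(), [], 0, False)
    return best


print(decompose("SCARCITY"))  # (136, ['SCARCE', 'ITY'])
```

With these assumed costs the sketch first finds SCAR + CITY at 234 and then
replaces it with SCARCE + ITY at 136, matching the two final scores in the
trace.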

The morph lexicon was obtained by decomposing into constituent morphs the
50,406 distinct words found in a corpus of 1,014,232 words of running text
(Kucera and Francis, 1967). Beginning with a base of one-, two-, and three-letter
words and a decomposition (analysis) algorithm, the lexicon was built up by
successively adding to the base all n-letter words (starting with n=4) which either:

1. did not decompose into words of fewer than n letters,
2. decomposed into incorrect constituent morphs,
3. had a pronunciation other than that obtained by concatenation of the
   pronunciations of the individual morphs, or
4. had a part of speech which was not derivable from the part-of-speech
   sets of its constituent morphs.

The first category includes n-letter words consisting of a single morph, words
whose constituent morphs did not appear in the lexicon, and words in which an
unrecognized spelling change prevented correct analysis.
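
The length-ordered build-up described above can be sketched in Python as
follows. This is a toy illustration, not the procedure's actual
implementation: `splits_into` is a hypothetical stand-in for the decomposition
algorithm that covers only criterion 1 by plain concatenation, and
`is_exception` is a hypothetical caller-supplied predicate standing in for the
judgments behind criteria 2 to 4 (incorrect morphs, pronunciation, part of
speech).

```python
def splits_into(word, lexicon):
    """True if `word` is a concatenation of shorter lexicon entries."""
    n = len(word)
    reachable = [True] + [False] * n    # reachable[i]: word[:i] is covered
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if reachable[start] and len(piece) < n and piece in lexicon:
                reachable[end] = True
                break
    return reachable[n]


def build_lexicon(words, is_exception, base_len=3):
    """Grow a lexicon from short words, adding longer ones only when needed."""
    words = set(words)
    # Base: all words of one to `base_len` letters.
    lexicon = {w for w in words if len(w) <= base_len}
    longest = max((len(w) for w in words), default=0)
    for n in range(base_len + 1, longest + 1):
        added = {
            w for w in words
            if len(w) == n
            and (not splits_into(w, lexicon) or is_exception(w))
        }
        lexicon |= added
    return lexicon


corpus = {"cat", "nap", "car", "pet", "able", "table", "catnap", "carpet"}
# "carpet" splits as car + pet, which is an incorrect analysis (criterion 2).
lexicon = build_lexicon(corpus, is_exception=lambda w: w == "carpet")
print(sorted(lexicon))
```

In this toy corpus "catnap" is left out of the lexicon because it decomposes
acceptably into "cat" and "nap", while "able" and "table" are added under the
first criterion and "carpet" under the second.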
Although many spelling changes are recognized by the morphological
