From text to speech: The MITalk system
state, a picture of the input stream is shown using the metalanguage of the
grammar above, with “<>” marking the position in the stream represented by the
state. To the right of the marker is the context represented by the state. To the
left is an expression representing the expected structure of the remainder of the word.
F0 word <> {INFL {suffix}}
R0 (affixed-word | LF-ROOT) <> DERIV {suffix}
R1 (affixed-word | LF-ROOT) <> DERIV effective-root
M1 PREFIX <> RF-ROOT {suffix}
L1 {affixed-word | PREFIX | INITIAL} <> effective-root {suffix}
L0 {affixed-word | PREFIX | INITIAL} <> PREFIX effective-root {suffix}
I0 {word HYPHEN} <> (ABSOLUTE | INITIAL affixed-word)
3.4.4 Selectional rules and scoring
When multiple morph coverings are found, selectional rules are needed to choose
the covering most likely to be correct. For example, the decomposition of
formally requires a means of favoring form+al+ly (ROOT + DERIV + DERIV) over
form+ally (ROOT + ROOT). A set of derivational rules was devised by
examining all of the multiple coverings produced by DECOMP during the
development of the morph lexicon. The first result of this study was the discovery
of the so-called “standard form” for a (possibly compound) word stated below as
two productions:
std-root = (ROOT | LF-ROOT DERIV)
std-form = {PREFIX} {std-root} (std-root {DERIV} | STRONG) {INFL}
Coverings which match this form are to be preferred above all others.
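As an illustration only, the two productions above can be checked mechanically. The following Python sketch assumes that braces in the metalanguage mean "zero or more occurrences" (a reading consistent with patterns such as PREFIX+PREFIX+ROOT and ROOT+DERIV+DERIV being treated as standard-form coverings) and that a covering is supplied as a list of morph type names. The function name matches_standard_form and the regular-expression encoding are hypothetical and are not part of DECOMP.

```python
import re

# Each morph type is written as a token followed by one space, so the
# pattern always consumes whole tokens.
# Assumption: "{X}" in the productions means zero or more occurrences of X.
_STD_ROOT = r"(?:ROOT |LF-ROOT DERIV )"
_STD_FORM = re.compile(
    rf"(?:PREFIX )*{_STD_ROOT}*(?:{_STD_ROOT}(?:DERIV )*|STRONG )(?:INFL )*"
)


def matches_standard_form(morph_types):
    """Return True if the sequence of morph types fits std-form."""
    encoded = "".join(t + " " for t in morph_types)
    return _STD_FORM.fullmatch(encoded) is not None


print(matches_standard_form(["ROOT", "DERIV", "DERIV"]))  # form+al+ly
print(matches_standard_form(["ROOT", "ROOT"]))            # form+ally
print(matches_standard_form(["DERIV", "ROOT"]))           # not standard
```

Under these assumptions both coverings of formally fit the standard form, so the standard form alone does not choose between them.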
Among coverings that match the standard form, the following partial orderings
were found (“>” means that the pattern on the left is more desirable):
ROOT > anything else
PREFIX+ROOT > ROOT+DERIV > ROOT+INFL > ROOT+ROOT
PREFIX+PREFIX+ROOT > ROOT+ROOT
ROOT+DERIV+DERIV > ROOT+ROOT
These rules are implemented by associating a cost with each transition of the
FSM and keeping track of the total cost of the decomposition as morphs are
stripped off the word. This cost is the “score value” mentioned above in the
algorithm description. The covering with the lowest total cost is the most desirable.
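The scoring idea can be sketched in a few lines of Python. The cost values below are hypothetical, chosen only so that the partial orderings listed above hold; this section does not state the actual values. The sketch also simplifies by charging one cost per morph type rather than per FSM transition, and the names COST, total_cost and best_covering are illustrative.

```python
# Hypothetical costs (not the MITalk values). They satisfy:
#   PREFIX < DERIV < INFL < ROOT, 2*PREFIX < ROOT, 2*DERIV < ROOT
COST = {"PREFIX": 1, "DERIV": 2, "INFL": 3, "ROOT": 5}


def total_cost(covering):
    """Sum the cost of every (morph, type) pair in a covering."""
    return sum(COST[morph_type] for _, morph_type in covering)


def best_covering(coverings):
    """Pick the covering with the lowest total cost."""
    return min(coverings, key=total_cost)


candidates = [
    [("form", "ROOT"), ("ally", "ROOT")],
    [("form", "ROOT"), ("al", "DERIV"), ("ly", "DERIV")],
]
best = best_covering(candidates)
print("+".join(morph for morph, _ in best))  # form+al+ly
```

With these assumed costs, form+ally totals 10 and form+al+ly totals 9, so the three-morph covering is chosen.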