You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

58 lines
2.1 KiB

From text to speech: The MITalk system
3.4 The algorithm
The goal of the decomposition process is to obtain a morph covering of a word.
The word “covering” is used to indicate that a simple concatenation of morph
spellings will not, in many cases, provide a correct analysis. It is sometimes the
case, particularly when a vocalic suffix is involved, that spelling changes occur at
morph boundaries. In addition, there may be several distinct coverings of a given
word. |
In light of the observations above, the decomposition algorithm consists of
three major components:
1. a recursive morph partitioning algorithm,
2. a set of spelling change rules for use at morph boundaries, and
3. a set of selectional rules to distinguish between legal and illegal
morph sequences and to choose the best covering when multiple
legal coverings exist.
These components are described in detail below.
3.4.1 Recursive morph decomposition
The overall control structure of the decomposition procedure is recursive. At each
step in the recursion, the right end of the word is matched against the longest lex-
icon morph possible, then the procedure is recursively invoked on the remaining
“uncovered” portion of the word. If this recursive invocation fails to produce a
covering, then the original match is discarded and the next longest matching morph
is used.
Input to the decomposition procedure consists of:
1. a word or remainder to be covered,
2. a state flag that indicates which morph types are legal in the current
context, and
3. a score value that is used to rank multiple decompositions according
to their likelihood of being correct.
Initially, the entire word is presented as input, the state flag is set to a value
indicating that no morphs have been found yet, and the score is set to zero. A con-
cise informal description of the procedure follows:
find the longest morph which matches the right end of the current string
WHILE there is a match DO
IF the matching morph is compatible with the current context (state)
THEN remove the matched letters from the right side of the string,
update the current state and score as a function of the type of
the matched morph.
28