You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
58 lines
2.1 KiB
58 lines
2.1 KiB
From text to speech: The MITalk system
|
|
|
|
3.4 The algorithm
|
|
The goal of the decomposition process is to obtain a morph covering of a word.
|
|
The word “covering” is used to indicate that a simple concatenation of morph
|
|
spellings will not, in many cases, provide a correct analysis. It is sometimes the
|
|
case, particularly when a vocalic suffix is involved, that spelling changes occur at
|
|
morph boundaries. In addition, there may be several distinct coverings of a given
|
|
word. |
|
|
|
|
In light of the observations above, the decomposition algorithm consists of
|
|
three major components:
|
|
|
|
1. a recursive morph partitioning algorithm,
|
|
|
|
2. a set of spelling change rules for use at morph boundaries, and
|
|
|
|
3. a set of selectional rules to distinguish between legal and illegal
|
|
|
|
morph sequences and to choose the best covering when multiple
|
|
|
|
legal coverings exist.
|
|
These components are described in detail below.
|
|
|
|
3.4.1 Recursive morph decomposition
|
|
|
|
The overall control structure of the decomposition procedure is recursive. At each
|
|
step in the recursion, the right end of the word is matched against the longest lex-
|
|
icon morph possible, then the procedure is recursively invoked on the remaining
|
|
“uncovered” portion of the word. If this recursive invocation fails to produce a
|
|
|
|
covering, then the original match is discarded and the next longest matching morph
|
|
is used.
|
|
|
|
Input to the decomposition procedure consists of:
|
|
|
|
1. a word or remainder to be covered,
|
|
|
|
2. a state flag that indicates which morph types are legal in the current
|
|
context, and
|
|
|
|
3. a score value that is used to rank multiple decompositions according
|
|
|
|
to their likelihood of being correct.
|
|
Initially, the entire word is presented as input, the state flag is set to a value
|
|
indicating that no morphs have been found yet, and the score is set to zero. A con-
|
|
|
|
cise informal description of the procedure follows:
|
|
find the longest morph which matches the right end of the current string
|
|
WHILE there is a match DO
|
|
|
|
IF the matching morph is compatible with the current context (state)
|
|
THEN remove the matched letters from the right side of the string,
|
|
update the current state and score as a function of the type of
|
|
the matched morph.
|
|
|
|
28
|