Morphological analysis
in a language. They are often the ingredients of newly coined compound words,
but new morphs are rarely formed. For this reason, they are good candidates for
lexical entries, provided a means can be found to analyze words into their
constituent morphs. As will be seen, an effective morph lexicon can have fewer than
10,000 entries, which keeps storage requirements modest, particularly with
contemporary integrated circuit technology. It is also important to note that with a
morph lexicon and an associated analysis procedure, there is no need to store all of
the regularly inflected forms, as a whole-word lexicon must.
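The analysis of words into constituent morphs can be illustrated with a small sketch. It is a simplified stand-in, not the MITalk analysis procedure: the lexicon `MORPH_LEXICON` and the function `decompose` are hypothetical, the lexicon holds only a handful of illustrative entries, and spelling changes at morph boundaries (such as the dropped e in hoping) are ignored. The function searches for every segmentation of the word into lexicon morphs and keeps the one with the fewest morphs.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy morph lexicon for illustration only. A real morph
# lexicon would hold thousands of entries and record further information
# (part of speech, spelling-change behavior, and so on).
MORPH_LEXICON = frozenset({
    "dog", "cat", "s", "hot", "house", "thesis",
    "mis", "chance", "dis", "charge", "change", "able",
})


def decompose(word: str) -> tuple[str, ...] | None:
    """Split `word` into lexicon morphs, preferring the fewest morphs.

    Returns None when no complete segmentation exists.
    """

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[str, ...] | None:
        # Best segmentation of word[start:], or None if there is none.
        if start == len(word):
            return ()  # nothing left to analyze
        candidates = []
        for end in range(start + 1, len(word) + 1):
            morph = word[start:end]
            if morph in MORPH_LEXICON:
                rest = best(end)
                if rest is not None:
                    candidates.append((morph,) + rest)
        # Fewest morphs wins; on a tie, min() keeps the first candidate found.
        return min(candidates, key=len) if candidates else None

    return best(0)
```

Because dog and s are separate entries, `decompose("dogs")` yields `("dog", "s")` even though dogs itself is never stored, and `decompose("hothouse")` yields `("hot", "house")`, exposing the internal boundary.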
Because morphs are the basic constituents of words, it is important to show
their utility in determining pronunciations. When morphs are joined together, they
often change pronunciation depending on the nature of the morphs involved.
Thus, when the plural forms of the singular nouns dog and cat are realized, the final
s is voiced in dogs but unvoiced in cats. This is an instance of a morphophonemic
rule governing the realization of the plural morpheme in different environments.
To apply such rules, the constituent morphemes of a word must first be recognized,
so an important class of pronunciation effects depends on detecting morphs and
their boundaries. MITalk provides a comprehensive implementation of the
morphophonemic rules of English.
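The plural rule can be sketched as a function over phoneme sequences. This illustrates the standard English rule, not MITalk's implementation; the ARPAbet-style phoneme symbols and the name `pluralize` are assumptions for this sketch. The allomorph is chosen from the stem's final segment: /ɪz/ after a sibilant, /s/ after any other voiceless consonant, and /z/ elsewhere.

```python
from __future__ import annotations

# ARPAbet-style symbols are an assumption for this sketch; MITalk uses
# its own phoneme inventory.
SIBILANTS = frozenset({"S", "Z", "SH", "ZH", "CH", "JH"})
# Voiceless consonants that are not sibilants.
VOICELESS = frozenset({"P", "T", "K", "F", "TH"})


def pluralize(stem: list[str]) -> list[str]:
    """Append the plural morpheme to a stem's phonemes, choosing its allomorph."""
    final = stem[-1]
    if final in SIBILANTS:
        suffix = ["IH", "Z"]  # bus -> buses: a vowel separates the two sibilants
    elif final in VOICELESS:
        suffix = ["S"]        # cat -> cats: voiceless, agreeing with the final /t/
    else:
        suffix = ["Z"]        # dog -> dogs: voiced after vowels and voiced consonants
    return stem + suffix
```

The rule should fire only after morph analysis has identified a final s as the plural morph; in bus or lens the s belongs to the stem, and no plural rule applies.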
In addition to their role in morphophonemic rules, morphs serve to
break up a word for purposes of pronunciation. This observation is important for
the proper use of letter-to-sound rules. Most sets of letter-to-sound rules
treat each word as an unstructured sequence of letters and use a scanning window
to find consonant and vowel letter clusters that can be readily converted to
phonetic segment labels. Thus, as we have already seen, th is a letter cluster
corresponding to a single fricative phonetic segment, as in thesis. But in the word
hothouse, the th cluster is broken up by a morph boundary, and no medial fricative
is present. Similarly, the letter cluster sch has a regular pronunciation in
school and scheme, but in the words mischance and discharge the cluster is
broken up by the internal morph boundary. In English, the vowel digraph ea
presents many difficulties for a letter-to-sound algorithm, but in the word
changeable it is clearly broken up by the boundary between change and able. In
essence, knowledge of the morph structure is essential for correct pronunciation.
These cases could, of course, be treated as exceptions, but doing so would increase
the size of the lexicon unnecessarily, and important generalities would be lost.
In the MITalk system, morph analysis is always attempted before letter-to-sound
rules are used, and care is taken to ensure that letter-to-sound rules are not
applied across morph boundaries. Thus, not only
does the use of morphs lead to an efficient and productive lexicon, it also naturally