Morphological analysis
in a language. They are often the ingredients of newly coined compound words, but new morphs are rarely formed. For this reason, they are good candidates for lexical entries, provided a means can be found to analyze words into their constituent morphs. As will be seen, an effective morph lexicon can have fewer than 10,000 entries, so it can be stored efficiently, particularly in contemporary integrated circuit technology. It is also important to note that with a morph lexicon and an associated analysis procedure, there is no need to store all of the regularly inflected forms, as a whole-word lexicon must.
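To make the idea of analyzing words into their constituent morphs concrete, here is a minimal Python sketch. It assumes a hypothetical toy lexicon (`MORPHS`) and a simple preference for the decomposition with the fewest morphs; this is an illustrative stand-in, not MITalk's actual analysis procedure. It also ignores spelling changes at morph boundaries, so it only finds morphs that concatenate exactly.

```python
from functools import lru_cache

# Hypothetical toy morph lexicon, for illustration only.
MORPHS = frozenset({
    "hot", "house", "mis", "chance", "dis", "charge",
    "change", "able", "dog", "cat", "s", "school", "scheme",
})


def decompose(word, lexicon=MORPHS):
    """Split `word` into lexicon morphs, preferring the fewest morphs.

    Returns a list of morphs, or None if the word cannot be fully covered.
    """

    @lru_cache(maxsize=None)
    def best(i):
        # Shortest tuple of morphs covering word[i:], or None if impossible.
        if i == len(word):
            return ()
        candidates = []
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            if piece in lexicon:
                rest = best(j)
                if rest is not None:
                    candidates.append((piece,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


print(decompose("hothouse"))  # ['hot', 'house']
print(decompose("dogs"))      # ['dog', 's']
```

With this kind of analysis, the regular plural dogs is covered by the entries dog and s, so the inflected form needs no lexical entry of its own.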
Because morphs are the basic constituents of words, it is important to show their utility in determining pronunciations. When morphs are joined together, they often change pronunciation depending on the nature of the morphs involved. For example, when the plural forms of the nouns dog and cat are formed, the final s is voiced in dogs but unvoiced in cats. This is an example of a morphophonemic rule, one governing the realization of the plural morpheme in various environments. To use such rules, it is necessary to recognize the constituent morphemes of a word, so an important class of pronunciation effects depends on detecting morphs and their boundaries. MITalk provides a comprehensive implementation of the morphophonemic rules of English.
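The plural rule can be sketched as a function of the stem's final phonetic segment. The Python below is a minimal illustration, assuming hypothetical ARPAbet-style phoneme symbols and a hypothetical `STEMS` pronunciation table. It encodes the familiar three-way rule for the regular English plural (an added vowel after sibilants, unvoiced s after other voiceless consonants, voiced z elsewhere), not MITalk's own rule set.

```python
# Hypothetical ARPAbet-style stem pronunciations, for illustration only.
STEMS = {
    "dog": ["D", "AO", "G"],
    "cat": ["K", "AE", "T"],
    "bus": ["B", "AH", "S"],
}

SIBILANTS = {"S", "Z", "SH", "ZH", "CH", "JH"}
# Voiceless consonants that are not sibilants (sibilants are checked first).
VOICELESS = {"P", "T", "K", "F", "TH"}


def plural_suffix(stem_phones):
    """Choose the realization of the plural morpheme from the stem's last phone."""
    last = stem_phones[-1]
    if last in SIBILANTS:
        return ["IH", "Z"]  # extra vowel after sibilants: buses
    if last in VOICELESS:
        return ["S"]        # unvoiced after voiceless consonants: cats
    return ["Z"]            # voiced after voiced consonants and vowels: dogs


def pronounce_plural(stem):
    """Concatenate the stem's phones with the appropriate plural suffix."""
    phones = STEMS[stem]
    return phones + plural_suffix(phones)


print(pronounce_plural("dog"))  # ['D', 'AO', 'G', 'Z']
print(pronounce_plural("cat"))  # ['K', 'AE', 'T', 'S']
```

The rule only works once the plural morpheme has been separated from its stem, which is why morph boundaries must be detected first.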
Apart from their role in morphophonemic rules, morphs also serve to break up a word for purposes of pronunciation. This observation is important for the proper use of letter-to-sound rules. Most sets of letter-to-sound rules treat each word as an unstructured sequence of letters and use a scanning window to find consonant and vowel letter clusters that can be readily converted to phonetic segment labels. For example, as we have already seen, th is a letter cluster corresponding to a single fricative phonetic segment, as in thesis. But in the word hothouse, the th cluster is broken up by a morph boundary, and no medial fricative is present. Similarly, the letter cluster sch has a regular pronunciation in school and scheme, but in the words mischance and discharge the cluster is broken up by the internal morph boundary. In English, the vowel digraph ea presents many difficulties for a letter-to-sound algorithm, but in the word changeable it is clearly broken up. In essence, the morph structure is needed to provide the correct pronunciation. These cases could of course be treated as exceptions, but this would increase the size of the lexicon unnecessarily, and important generalizations would be lost. In the MITalk system, morph analysis is always attempted before letter-to-sound rules are used, and care is taken to ensure that letter-to-sound rules are not applied across morph boundaries. Thus, not only does the use of morphs lead to an efficient and productive lexicon, it also naturally