3

Morphological analysis

3.1 Overview

MITalk is designed to convert unrestricted English text into a synthetic speech
waveform. In the initial analysis phase, text character strings are converted to a
narrow phonetic transcription consisting of phonetic symbols and prosodic
markers. While the output unit types are thus specified, the question remains as to
the type of unit to be used with the input character string. Since there is an infinite
number of possible English sentences, it is not possible to store all English
sentences and their corresponding phonetic transcriptions in a form suitable for the
synthesis phase of MITalk. The next smaller unit recognizable from the input
string is the word. The number of English words is large, but bounded, so one
might consider use of a word lexicon which would contain the spelling and
phonetic transcription (together with part-of-speech information) for all English
words. Aside from the size of this dictionary, there are several attractive features
of this approach. Some form of dictionary must be used to provide pronunciations
for exceptions to other mechanisms (e.g. rules) used to derive pronunciations.
These arise in part from foreign words that have retained the pronunciation of their
language of origin (e.g. parfait and tortilla). Furthermore, all mechanisms
derived thus far for the conversion of letter strings to phonetic segment labels
produce some errors, and it seems to be inherent in natural languages that no formal
means derived for the representation of their structure has covered all observed
forms without error. An interesting class of exceptional pronunciation
arises for high-frequency words. Initial th is pronounced as a voiceless fricative in
many words (thin, thesis, thimble) but for very frequent words, such as the short
function words (the, this, there, these, those, etc.), it is pronounced in a voiced
manner. Similarly, f is always pronounced as an unvoiced fricative, except in the
single word of, where it is voiced. In words such as shave and behave, the final silent e has the effect
of lengthening or tensing the preceding vowel, but in the frequent word have this
is not the case. Finally, the final s in atlas and canvas is unvoiced, but for the
function words (is, was, has) it is voiced. It thus appears that these high-frequency
words should be placed in an exceptions dictionary if a set of rules is to be used for
converting letter strings to phonetic segment labels.
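
The idea of consulting an exceptions dictionary before falling back on general rules can be sketched as follows. This is only an illustration built from the examples in the paragraph above, not the MITalk implementation: the names toy_rules, EXCEPTIONS, and analyze, the feature labels, and the crude rule conditions are all assumptions made for the sketch.

```python
# Illustrative sketch only: names, feature labels, and rule conditions are
# hypothetical and cover just the examples discussed in the text.

VOWELS = "aeiou"


def toy_rules(word: str) -> dict:
    """Apply crude default letter-to-sound generalizations to one word."""
    feats = {}
    if word.startswith("th"):
        feats["initial_th"] = "voiceless"      # thin, thesis, thimble
    if "f" in word:
        feats["f"] = "voiceless"               # default for the letter f
    if word.endswith("s"):
        feats["final_s"] = "voiceless"         # atlas, canvas
    # Final silent e after vowel + consonant lengthens the vowel (shave).
    if (len(word) >= 3 and word[-1] == "e"
            and word[-2] not in VOWELS and word[-3] in VOWELS):
        feats["vowel_before_silent_e"] = "long"
    return feats


# High-frequency words whose pronunciation departs from the defaults above.
EXCEPTIONS = {
    "the": {"initial_th": "voiced"},
    "this": {"initial_th": "voiced"},
    "there": {"initial_th": "voiced"},
    "these": {"initial_th": "voiced"},
    "those": {"initial_th": "voiced"},
    "of": {"f": "voiced"},
    "have": {"vowel_before_silent_e": "short"},
    "is": {"final_s": "voiced"},
    "was": {"final_s": "voiced"},
    "has": {"final_s": "voiced"},
}


def analyze(word: str) -> dict:
    """Start from the rule output, then let dictionary entries override it."""
    w = word.lower()
    feats = toy_rules(w)
    feats.update(EXCEPTIONS.get(w, {}))
    return feats


if __name__ == "__main__":
    for w in ["thin", "the", "of", "shave", "have", "atlas", "was"]:
        print(w, analyze(w))
```

Letting the dictionary entries override the rule output keeps the rules simple and general, while the small set of frequent irregular words is handled by direct lookup.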
