Preface

necessary data to support the extensions and refinements to the morph analysis
routines. Subsequent to the initial construction of the lexicon, an elaborate editing
of all entries was made by M. S. Hunnicutt. This led to substantial improvements
in the system’s overall performance.

When words could not be found in the morph lexicon, or could not be
analyzed into morphs from the lexicon, letter-to-sound rules were utilized. Prior to
the MITalk research, letter-to-sound rules had been proposed to cover the entire
language. But, with MITalk, it was realized that high-frequency function words
often violate perspicuous forms of these rules, and that such letter-to-sound rules
do not span morph boundaries. Based on these observations, a complementary set
of letter-to-sound rules could be introduced into MITalk, but these rules would not
be used unless morph analysis failed. With this in mind, affix stripping was
utilized, and the more reliable consonants were converted first, leaving the vowels for
last. This approach was proposed by J. Allen, and extensive sets of these rules
were developed by M. S. Hunnicutt working with F. X. Carroll. Several sets of
these rules were developed and elaborately tested. In addition, in the late '60s at
MIT, there was great interest in lexical stress and in phonological rules for this
purpose, which were initially developed by M. Halle and N. Chomsky. These rules
were reformulated and extended to include the effect of affixes. This was the first
time that lexical stress rules had been used in a text-to-speech system. The
development of rules for this purpose, along with their unification with the letter-
to-sound rules, was accomplished by M. S. Hunnicutt. In addition, the text
preprocessing rules were also provided by M. S. Hunnicutt, as well as the routines
for morphophonemics and stress adjustment used in conjunction with the morph
analysis.

In a 1968 doctoral thesis, J. Allen developed a parsing methodology for use in
a text-to-speech system, with particular emphasis on the computation of necessary
syntactic markers to specify prosodic correlates. This parsing strategy led to the
development of a phrase-level parser which avoided the complications of clause-
level parsing and the problems of syntactic ambiguity at that level, but also led to
the introduction of inaccuracy due to incomplete clause-level analyses. This ap-
proach was augmented and extended by P. L. Miller and C. J. Drake, and was
tested extensively in the context of the morph lexicon and analysis routines.

In light of the phonetic segment labels, stress marks, and syntactic markers
obtained by the previously mentioned programs, it was necessary to develop a
prosodic framework for the following phonemic synthesis. A durational
framework was developed by D. H. Klatt together with R. Carlson and