
Preface

The MITalk system described in this book is the result of a long effort, stretching
from the early 1960s to the present. This preface traces the work’s historical
evolution and, along the way, acknowledges the project’s many contributions. These
contributions are best organized into four groups. First, there is the development
of the MITalk system itself, its evolution, and the many diverse contributions made
to its structure and content. Second, there was the 1979 summer course, which
produced a comprehensive summary of the work to that date and also provided the
occasion to write a set of course notes. Third, there have been continuing efforts
since 1980. These included rewrites of the system’s software and the work of
organizing this book, which involved substantial new writing, new rule
formulations, and explicit examples keyed directly to the current working system.
Finally, there is the sponsorship of the program’s many facets over the years.

In the early 1960s, much interest in speech synthesis emerged within the Cognitive
Information Processing Group at MIT’s Research Laboratory of Electronics. This
group, led by M. Eden and S. J. Mason, focused on the development of sensory aids
for the blind. Many approaches were taken, but it was recognized that the
development of a reading machine for the blind that could scan printed text and
produce spoken output was a major goal. Research efforts in both character
recognition and speech synthesis from text were initiated. By 1968, a functional
reading machine was demonstrated. Once the characters were recognized (using a
contour scanning algorithm), text-to-speech conversion was accomplished in two
phases. First, a morph decomposition analysis of all words was performed by using
techniques developed by F. F. Lee (in his 1965 doctoral thesis). A morph lexicon
sufficient for these demonstrations was developed. It was anticipated that any
exceptional words not analyzed into morphs would be pronounced by using spelled
speech. As a result, these words were heard as a sequence of individually
pronounced letters. The dictionary provided names of the phonetic segments for
each morph, and synthesis was performed using the algorithms developed and
published by Holmes, Mattingly, and Shearme. An analog synthesizer was used to

amplifiers. The demonstration of this system was impressive, although the
