
From text to speech: The MITalk system

mance of the present system with other text-to-speech systems. Moreover, the
present results have provided a basis for evaluating the overall design of the sys-
tem and the functioning of several of the individual components. Since a relatively
large amount of text was specifically generated for this project, we were able to
identify a number of errors in the operation of the system which ordinarily might
not have been detected. In this last section of the chapter, we summarize briefly a
few of the errors we were able to uncover during and after the evaluation. We will
also point out some of the limitations of the current evaluation results and then dis-
cuss several directions for additional testing in the future.

After the test materials for the evaluation project were generated, it was pos-
sible to go back and examine the output of each module individually in order to
determine whether it provided a correct analysis of the input text. Errors of
various kinds in the final spoken output could originate at several different
modules in the system. In addition, there could be errors resulting from
transcription that we would not associate with the operation of the text-to-speech system
itself.

Of all the errors observed, we discovered only one that could legitimately be
classified as a transcription error. In this case, the word “harmonies” was incor-
rectly typed into the system as “harmonics” and was not detected in subsequent
proofreading. All remaining errors could be located at one or more modules of the
system. These errors consisted of incorrect parsings, pronunciations, or stress as-
signments. An error located at one module often affected analyses carried out by
other modules. Sometimes the results of these errors were quite noticeable in the
spoken output, particularly when the errors produced segmental distinctions that
could be detected in pronunciation. However, in other cases, particularly where
stress assignment was involved, the differences were more difficult to detect.

At the time this report was completed, we were able to locate only two errors
in the operation of the first module of the system. This module (FORMAT) has a
dictionary that converts abbreviations, symbols, and numbers to words for sub-
sequent processing. One error involved the abbreviation “U.S.” in which a space
was incorrectly typed between “U.” and “S.” The rule which was applied here
places an end-of-sentence period in the output if an abbreviatory period (as in
“U.”) is followed by one or more spaces and a capital letter (the “S”). Thus, two
sentences were formed, one ending in “U.” and the other beginning with “S.” This
error causes an incorrect pitch contour to be placed on the output, as well as in-
appropriate segmental durations to be assigned in later modules.
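
The boundary rule described above can be illustrated with a minimal sketch. This is not the FORMAT module's actual code: the names `BOUNDARY` and `split_sentences` are hypothetical, and the sketch assumes the rule treats any period followed by one or more spaces and a capital letter as the end of a sentence.

```python
import re

# Hypothetical reconstruction of the rule described in the text, not FORMAT itself:
# a period followed by one or more spaces and then a capital letter ends a sentence.
BOUNDARY = re.compile(r"\. +(?=[A-Z])")


def split_sentences(text):
    """Split text at every period that is followed by spaces and a capital letter."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Keep the period with the sentence it closes; drop the spaces after it.
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    tail = text[start:]
    if tail:
        sentences.append(tail)
    return sentences


# Correctly typed abbreviation: no space inside "U.S.", so no boundary is found.
print(split_sentences("The U.S. economy grew."))

# Mistyped abbreviation: the space after "U." triggers the rule and splits the sentence.
print(split_sentences("The U. S. economy grew."))
```

Under these assumptions, the mistyped input yields two fragments, one ending in "U." and the other beginning with "S.", which matches the behavior reported for the FORMAT error.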

Another error involved the abbreviation “19th”. In all cases, alphanumerics
