From text to speech: The MITalk system
mance of the present system with other text-to-speech systems. Moreover, the present results have provided a basis for evaluating the overall design of the system and the functioning of several of the individual components. Since a relatively large amount of text was specifically generated for this project, we were able to identify a number of errors in the operation of the system which ordinarily might not have been detected. In this last section of the chapter, we summarize briefly a few of the errors we were able to uncover during and after the evaluation. We will also point out some of the limitations of the current evaluation results and then discuss several directions for additional testing in the future.

After the test materials for the evaluation project were generated, it was possible to go back and examine the output of each module individually in order to determine whether it provided a correct analysis of the input text. Errors of various kinds in the final spoken output could originate at several different modules in the system. In addition, there could be errors resulting from transcription that we would not associate with the operation of the text-to-speech system itself.

Of all the errors observed, we discovered only one that could legitimately be classified as a transcription error. In this case, the word “harmonies” was incorrectly typed into the system as “harmonics” and was not detected in subsequent proofreading. All remaining errors could be located at one or more modules of the system. These errors consisted of incorrect parsings, pronunciations, or stress assignments. An error located at one module often affected analyses carried out by other modules. Sometimes the results of these errors were quite noticeable in the spoken output, particularly when the errors produced segmental distinctions that could be detected in pronunciation. However, in other cases, particularly where stress assignment was involved, the differences were more difficult to detect.

At the time this report was completed, we were able to locate only two errors in the operation of the first module of the system. This module (FORMAT) has a dictionary that converts abbreviations, symbols, and numbers to words for subsequent processing. One error involved the abbreviation “U.S.” in which a space was incorrectly typed between “U.” and “S.” The rule which was applied here places an end-of-sentence period in the output if an abbreviatory period (as in “U.”) is followed by one or more spaces and a capital letter (the “S”). Thus, two sentences were formed, one ending in “U.” and the other beginning with “S.” This error causes an incorrect pitch contour to be placed on the output, as well as inappropriate segmental durations to be assigned in later modules.

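The sentence-boundary rule described above can be illustrated with a short sketch. This is not the FORMAT module's actual code. It is a hypothetical Python reconstruction that assumes the rule can be approximated by a single regular expression, and it simplifies by treating every period as a candidate rather than only abbreviatory ones. The names `SENTENCE_BREAK` and `split_sentences` are invented for the illustration.

```python
import re
from typing import List

# Hypothetical approximation of the rule, not MITalk's implementation:
# a period followed by one or more spaces and a capital letter is taken
# to mark the end of a sentence.
SENTENCE_BREAK = re.compile(r"(?<=\.) +(?=[A-Z])")


def split_sentences(text: str) -> List[str]:
    """Split text at every point where the rule fires."""
    return SENTENCE_BREAK.split(text)


# Correctly typed abbreviation: no space after "U.", so the rule does not fire.
print(split_sentences("The U.S. economy grew."))

# Mistyped abbreviation: the space after "U." is followed by a capital "S",
# so the text is split into two sentences.
print(split_sentences("The U. S. economy grew."))
```

With the mistyped input, the sketch yields one sentence ending in “U.” and another beginning with “S.”, which is the division reported for the test materials.
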
Another error involved the abbreviation “19th”. In all cases, alphanumerics