From text to speech: The MITalk system

differs slightly from the results found for the synthetic speech in the earlier
Haskins evaluation. In the Haskins study, error rates for the synthetic speech in
initial and final positions were about the same, with a very slight advantage for
consonants in final position. The comparable overall error rates obtained for
natural speech in the Modified Rhyme Test by House et al. and Nye and Gaitenby
(1973) were 4 percent and 2.7 percent, respectively.

In the earlier evaluation study, Nye and Gaitenby (1974) checked to ensure that
the phonemic input to the Haskins synthesizer was correct. However, no corrections
of any kind were made by hand in generating the present materials, either from
entries in the morph lexicon or from spelling-to-sound rules. As discussed in the
final section of this chapter, several different kinds of errors were uncovered in
different modules as a result of generating such a large amount of synthetic speech
through the system.

Except for the high error rates observed for the nasals and fricatives in final
syllable position, the synthesis of segmental information in the text-to-speech
system appears to be excellent, at least as measured in a forced-choice format
among minimal pairs of test items. With phoneme recognition performance as high as
it is--close to ceiling levels--it is difficult to pick up subtle details of the
error patterns that might be useful in improving the quality of the output of the
phonetic component of the system at the present time. In addition, the errors that
were observed in the present tests might well be reduced substantially if the
listeners had more experience with the speech output produced by the system. It is
well known among investigators working with synthetic speech that rather
substantial improvements in intelligibility can be observed when listeners become
familiar with the quality of the synthesizer. Nye and Gaitenby (1974) as well as
Carlson et al. (1976) have reported very sizeable learning effects in listening to
synthetic speech. In the latter study, performance increased from 55 percent to 90
percent correct after the presentation of only 200 synthetic sentences over a
two-week period. (See also the discussion of the word recognition and comprehension
results below.)

In summary, the results of the Modified Rhyme Test revealed very high levels of
intelligibility of the speech output from the system using naive listeners as
subjects. While the overall level of performance is somewhat lower than in previous
studies employing natural speech, the level of performance for recognition of
segmental information appears to be quite satisfactory for a wide range of
text-to-speech applications at the present time.
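
The comparison of error rates by consonant position discussed above can be made concrete with a short tabulation. What follows is only a minimal sketch in Python, not the scoring procedure used in the study: the function name, the trial layout, and every response listed are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def error_rates_by_position(trials):
    """Return the percentage of incorrect responses per consonant position.

    Each trial is a (position, target_word, response_word) tuple. This
    layout is an assumption made for the sketch, not the study's format.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for position, target, response in trials:
        totals[position] += 1
        if response != target:
            errors[position] += 1
    return {pos: 100.0 * errors[pos] / totals[pos] for pos in totals}


# Hypothetical listener responses; these are NOT data from the evaluation.
trials = [
    ("initial", "bat", "bat"),
    ("initial", "pat", "pat"),
    ("initial", "mat", "mat"),
    ("initial", "sat", "fat"),
    ("final", "sun", "sun"),
    ("final", "sum", "sun"),
    ("final", "sung", "sun"),
    ("final", "sub", "sub"),
]

rates = error_rates_by_position(trials)
print(rates)
```

Keeping the tally keyed by position makes it easy to extend the same loop to other groupings, such as manner class (nasals, fricatives), should a finer breakdown of the error pattern be wanted.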