From text to speech: The MITalk system

implemented which corrects a number of the parsing errors in which the sentential
verb was included in the preceding noun phrase. However, some of the other pars-
ing errors are not as easy to correct. Errors made by the first module and the
spelling-to-sound rules are highly context-dependent, and are not easily amenable
to simple change by rule. From our examination of the errors uncovered so far, all
cases could be accounted for and located in some module of the system. There
were no errors detected which escaped explanation at the present time, although
further study is continuing.

The results of the present evaluation study have several limitations and these
should be summarized here briefly for future reference. First, we did not carry out
any of the control conditions for the three types of tests using natural speech. To
some extent, this might be considered an important addition and extension of the
current evaluation since it is the level of performance with natural speech that is
frequently used as the yardstick against which to compare the quality of synthetic
speech. There can be little doubt that tests with natural speech would show higher
levels of performance when compared with synthetic speech. But it should be em-
phasized here that the levels of performance in the current study are already quite
high to begin with; therefore, it is not immediately obvious what would be gained
from such additional tests with natural speech.

Second, with regard to measuring intelligibility of the segmental output, it is
clear that the Modified Rhyme Test is much too easy for listeners, even naive lis-
teners, and additional tests using an open-response set should be employed. Ad-
ditional testing under varying noise conditions may also provide further infor-
mation concerning the quality of the synthesis and its resistance to noise and dis-
tortion. In this regard, the analysis of the Haskins anomalous sentences should
also provide a rich source of data on phonetic confusions using an open-response
set. We are planning additional detailed analyses of these data.

Finally, the comprehension test used was relatively gross in its ability to dis-
tinguish between new knowledge acquired from listening to text and knowledge
obtained from inferences drawn at the time of comprehension or, later, at the time
of testing. Of course, this is a problem related more to several broader issues in
language comprehension and understanding than to questions surrounding text-to-
speech and speech synthesis-by-rule. Nevertheless, it may be possible to learn a
great deal more about language comprehension and the interaction between
top-down and bottom-up knowledge sources in speech perception from the advances that
have been made in conceptualizing various linguistic problems within the context
of a functional text-to-speech system. The success of the current system and its
