From text to speech: The MITalk system

implemented which corrects a number of the parsing errors in which the
sentential verb was included in the preceding noun phrase. However, some of the
other parsing errors are not as easy to correct. Errors made by the first module
and the spelling-to-sound rules are highly context-dependent, and are not easily
amenable to simple change by rule. From our examination of the errors uncovered
so far, all cases could be accounted for and located in some module of the
system. There were no errors detected which escaped explanation at the present
time, although further study is continuing.

The results of the present evaluation study have several limitations and these
should be summarized here briefly for future reference. First, we did not carry
out any of the control conditions for the three types of tests using natural
speech. To some extent, this might be considered an important addition and
extension of the current evaluation since it is the level of performance with
natural speech that is frequently used as the yardstick against which to compare
the quality of synthetic speech. There can be little doubt that tests with
natural speech would show higher levels of performance when compared with
synthetic speech. But it should be emphasized here that the levels of
performance in the current study are already quite high to begin with;
therefore, it is not immediately obvious what would be gained from such
additional tests with natural speech.

Secondly, with regard to measuring intelligibility of the segmental output, it
is clear that the Modified Rhyme Test is much too easy for listeners, even naive
listeners, and additional tests using an open-response set should be employed.
Additional testing under varying noise conditions may also provide further
information concerning the quality of the synthesis and its resistance to noise
and distortion. In this regard, the analysis of the Haskins anomalous sentences
should also provide a rich source of data on phonetic confusions using an
open-response set. We are planning additional detailed analyses of these data.

Finally, the comprehension test used was relatively gross in its ability to
distinguish between new knowledge acquired from listening to text and knowledge
obtained from inferences drawn at the time of comprehension or, later, at the
time of testing. Of course, this is a problem related more to several broader
issues in language comprehension and understanding than to questions surrounding
text-to-speech and speech synthesis-by-rule. Nevertheless, it may be possible to
learn a great deal more about language comprehension and the interaction between
top-down and bottom-up knowledge sources in speech perception from the advances
that have been made in conceptualizing various linguistic problems within the
context of a functional text-to-speech system. The success of the current system
and its