
Some measures of intelligibility and comprehension

unrestricted text-to-speech system. Their performance is roughly comparable to that of
subjects who have been asked to read the same passages of text and answer the
same questions. As in the case of our other tests using synthetic speech, there ap-
pears to be an initial period during which subjects are simply becoming familiar
with the quality of the synthesizer, the prosodic rules of the system and the style of
the material. Even after only a few minutes of exposure, comprehension perfor-
mance improves substantially and eventually approximates levels observed when
subjects read the same passages of text.

It should also be pointed out that the comprehension performance observed in
these tests was obtained with a reading rate in excess of 180 words per minute.
This is about the rate at which people typically speak in normal conversation
or when they read text aloud. The present results therefore suggest that it is not
necessary to slow down the speaking rate or adjust the synthesis to obtain rela-
tively high levels of listening comprehension for continuous text. Until the present
tests were carried out, it was assumed by some investigators that synthetic speech
had to be output at a much slower rate to maintain intelligibility and therefore
facilitate comprehension.

Based on the results of the present comprehension test, as well as the other
tests of intelligibility and word recognition that were carried out, there is good
reason to believe that the basic design of the MIT text-to-speech system is valid.
Not only can the system produce highly intelligible synthetic speech, as shown in
our earlier tests, but that speech can also be understood and comprehended at
reasonably high levels. While there are, no doubt, many subtle
details of the system that might be improved, the results of these preliminary tests
support the general conclusion that very high-quality synthetic speech can be
produced automatically from unrestricted text and that such a system could be im-
plemented in applied settings in the immediate future. After some thirty years of
research, the widespread use of text-to-speech and voice response systems in
computer-aided instruction and as aids for the handicapped is now a realistic goal. The
obstacles are no longer questions of research into the basic principles of speech
production, perception, and linguistic analysis, but are simply the practical matters
of implementation and economics.

13.5 General discussion and conclusions

The results of the three tests designed to evaluate intelligibility, word recognition,
and listening comprehension indicated very high levels of performance for the cur-
rent version of the text-to-speech system. While these tests are only preliminary,
they have provided an initial benchmark against which to compare the perfor-
