13

Some measures of intelligibility and comprehension¹

13.1 Overview

As the ten-year effort to build an unrestricted text-to-speech system at MIT drew to
a close, it seemed appropriate to conduct a preliminary evaluation of the quality of
the speech output with a relatively large group of naive listeners. The results of
such an evaluation would no doubt prove useful in first establishing a benchmark
level of performance for comparative purposes, as well as uncovering any
problems in the current version of the system that might not have been detected
earlier. In addition to obtaining measures of intelligibility of the speech output
produced by the text-to-speech system, we were also interested in finding out how
well naive listeners could comprehend continuous text produced by the system.
This was thought to be an important aspect of the evaluation of the text-to-speech
system as a whole, since a version of the current system might eventually be
implemented as a device used for computer-aided instruction or as a functional
reading machine for the blind (Allen, 1973). Both of these applications are now
well within the realm of the available technology (Allen et al., 1979).

In carrying out the evaluation of the system, we patterned several aspects of
the testing after earlier work already completed on the evaluation of the Haskins
Laboratories reading machine project so that some initial comparisons could be
drawn between the two systems (Nye and Gaitenby, 1973, 1974). However, we
also added several other tests to the evaluation to gain additional information about
word recognition in normal sentential contexts and listening comprehension for a
relatively wide range of narrative passages of continuous text. Data were also
collected on reading comprehension for the same set of materials to permit direct
comparison between the two input modalities. Traditional measures of listening or
reading comprehension have not typically been obtained in previous evaluations of
the quality of synthetic speech output, and therefore, we felt that some preliminary
data would be quite useful before the major components of the present system
were implemented as a workable text-to-speech device in an applied context.

¹ This chapter was written by D. Pisoni in 1978-9.
