From text to speech: The MITalk system

prehension is the quality of the input signal expressed in terms of its overall intel-
ligibility. But as we have seen even from the results summarized in the previous
sections, additional consideration must also be given to the contribution of higher-
level sources of knowledge to recognition and comprehension. In this last section,
we wanted to obtain some preliminary estimate of how well listeners could com-
prehend continuous text produced by the text-to-speech system. Previous evalua-
tions of synthetic speech output have been concerned primarily with measuring in-
telligibility or listener preferences with little if any concern for assessing com-
prehension or understanding of the content of the materials (Nye et al., 1975).
Indeed, as far as we have been able to determine, no previous formal tests of the
comprehension of continuous synthetic speech have ever been carried out with a
relatively wide range of textual materials specifically designed to assess under-
standing of the content rather than form of the speech.

To accomplish this goal, we selected fifteen narrative passages and an ap-
propriate set of test questions from several standardized adult reading comprehen-
sion tests. The passages were quite diverse, covering a wide range of topics, writ-
ing styles and vocabulary. We thought that a large number of passages would be
interesting to listen to in the context of tests designed to assess comprehension and
understanding. Since these test passages were selected from several different types
of reading tests, they also varied in difficulty and style, permitting us to evaluate
the contribution of all of the individual modules of the text-to-speech system in
terms of one relatively gross measure.

In addition to securing measures of listening comprehension for these pas-
sages, we also collected a parallel set of data on reading comprehension of these
materials from a second group of subjects. The subjects in the reading comprehen-
sion group answered the same questions after reading each passage silently, as did
subjects in the listening comprehension group. This condition was included in or-
der to permit comparison between the two input modalities. It was assumed that
the results of these comprehension tests would therefore provide an initial, al-
though preliminary, benchmark against which the entire text-to-speech system
could be evaluated with materials somewhat comparable to those used in the im-
mediate future.

13.4.1 Method

13.4.1.1 Subjects  Forty-four additional naive undergraduate students were
recruited as paid subjects. They were drawn from the same source as the subjects
used in the previous studies. Some of the subjects assigned to the reading com-
prehension group had participated in the earlier study using the Modified Rhyme
