
Some measures of intelligibility and comprehension

13.3 Word recognition in sentences

The results of the Modified Rhyme Test using isolated words indicated very high
levels of intelligibility for the segmental output of the text-to-speech system.
However, the Modified Rhyme Test uses a closed response set in a forced-choice
format, which may be considered a relatively low-uncertainty testing situation.
In the recognition and comprehension of unrestricted text, the listener faces a
substantially broader range of alternatives, because the response set is open and
potentially infinite in size. Moreover, sentential context itself makes an
important contribution to the intelligibility of speech, a fact that has been
known for many years (Miller et al., 1951; Miller and Isard, 1963).

To evaluate word recognition in sentence context, we collected two quite
different sets of data. One set was obtained using a small number of the Harvard
Psychoacoustic Sentences (Egan, 1948). These test sentences are all meaningful
and contain a wide range of syntactic constructions. In addition, the segmental
phonemes of English are represented in these sentences in proportion to their
frequency of occurrence in the language. Thus, the results obtained with the
Harvard sentences should provide a fairly good estimate of how well word
recognition can be expected to proceed in sentences when both semantic and
syntactic information are available to the listener. This situation is roughly
comparable to normal listening conditions, where “top-down” knowledge interacts
with sensory input in the recognition and comprehension of speech (see Pisoni, 1978;
Marslen-Wilson and Welsh, 1978).
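The scoring rule for open-response transcriptions is not spelled out here. As a hedged illustration only, the following minimal Python sketch computes a percent-words-correct score. It assumes each listener's written response is compared with the target sentence word by word, with exact matching after ignoring case, punctuation and word order. The function names (`normalize`, `words_correct`, `percent_words_correct`) and the matching rule are hypothetical and are not the procedure used in this study.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lower-case the text and split it into words, dropping punctuation.

    Hypothetical normalization rule assumed for this sketch.
    """
    return re.findall(r"[a-z']+", text.lower())


def words_correct(target: str, response: str) -> tuple[int, int]:
    """Return (target words reproduced in the response, total target words).

    Matching is exact after normalization and ignores word order; each
    response word is credited at most once (multiset intersection).
    """
    target_words = Counter(normalize(target))
    response_words = Counter(normalize(response))
    hits = sum((target_words & response_words).values())
    return hits, sum(target_words.values())


def percent_words_correct(trials: list[tuple[str, str]]) -> float:
    """Pool hits over all (target, response) pairs and return percent correct."""
    hits = total = 0
    for target, response in trials:
        h, n = words_correct(target, response)
        hits += h
        total += n
    return 100.0 * hits / total


if __name__ == "__main__":
    # Invented example sentences, not items from the Harvard or Haskins lists.
    trials = [
        ("The old man walked to the store.", "the old man talked to the store"),
        ("Dogs bark loudly.", "dogs bark loudly"),
    ]
    print(percent_words_correct(trials))
```

Pooling hits across sentences, rather than averaging per-sentence percentages, weights each target word equally. This is a design choice assumed for the sketch.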

We also collected word recognition data with a set of syntactically normal but
semantically anomalous sentences developed at Haskins Laboratories by Nye and
Gaitenby (1974) to evaluate the intelligibility of their text-to-speech system
(see also Ingeman, 1978). These test sentences permit a somewhat finer assessment
of the availability and quality of “bottom-up” acoustic-phonetic information and
its potential contribution to word recognition. Because the sentences are
meaningless, the individual words cannot be identified or predicted from the
sentential context or from a semantic interpretation. Thus, the results obtained
with the Haskins anomalous sentences should provide an estimate of the upper bound
on the contribution of strictly phonetic information to word recognition in
sentence contexts. Because the response set is again open and essentially
unrestricted, but semantic context is no longer available, we would anticipate
substantially lower levels of word recognition performance on this test than on
the Harvard test, in which syntactic and semantic context is readily available and
can be used freely by the listener at all levels of processing the speech input.
In addition, the results of
