From text to speech: The MITalk system

In planning the current evaluation project, we also wanted to obtain information
about several different aspects of the total system and their contribution to
intelligibility and comprehension of speech. To meet this goal, a number of
different tests were selected to provide information about: 1) phoneme
recognition, 2) word recognition in sentences, and 3) listening comprehension.
It was assumed that the results of these three tests together would provide
qualitative and quantitative information sufficient to identify any major
problems in the operation of the total system at the time of testing in early
May of 1979. The results of these three types of tests would also provide much
more detailed information about the relative contribution of several of the
individual components of the system and their potential interaction.

In carrying out these evaluation tests, we collected a total of 27,128 responses
from some 160 naive listeners. A total of 45 minutes of synthetic speech was
generated in fully automatic text-to-speech mode. No system errors were
corrected at this time and no total system crashes were encountered during the
generation of the test materials used in the evaluation.

13.2 Phoneme recognition

After initial discussions, we decided to use the Modified Rhyme Test to measure
the intelligibility of the speech produced by the system. This test was originally
developed by Fairbanks (1958) and then later modified by House et al. (1965).
This test was chosen primarily because it is reliable, shows little effect of learning,
and is easy to administer to untrained and relatively naive listeners. It also uses
standard orthographic responses, thereby eliminating problems associated with
phonetic notation. Moreover, extensive data have already been collected with
natural speech, as well as synthetic speech produced by the Haskins speech
synthesizer (Nye and Gaitenby, 1973), therefore permitting us to make several direct
comparisons of the acoustic-phonetic output of the two text-to-speech systems
under somewhat comparable testing conditions.

13.2.1 Method

13.2.1.1 Subjects  Seventy-two naive undergraduate students at Indiana University
in Bloomington served as paid listeners in this study. They were all recruited
by means of an advertisement in the student newspaper and reported no history of
a hearing or speech disorder at the time of testing. The subjects were all
right-handed native speakers of English.

13.2.1.2 Stimuli  Six lists of 50 monosyllabic words were prepared on the MIT
text-to-speech system. The lists were recorded on audio tape via a Revox Model
