From text to speech: The MITalk system
In planning the current evaluation project, we also wanted to obtain information about several different aspects of the total system and their contribution to intelligibility and comprehension of speech. To meet this goal, a number of different tests were selected to provide information about: 1) phoneme recognition, 2) word recognition in sentences, and 3) listening comprehension. It was assumed that the results of these three tests together would provide qualitative and quantitative information sufficient to identify any major problems in the operation of the total system at the time of testing in early May of 1979. The results of these three types of tests would also provide much more detailed information about the relative contribution of several of the individual components of the system and their potential interaction.
In carrying out these evaluation tests, we collected a total of 27,128 responses from some 160 naive listeners. A total of 45 minutes of synthetic speech was generated in fully automatic text-to-speech mode. No system errors were corrected at this time and no total system crashes were encountered during the generation of the test materials used in the evaluation.
13.2 Phoneme recognition
After initial discussions, we decided to use the Modified Rhyme Test to measure the intelligibility of the speech produced by the system. This test was originally developed by Fairbanks (1958) and then later modified by House et al. (1965). This test was chosen primarily because it is reliable, shows little effect of learning, and is easy to administer to untrained and relatively naive listeners. It also uses standard orthographic responses, thereby eliminating problems associated with phonetic notation. Moreover, extensive data have already been collected with natural speech, as well as synthetic speech produced by the Haskins speech synthesizer (Nye and Gaitenby, 1973), therefore permitting us to make several direct comparisons of the acoustic-phonetic output of the two text-to-speech systems under somewhat comparable testing conditions.
13.2.1 Method
13.2.1.1 Subjects Seventy-two naive undergraduate students at Indiana University in Bloomington served as paid listeners in this study. They were all recruited by means of an advertisement in the student newspaper and reported no history of a hearing or speech disorder at the time of testing. The subjects were all right-handed native speakers of English.
13.2.1.2 Stimuli Six lists of 50 monosyllabic words were prepared on the MIT text-to-speech system. The lists were recorded on audio tape via a Revox Model