From text to speech: The MITalk system

1.1.1 Task

The application task determines the nature of the speech capability that must be
provided. When only a small number of utterances is required, and these do not
have to be varied on line, then recorded speech can be used, but if the task is to
simulate the human cognitive process of reading aloud, then an entirely different
range of techniques is needed.

1.1.2 Human vocal apparatus

All systems must produce as output a speech waveform, but it is not an arbitrary
signal. A great deal of effort has gone into the efficient and insightful
representation of the speech signal as the result of a signal source in the vocal
tract exciting the vocal tract “system function”, which acts as a filter to produce
the speech waveform. The human vocal tract also constrains the speed with which
signal changes can be made, and is responsible for much of the coarticulatory
smoothing or encoding that makes the relation between the underlying phonetic
transcription and the speech waveform so difficult to characterize.
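
The source-filter view described above can be illustrated with a minimal sketch.
It is not taken from the MITalk system itself: it assumes an idealized impulse-train
source and a cascade of two-pole digital resonators standing in for the vocal tract
"system function". The function names, the 10 kHz sampling rate, the 100 Hz pitch,
and the formant frequencies and bandwidths are all illustrative assumptions.

```python
import math


def impulse_train(f0_hz, sample_rate_hz, n_samples):
    """Idealized voiced source: one unit impulse per pitch period (assumption)."""
    period = round(sample_rate_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole digital resonator modelling a single vocal tract resonance.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].
    """
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz=100.0, sample_rate_hz=10000, n_samples=1000,
                     formants=((500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0))):
    """Excite a cascade of resonators (the filter) with the source signal.

    The (frequency, bandwidth) pairs are hypothetical values chosen only to
    give a vowel-like spectrum.
    """
    wave = impulse_train(f0_hz, sample_rate_hz, n_samples)
    for freq_hz, bandwidth_hz in formants:
        wave = resonator(wave, freq_hz, bandwidth_hz, sample_rate_hz)
    return wave
```

Calling synthesize_vowel() returns 1000 samples of a steady vowel-like waveform.
Changing the formant values alters the filter while the source stays the same,
which is the separation between source and "system function" that the text
describes. Coarticulation and the limits on how fast the tract can change are not
modelled here.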

1.1.3 Language structure

Just as the speech waveform is not arbitrarily derived, the myriad possible speech
gestures that could be related to a linguistic message are constrained by the nature
of the particular language structure involved. It has been consistently found that
those units and structures which linguists use to describe and explain language do
in fact provide the appropriate base in terms of which the speech waveform can be
characterized and constructed. Thus, basic phonological laws, stress rules,
morphological and syntactic structures, and phonotactic constraints all find their
use in determining the speech output.

1.1.4 Technology

Our ability to model and construct speech output devices is strongly conditioned
by the current (and past) technology. Speech science has profited greatly from a
variety of technologies, including x-rays, motion pictures, the sonograph, modern
filter and sampled-data theory, and most importantly the modern digital computer.
While early uses of computers were for off-line speech analysis and simulation, the
advent of increasingly capable integrated circuit technology has made it possible to
build compact, low-cost, real-time devices of great capability. It is this fact,
combined with our substantial knowledge of the algorithms needed to generate speech,
that has propelled the field of speech output from computers into the “real world”
of practical commercial systems suitable for a wide variety of applications.