Survey of speech synthesis technology

variety of applications where the vocabulary consists of a small number of words and where the messages are simple and follow a rather rigid format. However, there are a number of limitations of such systems which make them unsatisfactory for more general applications, such as automatic conversion of English text to speech.

Figure WORD-BLEND illustrates some of the differences between words spoken in isolation and the same words put together in a fluently spoken sentence. Not only are most words considerably shorter in the sentence than in isolation, but there are also acoustic changes at the boundaries between words, due both to coarticulation and to phonological rules that change the pronunciation of words in certain sentence contexts. Furthermore, the intonation, rhythm, and stress pattern appropriate to the sentence cannot be synthesized if one simply concatenates prerecorded words. These prosodic qualities turn out to be extremely important. Words that are perfectly intelligible in isolation seem to come too fast and sound disconnected when they are concatenated with the wrong prosody.

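To make the limitation concrete, here is a minimal sketch of the naive word-concatenation scheme discussed above. It assumes each prerecorded word is stored as a list of audio samples at a fixed sample rate. The lexicon, the `concatenate_words` function, the 16 kHz rate, and the placeholder durations are all hypothetical illustrations, not taken from any particular system. Notice what the code cannot do: it has no way to shorten words, smooth coarticulatory transitions at word boundaries, or impose a sentence-level intonation contour, because each stored waveform is replayed exactly as it was spoken in isolation.

```python
from typing import Dict, List

SAMPLE_RATE = 16_000  # assumed sample rate (samples per second), illustrative only


def concatenate_words(words: List[str], lexicon: Dict[str, List[float]]) -> List[float]:
    """Join prerecorded isolated-word waveforms end to end.

    Each word is replayed unchanged: its isolated duration, its boundary
    acoustics, and its own pitch pattern are all preserved, which is why
    the result sounds wrong in a fluent sentence.
    """
    output: List[float] = []
    for word in words:
        if word not in lexicon:
            raise KeyError(f"no recording for {word!r}")
        output.extend(lexicon[word])
    return output


def duration_seconds(samples: List[float]) -> float:
    """Length of a waveform in seconds at the assumed sample rate."""
    return len(samples) / SAMPLE_RATE


# Hypothetical lexicon: silent placeholder waveforms standing in for recordings.
lexicon = {
    "the": [0.0] * 4_000,  # 0.25 s placeholder
    "cat": [0.0] * 8_000,  # 0.50 s placeholder
    "sat": [0.0] * 8_000,  # 0.50 s placeholder
}

sentence = concatenate_words(["the", "cat", "sat"], lexicon)
print(duration_seconds(sentence))  # 1.25
```

The assembled sentence simply inherits the sum of the isolated-word durations. A fluent rendition would make most words shorter, and the scheme has no mechanism for doing so.
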
Thus simple word concatenation schemes have severe limitations as audio response units. In contrast, several newer techniques under development do not have these limitations. They range from complex systems for speech synthesis-by-rule (where a synthetic waveform is computed from knowledge of linguistic and acoustic rules) to relatively simple systems that create utterances by concatenating prerecorded speech waveform chunks smaller than a word (using vocoder analysis-synthesis technology to gain flexibility in reassembly).

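The flexibility that vocoder analysis-synthesis brings to chunk concatenation can be sketched as follows. The sketch assumes each prerecorded sub-word chunk has already been analyzed into frames, with each frame holding a fundamental frequency (F0) value and a vector of spectral parameters. The `Frame` type, the frame layout, and the linear falling F0 contour are hypothetical choices made for illustration. Because pitch is stored separately from the spectral parameters, the assembled frames can be given a new, sentence-level F0 contour before resynthesis. That is not possible when raw waveforms are spliced directly.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    f0_hz: float           # fundamental frequency measured at analysis time
    spectrum: List[float]  # hypothetical vocoder spectral parameters


def concatenate_chunks(chunks: List[List[Frame]]) -> List[Frame]:
    """Splice analyzed chunks frame by frame (parameters, not waveforms)."""
    return [frame for chunk in chunks for frame in chunk]


def impose_f0_contour(frames: List[Frame], start_hz: float, end_hz: float) -> List[Frame]:
    """Replace each frame's stored F0 with a linear contour from start to end.

    The spectral parameters are left untouched. Only the pitch is rewritten,
    so the utterance gets one sentence-level intonation pattern instead of
    whatever pitch each chunk happened to have when it was recorded.
    """
    n = len(frames)
    if n == 0:
        return []
    if n == 1:
        return [replace(frames[0], f0_hz=start_hz)]
    step = (end_hz - start_hz) / (n - 1)
    return [replace(f, f0_hz=start_hz + i * step) for i, f in enumerate(frames)]


# Two hypothetical chunks whose recorded pitch does not match.
chunk_a = [Frame(130.0, [0.1, 0.2]), Frame(135.0, [0.1, 0.3])]
chunk_b = [Frame(100.0, [0.4, 0.2]), Frame(98.0, [0.5, 0.1])]

frames = impose_f0_contour(concatenate_chunks([chunk_a, chunk_b]), 140.0, 110.0)
print([f.f0_hz for f in frames])  # [140.0, 130.0, 120.0, 110.0]
```

A real system would also smooth the spectral parameters across chunk boundaries and adjust durations. The sketch shows only the pitch step, because that step is where separating source from spectrum pays off most visibly.
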
Speech synthesis techniques have been reviewed in Flanagan and Rabiner (1973), Klatt (1974), and Rabiner and Schafer (1976). Here we describe some of the techniques currently in use. Of particular interest are the criteria for selecting an inventory of basic speech units for utterance assembly, the choice of a method of unit concatenation, and the specification of sentence-level prosodic variables.

7.3 Synthesis techniques

The techniques covered in this section include systems that form messages from three kinds of basic unit: words, syllables and diphones, and phonemes.

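As an illustration of a sub-word unit, a diphone spans the transition from roughly the middle of one phoneme to the middle of the next. A phoneme string therefore maps onto a sequence of overlapping adjacent pairs. The sketch below assumes a phoneme transcription is already available and pads the utterance with a silence symbol at each end. The ARPAbet-style symbols, the `#` silence marker, and the 40-phoneme inventory in the usage line are illustrative assumptions. The sketch also shows why a diphone inventory, unlike a word inventory, is bounded: with N phonemes plus silence there are at most (N+1)² ordered pairs.

```python
from typing import List, Tuple

SILENCE = "#"  # assumed boundary symbol for utterance-initial/final silence


def to_diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Map a phoneme sequence to the diphone units needed to assemble it.

    Each diphone covers the transition between two adjacent phonemes, so
    the join between stored units falls in the relatively steady middle
    of a phoneme rather than in a coarticulated transition.
    """
    padded = [SILENCE] + phonemes + [SILENCE]
    return list(zip(padded, padded[1:]))


def max_inventory_size(num_phonemes: int) -> int:
    """Upper bound on distinct diphones: every ordered pair of symbols,
    counting silence as one extra symbol."""
    symbols = num_phonemes + 1
    return symbols * symbols


# "cat" in an ARPAbet-like transcription (illustrative).
print(to_diphones(["K", "AE", "T"]))
# [('#', 'K'), ('K', 'AE'), ('AE', 'T'), ('T', '#')]
print(max_inventory_size(40))  # 1681 for a hypothetical 40-phoneme inventory
```

The bound counts every ordered pair, including some that may never occur in the language, so it overstates the inventory a real system would need to record.
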
7.3.1 Word assembly

7.3.1.1 Prerecorded words and phrases

Early methods of spoken message assembly used prerecorded words (or whole phrases) that were concatenated into sentences (Homsby, 1972; Chapman, 1971; Buron, 1968). Brief pauses were in-