Survey of speech synthesis technology
The input to the rules includes phonemes, stress, word and morpheme boundaries,
and syntactic structure.
In time, these methods ought to be able to produce highly intelligible natural
speech, but present results are frequently perceived to be somewhat unnatural and
machine-like. This appears to be due mainly to the intricate complexity of the
speech code and the fact that not all of the rules are known at this time. There is a
particular need to improve on the specification of fundamental frequency and
duration algorithms, perhaps by making incremental improvements to current
algorithms (Umeda, 1976; Klatt, 1979a; Maeda, 1974; O’Shaughnessy, 1977;
Pierrehumbert, 1979).
7.4 Applications
7.4.1 Synthesis of arbitrary English sentences
From the above discussion, it should be clear that there are a number of promising
methods for synthesizing general English. To generate a particular utterance, one
must know 1) the phonemic (or phonetic) representation for each word, 2) the
stress pattern for each word, 3) aspects of the syntactic structure of the sentence,
and 4) the locations of any words that are to receive semantic focus. This
information would have to be stored in the computer for each utterance to be
synthesized, or it might be generated from a deep-structure representation of the
concept to be expressed (Woods et al., 1976; Young and Fallside, 1979).
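
The four kinds of information listed above can be pictured as one stored record per
utterance. The following is only a minimal sketch under stated assumptions: the
class and field names, the ARPAbet-style phoneme labels, the one-stress-level-per-
syllable convention, and the use of flat word-index spans as a stand-in for
syntactic structure are all illustrative choices, not taken from any of the cited
systems.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class WordSpec:
    """One word of the utterance (all names here are hypothetical)."""
    spelling: str
    phonemes: List[str]  # 1) phonemic representation (ARPAbet-style labels assumed)
    stress: List[int]    # 2) stress pattern: one level per syllable, 1 = stressed


@dataclass
class UtteranceSpec:
    words: List[WordSpec]
    # 3) a crude stand-in for syntactic structure: (start, end) word-index
    #    spans marking phrases, with end exclusive
    phrases: List[Tuple[int, int]] = field(default_factory=list)
    # 4) indices of the words that receive semantic focus
    focus: Set[int] = field(default_factory=set)

    def check(self) -> None:
        """Raise ValueError if a span or focus index falls outside the word list."""
        n = len(self.words)
        for start, end in self.phrases:
            if not 0 <= start < end <= n:
                raise ValueError(f"bad phrase span {(start, end)}")
        for i in self.focus:
            if not 0 <= i < n:
                raise ValueError(f"bad focus index {i}")

    def focused_words(self) -> List[str]:
        """Return the spellings of the focused words, in sentence order."""
        return [self.words[i].spelling for i in sorted(self.focus)]


spec = UtteranceSpec(
    words=[
        WordSpec("the", ["DH", "AH"], [0]),
        WordSpec("dog", ["D", "AO", "G"], [1]),
        WordSpec("barked", ["B", "AA", "R", "K", "T"], [1]),
    ],
    phrases=[(0, 2), (2, 3)],  # noun phrase, verb phrase
    focus={1},                 # "dog" carries the semantic focus
)
spec.check()
```

A record of this kind could either be stored for each utterance or produced by a
concept-to-speech front end; a real system would presumably need a richer
syntactic description than flat phrase spans.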
7.4.2 Synthesis of arbitrary English names
Research at Bell Laboratories (Denes, 1979; Liberman, 1979; Olive, 1979) is
directed at the ability to synthesize any name from a telephone directory for
application in automated directory assistance. The linguistic problems associated
with converting spelling to a phonetic representation and stress pattern are severe
since it is sometimes necessary to guess the native language of the individual
before a good rendering of the pronunciation is possible (Liberman, 1979). Once a
phonetic representation has been derived, this experimental system uses diphone
synthesis (Olive, 1979) to generate a waveform.
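
To make the last step concrete, the sketch below shows how a phonetic string might
be turned into the sequence of diphone units to be retrieved and concatenated. It
rests on the general definition of a diphone as the stretch from the middle of one
phone to the middle of the next; the boundary symbol, the unit naming, and the
sample transcription are assumptions made for illustration and do not describe the
inventory actually used in the Bell Laboratories system. Retrieval and smoothing of
the stored waveform segments are omitted.

```python
from typing import List

SILENCE = "#"  # assumed symbol for the silence before and after the name


def to_diphones(phonemes: List[str]) -> List[str]:
    """Return the diphone units that span a phoneme string.

    The string is padded with silence at both ends, and every adjacent
    pair of symbols names one unit, so n phonemes yield n + 1 units.
    """
    if not phonemes:
        return []
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# A hypothetical transcription of the name "Smith".
units = to_diphones(["S", "M", "IH", "TH"])
print(units)
```

Each unit name would then serve as a key into a stored inventory of transition
segments, which is where most of the engineering effort in a diphone system lies.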
7.4.3 Text-to-speech conversion
The transformation of English text to speech is a much more formidable problem
than the synthesis of an arbitrary sentence from a knowledge of its underlying
linguistic representation. The text does not indicate everything that one would like
to know (unless one builds a machine that can recognize the meaning of the text and
thereby resolve the frequently occurring syntactic ambiguities and determine
semantic focus relations).