Introduction
semantic information if available. In this way, synthesis-by-rule techniques can
utilize a very low bit-rate message description (<100 bits/sec) as input, but
substantial computation must be used to compute the model parameters and then
produce the speech waveform. Clearly there is complete freedom to specify the
model parameters, but of course also the need to control these parameters
correctly. Since the rules are still imperfect, the resulting speech quality is not as
good as recorded human speech, but recent tests have shown that high
intelligibility and comprehensibility can be obtained, and when sentence and
paragraph-level messages must be synthesized, the rule system provides the
necessary degrees of freedom to produce smooth-flowing good quality speech. It is
interesting to consider that synthesis-by-rule systems delay the binding of the speech
parameter set and waveform to the input message by using very deep language
abstractions, and hence provide a maximum of flexibility, and are thus well suited
to the needs of converting unrestricted text to speech. The designers of these
systems must, however, discover the relationship between the underlying linguistic
specification of the message and the resulting speech signal, a topic which has
been central to speech science and linguistics for several decades. Thus
synthesis-by-rule both benefits from and contributes to our general knowledge of speech and
linguistics, and the steady improvement in speech synthesis-by-rule quality reflects
this joint progress. While it is believed that current synthetic speech quality is
acceptable for many applications, it can certainly be expected to continue to improve
with our increasing knowledge.
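
The figure of under 100 bits/sec quoted above can be made plausible with a rough calculation. The sketch below is only an illustration under stated assumptions: the inventory size (40 phonemes), the speaking rate (12 phonemes per second), and the 2 extra bits per phoneme for stress or prosodic markers are all assumed values chosen for the example, not numbers taken from the text, and the function name `phonetic_bit_rate` is hypothetical.

```python
import math


def phonetic_bit_rate(inventory_size, phonemes_per_second, extra_bits_per_phoneme=0):
    """Bits per second needed to send a phoneme string with fixed-length codes.

    Each phoneme is coded with ceil(log2(inventory_size)) bits, plus an
    optional number of extra bits (e.g. for stress or prosodic markers).
    """
    bits_per_symbol = math.ceil(math.log2(inventory_size))
    return (bits_per_symbol + extra_bits_per_phoneme) * phonemes_per_second


# Assumed figures for illustration only (not from the text):
# about 40 phonemes, about 12 phonemes spoken per second.
plain_rate = phonetic_bit_rate(inventory_size=40, phonemes_per_second=12)

# Same message with 2 assumed extra bits per phoneme for prosodic marking.
marked_rate = phonetic_bit_rate(
    inventory_size=40, phonemes_per_second=12, extra_bits_per_phoneme=2
)

print(plain_rate, marked_rate)
```

Under these assumptions both rates stay below 100 bits/sec, which is consistent with the order of magnitude given in the text. The cost of such a compact input is the computation needed to turn it into model parameters and then into a waveform.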

1.2.4 Text-to-speech conversion

The synthesis-by-rule techniques described above require a detailed phonetic
transcription as input. While this input requires very little memory for message
storage, a frequent requirement is to convert text to speech. When it is desired to
convert unrestricted English text to speech, the flexibility of synthesis-by-rule is
needed, so that means must be afforded to convert the input text to the phonetic
transcription needed by the synthesis-by-rule techniques. It is clear, then, that first
the text must be analyzed to obtain the phonetic transcription, which is then
subjected to a synthesis procedure to yield the output speech waveform. The analysis
of the text is heavily linguistic in nature, involving a determination of the
underlying phonemic, syllabic, morphemic and syntactic form of the message, plus
whatever semantic and pragmatic information can be gleaned. Text-to-speech
conversion can thus be seen as a collection of techniques requiring the successful
integration of the task constraints with other constraints provided by the nature of the
human vocal apparatus, the linguistic structure of the language, and the implemen-