Survey of speech synthesis technology
serted between words, and a reasonable sentence intonation contour was realized
by restricting a given prerecorded element to only certain utterance positions. A
great deal of care was taken in speaking, recording, and editing the basic
vocabulary items.
Word storage has involved various analog and digital techniques that range
from recording each word into a half-second slot on a rotating drum, to
sophisticated digital techniques for reducing the number of bits that must be stored.
Digital methods for representing speech waveforms are reviewed by Rabiner and
Schafer (1976) and by Jayant (1974). One remarkable technique developed at
Texas Instruments (Wiggins, 1979) involves storing a 1000 bit-per-second
linear-prediction representation for each word on integrated circuit chips having a
capacity of 200 seconds of speech, and using an IC linear-prediction synthesizer to
play selected words (all of this circuitry being offered at $50 in the Speak-N-Spell
children's toy).
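The storage figures quoted above imply a total memory size that is easy to work out. The following is only a back-of-the-envelope sketch using the two numbers given in the text (1000 bits per second, 200 seconds of speech); the function name is illustrative and is not taken from the cited work.

```python
# Figures quoted in the text for the Texas Instruments word-storage scheme.
BITS_PER_SECOND = 1000    # linear-prediction bit rate
CAPACITY_SECONDS = 200    # seconds of speech held on the chips


def storage_bits(bits_per_second: int, seconds: int) -> int:
    """Total number of bits needed to hold `seconds` of coded speech."""
    return bits_per_second * seconds


total_bits = storage_bits(BITS_PER_SECOND, CAPACITY_SECONDS)
total_bytes = total_bits // 8

print(total_bits)   # 200000
print(total_bytes)  # 25000
```

At these rates the whole vocabulary occupies 200,000 bits, or 25,000 eight-bit bytes.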
7.3.1.2 Formant vocoding of words. Rabiner et al. (1971a) suggested that one
could get rid of the choppiness of waveform concatenation by extracting formant
trajectories for each prerecorded word and smoothing formant parameter tracks
across word boundaries before formant vocoder resynthesis. A second advantage
of formant analysis-synthesis of the words that make up a synthetic utterance is
that the duration pattern and fundamental frequency contour can be adjusted to
match the accent pattern, rhythm, and intonation requirements of the sentence to be
produced. The technique has been used successfully in telephone number synthesis,
where a known prosodic contour could be superimposed (for example, a
pause and a “continuation rise” intonation can be placed just before the fourth digit
of a seven-digit telephone number). However, the authors did not offer general
prosodic rules for sentence synthesis.
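The idea of smoothing formant parameter tracks across a word boundary can be illustrated with a minimal sketch. It assumes each word is stored as a list of per-frame formant values in Hz and uses plain linear interpolation over a few frames on either side of the join. The function name, the `span` parameter, and the interpolation scheme are assumptions made for illustration; the text does not specify the smoothing procedure actually used by Rabiner et al.

```python
def smooth_boundary(track_a, track_b, span=2):
    """Join two formant tracks and linearly interpolate across the boundary.

    track_a, track_b: per-frame formant values (Hz) for two adjacent words.
    span: number of frames on each side of the boundary that take part in
          the interpolation (hypothetical parameter, not from the source).
    """
    if span < 1 or span > len(track_a) or span > len(track_b):
        raise ValueError("span must be between 1 and the shorter track length")

    joined = list(track_a) + list(track_b)
    lo = len(track_a) - span        # first frame of the smoothing window
    hi = len(track_a) + span - 1    # last frame of the smoothing window
    start, end = joined[lo], joined[hi]

    # Replace the window with a straight line from its first to its last value.
    for i in range(lo, hi + 1):
        joined[i] = start + (end - start) * (i - lo) / (hi - lo)
    return joined


# Hypothetical first-formant tracks (Hz) for two adjacent words.
word_a_f1 = [500.0, 500.0, 500.0, 500.0]
word_b_f1 = [800.0, 800.0, 800.0, 800.0]
smoothed = smooth_boundary(word_a_f1, word_b_f1, span=2)
print(smoothed)
```

In this toy case the abrupt 500 Hz to 800 Hz step at the join is replaced by a ramp through intermediate values, which is the kind of discontinuity removal the formant approach is meant to provide before resynthesis.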
7.3.1.3 Linear-prediction coded words. Olive (1974) later showed that a similar
system could be based on linear prediction encoding. Furthermore, it was
determined that a correct fundamental frequency contour for a sentence was
perceptually more important than the exact duplication of the durational pattern or careful
smoothing of the formant transitions between words.
The advantage of the prerecorded word as a unit is the ease of bringing up a
limited audio response unit. The disadvantages are that (1) large vocabularies are
impractical, and (2) general timing and fundamental frequency rules that adjust the
prosodic characteristics of a word as a function of sentence structure are more
easily defined at a segmental level. For example, only the final vowel and
postvocalic consonants of a word are lengthened at phrase and clause boundaries
(Klatt, 1976b).
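The lengthening example shows why such a rule is simpler to state over segments than over whole words. A minimal sketch follows, assuming a word is represented as a list of (phone, duration in ms, is_vowel) tuples. The representation, the function name, and the lengthening factor of 1.5 are all placeholders chosen for illustration, not values from Klatt (1976b).

```python
def lengthen_phrase_final(segments, factor=1.5):
    """Lengthen only the final vowel and the consonants that follow it.

    segments: list of (phone, duration_ms, is_vowel) tuples for one word.
    factor:   hypothetical lengthening factor (placeholder value).
    """
    vowel_positions = [i for i, (_, _, is_vowel) in enumerate(segments) if is_vowel]
    if not vowel_positions:
        return list(segments)   # no vowel, so nothing to lengthen
    last_vowel = vowel_positions[-1]

    return [
        (phone, dur * factor if i >= last_vowel else dur, is_vowel)
        for i, (phone, dur, is_vowel) in enumerate(segments)
    ]


# Hypothetical segment durations (ms) for the word "seven".
seven = [
    ("s", 100.0, False),
    ("eh", 80.0, True),
    ("v", 60.0, False),
    ("ax", 50.0, True),
    ("n", 70.0, False),
]
for phone, dur, _ in lengthen_phrase_final(seven):
    print(phone, dur)
```

Only the last vowel and the final consonant change duration here. A stored whole-word unit has no such internal structure to operate on, so the same adjustment would need a separately recorded phrase-final version of every word.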