From text to speech: The MITalk system
differences has led to clear improvement in intelligibility. At least one more
iteration of this procedure is needed. Furthermore, within the constraints imposed
by the synthesizer itself, matching of linear-prediction spectra is adequate to the
task.
11.3 General rules for the synthesis of phonetic sequences
The rule program used in MITalk differs from the limited CV synthesis algorithm
described above. The MITalk phonetic component PHONET is patterned after a
Fortran-based synthesis-by-rule program described by Klatt (1976a). Since that
time, both the program structure and the constants contained in target tables for
each phone have been modified. These modifications were made in order to
incorporate some of the new consonant-vowel synthesis rules described in the
previous section, and to simplify the rule structure.
The general procedure for drawing control parameter values is:

1. Draw the target value for the first segment.

2. Draw the target value for the next segment.

3. Smooth the boundary between the segments using one of the templates shown in
   Figure 11-6 (note that DISCON does no smoothing).

4. Go to step 2 unless there are no more segments.
The transition between target values for each control parameter may be either
discontinuous or smooth. The boundary value and transition duration in each
direction from the logical phoneme boundary are computed by rules that take into
account manner features of the segments involved.
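
The four steps and the boundary rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PHONET implementation: the names `Segment`, `Boundary`, and `draw_track`, the 5 ms frame period, and the straight-line ramp are all hypothetical. The actual smoothing templates are those of Figure 11-6, which this sketch does not reproduce; only the DISCON behavior (no smoothing) is taken directly from the text.

```python
from dataclasses import dataclass
from typing import List

FRAME_MS = 5  # assumed frame period; not stated in this section


@dataclass
class Segment:
    target: float     # target value of one control parameter
    duration_ms: int  # segment duration


@dataclass
class Boundary:
    smooth: bool        # False behaves like DISCON: no smoothing
    value: float = 0.0  # parameter value at the logical phoneme boundary
    back_ms: int = 0    # transition duration into the preceding segment
    fwd_ms: int = 0     # transition duration into the following segment


def frames(seg: Segment) -> int:
    return seg.duration_ms // FRAME_MS


def draw_track(segments: List[Segment], boundaries: List[Boundary]) -> List[float]:
    """Return one value per frame for a single control parameter."""
    # Step 1: draw the target value for the first segment.
    track = [segments[0].target] * frames(segments[0])
    # Step 4: repeat for every remaining segment.
    for prev, nxt, b in zip(segments, segments[1:], boundaries):
        edge = len(track)  # first frame of the next segment
        # Step 2: draw the target value for the next segment.
        track += [nxt.target] * frames(nxt)
        # Step 3: smooth the boundary (hypothetical linear template).
        if b.smooth:
            nb = min(b.back_ms // FRAME_MS, frames(prev))
            nf = min(b.fwd_ms // FRAME_MS, frames(nxt))
            for k in range(nb):  # ramp from the previous target to the boundary value
                track[edge - nb + k] = prev.target + (b.value - prev.target) * (k + 1) / nb
            for k in range(nf):  # ramp from the boundary value to the next target
                track[edge + k] = b.value + (nxt.target - b.value) * (k + 1) / nf
    return track
```

In this sketch the boundary value and the two transition durations are passed in directly; in the system described here they would be computed by rules from the manner features of the two segments.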
11.3.1 Vowels
The control parameters that are usually varied to generate an isolated vowel are
the amplitude of voicing AV; the fundamental frequency of vocal fold vibrations F0;
the lowest three formant frequencies F1, F2, and F3; and bandwidths B1, B2, and B3.
The fourth and fifth formant frequencies might be varied to simulate spectral
details, but this is not essential for high intelligibility. To create a natural
breathy vowel termination, the amplitude of aspiration AH and the amplitude of
quasi-sinusoidal voicing AVS are activated.
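
As a rough illustration of how these parameters might be grouped, the following Python sketch holds one frame of vowel control values and switches on AH and AVS for the final frames of an isolated vowel. The names `VowelFrame` and `isolated_vowel`, the units in the comments, and every number in the example are assumptions made for illustration; none of the values are taken from Table 11-1.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class VowelFrame:
    av: float         # amplitude of voicing (dB assumed)
    f0: float         # fundamental frequency (Hz assumed)
    f1: float         # first formant frequency
    f2: float         # second formant frequency
    f3: float         # third formant frequency
    b1: float         # first formant bandwidth
    b2: float         # second formant bandwidth
    b3: float         # third formant bandwidth
    ah: float = 0.0   # amplitude of aspiration, off by default
    avs: float = 0.0  # amplitude of quasi-sinusoidal voicing, off by default


def isolated_vowel(target: VowelFrame, n_frames: int, breathy_frames: int,
                   ah_level: float, avs_level: float) -> List[VowelFrame]:
    """Hold the vowel target, then activate AH and AVS over the last frames."""
    steady = [target] * (n_frames - breathy_frames)
    tail = [replace(target, ah=ah_level, avs=avs_level)] * breathy_frames
    return steady + tail


# Placeholder numbers only; these are not Table 11-1 values.
example = VowelFrame(av=60, f0=120, f1=700, f2=1200, f3=2600,
                     b1=130, b2=70, b3=160)
vowel = isolated_vowel(example, n_frames=10, breathy_frames=3,
                       ah_level=50, avs_level=45)
```

The sketch switches AH and AVS on abruptly for simplicity; a gradual onset, or a simultaneous change in AV, would need additional rules that this section does not give.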
Table 11-1 includes suggested target values for variable control parameters that
are used to differentiate among English vowels. Formant frequency and bandwidth
targets were obtained by trial-and-error spectral matching to a large set of CV
syllables spoken by talker DHK. Bandwidth values are often larger than
closed-glottis values obtained by Fujimura and Lindqvist (1971), because the
bandwidths of Table 11-1 have been adjusted to take into account changes to ob-