The phonetic component
the parameter value assigned to the time of the segment boundary. These constants
are determined by rules that involve features of the current phonetic segment
PHOCUR, the previous phonetic segment PHOLAS, and the next phonetic segment
PHONEX. In some cases, the rules have to examine features of segments further
from the current segment, but this is rare. For example, in "pin", the time of
voicing onset in the vowel preceded by the voiceless plosive [p] is delayed by
about 50 msec, unless the segment preceding the voiceless plosive is an [s], as in
"spin". The variable control parameters are listed later in Table 11-3.
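
The "pin" versus "spin" rule above can be sketched in a few lines of Python. This is only an illustration of a context-dependent rule that looks one segment beyond PHOLAS, not the actual rule program: the function name, the list-of-symbols representation, and the extension from [p] to all three English voiceless plosives are assumptions made for the example, and the 50 msec figure is the approximate value quoted in the text.

```python
# Hypothetical sketch of one context-dependent rule; not the original program.
# Assumption: an utterance is a list of phonetic symbols, one per segment.

VOICELESS_PLOSIVES = {"p", "t", "k"}  # assumption: rule generalizes beyond [p]
VOICING_DELAY_MSEC = 50               # "about 50 msec" in the text


def voicing_onset_delay_msec(segments, i):
    """Return the voicing-onset delay for the vowel at index i.

    PHOCUR is segments[i], PHOLAS is segments[i - 1], and the rule also
    inspects the segment before PHOLAS, which is the rare case of looking
    further than the immediate neighbors.
    """
    pholas = segments[i - 1] if i >= 1 else None
    before_pholas = segments[i - 2] if i >= 2 else None

    # Delay voicing after a voiceless plosive, unless an [s] precedes it.
    if pholas in VOICELESS_PLOSIVES and before_pholas != "s":
        return VOICING_DELAY_MSEC
    return 0


pin_delay = voicing_onset_delay_msec(["p", "i", "n"], 1)        # 50
spin_delay = voicing_onset_delay_msec(["s", "p", "i", "n"], 2)  # 0
```

The caller is expected to pass the index of a vowel; the sketch does not check segment class, which a real rule system would do through segment features.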
11.1.3 History of formant synthesis-by-rule
As originally demonstrated by John Holmes, successful imitation of a natural
utterance depends primarily on matching observed short-term spectra. This
technique succeeds, in part, because it reproduces all of the potential cues present
in the spectrum, even though we may not know which cues are most important. The
speech perception apparatus appears to be aware of any and all (perceptually
discriminable) regularities present in the acoustic signal generated by the speech
production apparatus, and these regularities should be included in synthetic stimuli
if possible.
There have been a number of previous efforts to specify general strategies for
formant synthesis-by-rule (see, e.g., Holmes et al., 1964; Mattingly, 1968a;
Rabiner, 1968a; Coker et al., 1973; Klatt, 1972, 1976a). However, examination of
these publications suggests that consonant-vowel intelligibility is nowhere near as
high as it is for natural speech. For example, Rabiner (1968a) estimated that
consonants in his synthetic consonant-vowel nonsense stimuli were 85 percent
intelligible to phonetically trained listeners, but that natural tokens of the same
syllables were about 99 percent intelligible. Other rule programs apparently perform
no better, although relevant evaluative data are generally not available.
Why isn’t intelligibility higher? Each rule system attempts to make appropriate
generalizations and simplifications concerning the form and content of rules for
consonant-vowel synthesis. Have the wrong generalizations been made? The results
described below in Section 11.2 suggest that this conjecture is true.
11.2 “Synthesis-by-analysis” of consonant-vowel syllables
11.2.1 Analysis of CV syllables
The data base that was recorded and analyzed in order to develop new
consonant-vowel synthesis rules consists of speech samples obtained from six talkers
who were native to a single midwestern dialect region -- three males and three
females (Klatt, 1979b). The intent was to use the data from all six talkers to
establish the form of the synthesis rules, but the actual parameter values inserted
in the rules