From text to speech: The MITalk system

7.3.2 Syllables and diphones

Instead of using words as the basic building blocks for sentence production, a smaller inventory of basic units is required if arbitrary English sentences are to be synthesized. The inventory of basic speech units must satisfy several requirements, including: 1) the ability to construct any English word by concatenating the units one after another, and 2) the ability to change duration, intensity and fundamental frequency according to the demands of the sentence syntax and stress pattern in such a way as to produce speech that is both intelligible and natural.

7.3.2.1 Syllables The intuitive notion of the syllable as the basic unit has considerable theoretical appeal. Any English word can be broken into syllables consisting of a vowel nucleus and adjacent consonants. Linguists have been unable to agree on objective criteria for assigning consonants to a particular vowel nucleus in certain ambiguous cases such as “butter”, but an arbitrary decision can be made for synthesis purposes.
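
One way to picture such an arbitrary decision is the following minimal Python sketch. It is not the MITalk procedure: the ARPAbet-style phoneme labels, the VOWELS set, the function name syllabify, and the rule itself (the last consonant between two vowels opens the following syllable) are all assumptions chosen only for illustration.

```python
# Illustrative vowel labels (ARPAbet-style); an assumption, not the MITalk inventory.
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "UW"}


def syllabify(phonemes):
    """Split a list of phoneme labels into syllables, one vowel nucleus each.

    Assumed rule (arbitrary, for illustration): of the consonants lying
    between two vowels, the last one starts the next syllable and any
    others stay with the preceding syllable.
    """
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        # No vowel nucleus found: return the input as a single unit.
        return [list(phonemes)]

    syllables = []
    start = 0
    for this_v, next_v in zip(nuclei, nuclei[1:]):
        n_between = next_v - this_v - 1  # consonants between the two vowels
        # Place the boundary before the last intervocalic consonant, if any.
        boundary = next_v - 1 if n_between > 0 else next_v
        syllables.append(list(phonemes[start:boundary]))
        start = boundary
    syllables.append(list(phonemes[start:]))
    return syllables


# "butter": the ambiguous medial consonant is assigned to the second syllable.
print(syllabify(["B", "AH", "T", "ER"]))
```

Under this assumed rule the medial consonant of “butter” always goes with the second vowel. A different convention would be equally defensible, which is the point made above: the choice only needs to be applied consistently.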

The greatest theoretical advantage of the syllable concerns the way that acoustic characteristics of most consonant-vowel transitions are preserved. Context-conditioned acoustic changes to consonants are automatically present to a great extent when the syllable is chosen as the basic unit, but not when smaller units such as the phoneme are concatenated.

The disadvantages of the syllable are: 1) coarticulation across syllable boundaries is not treated, and this coarticulation can be just as important as within-syllable coarticulation; 2) if prerecorded syllables are stored in the form of waveforms, there is no way to mimic the prosodic contour of the intended message; and 3) the syllable inventory for general English is very large. There are currently no syllable-based systems for speech generation.

7.3.2.2 Demisyllables The last two disadvantages of a syllable-based scheme might be overcome by replacing syllables with demisyllables. The demisyllable is defined as half of a syllable, either the set of initial consonants plus half of the vowel, or the second half of the vowel plus any postvocalic consonants (Fujimura and Lovins, 1978; Lovins and Fujimura, 1976). For example, the word “construct” would be divided into co-, -on, stru-, and -uct. It is claimed that fewer than 1000 demisyllables are needed to synthesize any English utterance. Each demisyllable can be represented in terms of a set of linear prediction frames. Concatenation rules include some smoothing across demisyllable boundaries. The problems with demisyllable-based approaches are: 1) how to smooth across demisyllable boundaries to simulate natural coarticulation, and 2) how to adjust durations to match the desired pattern for a sentence. The latter problem is serious