From text to speech: The MITalk system

cascade branch of the block diagram shown in Figure 12-6. For a male talker with
a very long vocal tract, it may be necessary to add a sixth resonator to the cascade
branch. As currently programmed, NFC can be set to 4, 5, or 6 formants in the
cascade branch. (Any change to NFC implies a change in the length of the vocal
tract, so such changes must be made with care.)

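The link between NFC and vocal tract length can be illustrated with a uniform-tube approximation. The sketch below is not part of the synthesizer. It assumes a lossless uniform tube closed at the glottis, a speed of sound of roughly 35,000 cm/s, and a 5 kHz synthesis band; the function names are hypothetical.

```python
# Assumed constants (not taken from the text above).
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # approximate value for warm, moist air
BAND_HZ = 5_000.0                   # assumed synthesis bandwidth


def uniform_tube_formants(length_cm, max_hz=BAND_HZ):
    """Quarter-wavelength resonances F_n = (2n - 1) c / (4 L) below max_hz."""
    formants = []
    n = 1
    while True:
        f = (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        if f >= max_hz:
            return formants
        formants.append(f)
        n += 1


def tube_length_for(nfc, max_hz=BAND_HZ):
    """Tube length whose formant spacing c / (2 L) puts nfc formants below max_hz."""
    return nfc * SPEED_OF_SOUND_CM_PER_S / (2.0 * max_hz)


if __name__ == "__main__":
    for nfc in (4, 5, 6):
        length = tube_length_for(nfc)
        print(nfc, length, uniform_tube_formants(length))
```

Under these assumptions, five formants in the band correspond to a tube of 17.5 cm, and a longer tube places a sixth formant below 5 kHz, which is why NFC and vocal tract length cannot be varied independently.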
Ignoring for the moment the nasal pole resonator RNP and the nasal zero
antiresonator RNZ, the cascade model of Figure 12-6, consisting of five formant
resonators, has a volume velocity transfer function that can be represented in the
frequency domain as a product (Gold and Rabiner, 1968):

$$T(f) = \prod_{n=1}^{5} \frac{A(n)}{1 - B(n) z^{-1} - C(n) z^{-2}} \tag{6}$$

where the constants A(n), B(n), and C(n) are determined by the values of the nth
formant frequency F(n) and nth formant bandwidth BW(n) by the relationships
given earlier in Equation 2. The constants A(n) in the numerator of Equation 6
ensure that the transfer function has a value of unity at zero frequency, i.e., the dc
airflow is unimpeded. The magnitude of T(f) is plotted in Figure 12-9 for several
values of formant frequencies and formant bandwidths.

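A short numerical sketch can make the cascade product concrete. It assumes that Equation 2 has the usual two-pole digital resonator form, C = -exp(-2π BW T), B = 2 exp(-π BW T) cos(2π F T), A = 1 - B - C, that z = exp(j2πfT), and that the sampling rate is 10 kHz. None of these is stated in the passage above, and the function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 10_000.0  # assumed sampling rate, T = 1 / SAMPLE_RATE_HZ


def resonator_coefficients(f_hz, bw_hz, fs=SAMPLE_RATE_HZ):
    """Assumed form of Equation 2 for one formant resonator."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # forces unity gain at zero frequency
    return a, b, c


def cascade_response(freq_hz, formants, bandwidths, fs=SAMPLE_RATE_HZ):
    """Evaluate the product of Equation 6 at one frequency, with z = exp(j 2 pi f T)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    total = 1.0 + 0.0j
    for f_n, bw_n in zip(formants, bandwidths):
        a, b, c = resonator_coefficients(f_n, bw_n, fs)
        total *= a / (1.0 - b * z_inv - c * z_inv ** 2)
    return total


if __name__ == "__main__":
    formants = [500.0, 1500.0, 2500.0, 3500.0, 4500.0]
    bandwidths = [100.0] * 5
    for f in (0.0, 500.0, 1000.0):
        gain_db = 20.0 * math.log10(abs(cascade_response(f, formants, bandwidths)))
        print(f, gain_db)
```

Because A(n) = 1 - B(n) - C(n), each factor equals 1 when z = 1, so the product is unity at zero frequency regardless of the formant values chosen.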
12.2.2 Relationship to analog models of the vocal tract

The transfer function of the vocal tract can also be expressed in the continuous
world of differential equations. Equation 6 is then rewritten as an infinite product
of poles in the Laplace transform s-plane:

$$T(f) = \prod_{n=1}^{\infty} \frac{s(n)\, s^*(n)}{[s + s(n)][s + s^*(n)]} \tag{7}$$

where $s = 2\pi j f$, and the constants s(n) and s*(n) are determined by the values of the
nth formant frequency F(n) and the nth formant bandwidth BW(n) by the relationships:

$$s(n) = \pi BW(n) + 2\pi j F(n)$$
$$s^*(n) = \pi BW(n) - 2\pi j F(n)$$

The two formulations, Equations 6 and 7, are exactly equivalent representations of the
transfer function for an ideal vocal tract configuration corresponding to a uniform
tube closed at the glottis and having all formant bandwidths equal to, e.g., 100 Hz.
The two formulations are indistinguishable at representing vocal tract transfer
functions below 5 kHz. However, in a practical synthesizer, the infinite product of
poles can only be approximated (e.g., by building five electronic resonators and a
higher-pole correction network (Fant, 1959)).

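The effect of truncating the infinite product can be estimated numerically. The sketch below is an illustration rather than a description of any actual analog synthesizer: it assumes a uniform tube with formants at 500, 1500, 2500, ... Hz and 100 Hz bandwidths, uses a long finite product as a stand-in for the infinite one, and reports the gain that the discarded higher poles would have contributed. The function names and the choice of 2000 reference poles are hypothetical.

```python
import math


def analog_pole(f_hz, bw_hz):
    """s(n) = pi * BW(n) + 2 * pi * j * F(n)."""
    return complex(math.pi * bw_hz, 2.0 * math.pi * f_hz)


def truncated_product(freq_hz, n_poles, bw_hz=100.0, f1_hz=500.0):
    """Equation 7 cut off after n_poles pole pairs of a uniform tube."""
    s = 2j * math.pi * freq_hz
    total = 1.0 + 0.0j
    for n in range(1, n_poles + 1):
        p = analog_pole((2 * n - 1) * f1_hz, bw_hz)
        total *= (p * p.conjugate()) / ((s + p) * (s + p.conjugate()))
    return total


def higher_pole_correction_db(freq_hz, kept=5, reference=2000):
    """Gain in dB lost by keeping only `kept` pole pairs (reference approximates infinity)."""
    full = abs(truncated_product(freq_hz, reference))
    partial = abs(truncated_product(freq_hz, kept))
    return 20.0 * math.log10(full / partial)


if __name__ == "__main__":
    for f in (0.0, 1000.0, 2000.0, 3000.0, 4000.0):
        print(f, higher_pole_correction_db(f))
```

Under these assumptions the missing gain is zero at dc and grows with frequency, which is the behavior a higher-pole correction network has to supply when only five resonators are built.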