
From text to speech: The MITalk system

cascade branch of the block diagram shown in Figure 12-6. For a male talker with
a very long vocal tract, it may be necessary to add a sixth resonator to the cascade
branch. As currently programmed, NFC can be set to 4, 5, or 6 formants in the
cascade branch. (Any change to NFC implies a change in the length of the vocal
tract, so such changes must be made with care.)
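Why NFC is tied to vocal tract length can be seen from a simple model. A uniform tube closed at the glottis and open at the lips resonates at odd multiples of c/4L. The sketch below counts how many of those resonances fit below the Nyquist frequency for a given tube length. It assumes c = 35,000 cm/s and a 10 kHz sampling rate, so only resonances below 5 kHz are counted. This is a rough uniform-tube approximation for illustration, not the procedure MITalk uses to choose NFC, and the function name `uniform_tube_formants` is hypothetical.

```python
def uniform_tube_formants(length_cm, fs_hz=10000.0, c_cm_per_s=35000.0):
    """Quarter-wavelength resonances of a uniform tube closed at the glottis
    and open at the lips, keeping only those below the Nyquist frequency.
    The sound speed and sampling rate are illustrative assumptions."""
    nyquist = fs_hz / 2.0
    f1 = c_cm_per_s / (4.0 * length_cm)  # lowest resonance, c / 4L
    formants = []
    n = 1
    # Resonances fall at (2n - 1) * c / 4L; stop once they pass Nyquist.
    while (2 * n - 1) * f1 < nyquist:
        formants.append((2 * n - 1) * f1)
        n += 1
    return formants


for length in (14.0, 17.5, 21.0):
    f = uniform_tube_formants(length)
    print(f"{length:4.1f} cm -> {len(f)} formants: {[round(x) for x in f]}")
```

Under these assumptions, a 14 cm tube has four formants below 5 kHz, a 17.5 cm tube has five and a 21 cm tube has six. This matches the range of 4 to 6 that NFC accepts.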

Ignoring for the moment the nasal pole resonator RNP and the nasal zero antiresonator RNZ, the cascade model of Figure 12-6, consisting of five formant resonators, has a volume velocity transfer function that can be represented in the frequency domain as a product (Gold and Rabiner, 1968):

$$T(f) = \prod_{n=1}^{5} \frac{A(n)}{1 - B(n)\,z^{-1} - C(n)\,z^{-2}} \qquad (6)$$

where $z = e^{j2\pi fT}$, with T the sampling period. The constants A(n), B(n), and C(n) are determined by the values of the nth formant frequency F(n) and nth formant bandwidth BW(n) by the relationships given earlier in Equation 2. The constants A(n) in the numerator of Equation 6 ensure that the transfer function has a value of unity at zero frequency, i.e., the dc airflow is unimpeded. The magnitude of T(f) is plotted in Figure 12-9 for several values of formant frequencies and formant bandwidths.
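Equation 6 can be evaluated directly. The sketch below is a minimal Python version built on two assumptions. First, it assumes a 10 kHz sampling rate. Second, it assumes that Equation 2 has the usual digital-resonator form:

- C(n) = -exp(-2πBW(n)T)
- B(n) = 2exp(-πBW(n)T)cos(2πF(n)T)
- A(n) = 1 - B(n) - C(n)

The last relation is what forces unity gain at 0 Hz. The function names are hypothetical.

```python
import cmath
import math


def resonator_coeffs(f_hz, bw_hz, fs_hz):
    """A, B, C for one digital resonator (assumed form of Equation 2)."""
    t = 1.0 / fs_hz  # sampling period
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes each resonator to unity gain at 0 Hz
    return a, b, c


def cascade_response(f, formants, bandwidths, fs_hz=10000.0):
    """Equation 6: product of resonator responses at z = exp(j*2*pi*f*T)."""
    z_inv = cmath.exp(-2j * math.pi * f / fs_hz)
    h = 1.0 + 0j
    for f_n, bw_n in zip(formants, bandwidths):
        a, b, c = resonator_coeffs(f_n, bw_n, fs_hz)
        h *= a / (1.0 - b * z_inv - c * z_inv ** 2)
    return h


formants = [500.0, 1500.0, 2500.0, 3500.0, 4500.0]
bandwidths = [100.0] * 5
for f in (0.0, 500.0, 1000.0, 1500.0):
    gain_db = 20 * math.log10(abs(cascade_response(f, formants, bandwidths)))
    print(f"{f:6.0f} Hz: {gain_db:6.2f} dB")
```

For five formants at 500, 1500, ..., 4500 Hz with 100 Hz bandwidths, the gain is 0 dB at dc and about 16 dB at the first formant peak.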

12.2.2 Relationship to analog models of the vocal tract

The transfer function of the vocal tract can also be expressed in the continuous
world of differential equations. Equation 6 is then rewritten as an infinite product
of poles in the Laplace transform s-plane:

$$T(f) = \prod_{n=1}^{\infty} \frac{s(n)\,s^*(n)}{[s + s(n)]\,[s + s^*(n)]} \qquad (7)$$

where $s = j2\pi f$, and the constants s(n) and s*(n) are determined by the values of the nth formant frequency F(n) and the nth formant bandwidth BW(n) by the relationships:

$$s(n) = \pi BW(n) + j2\pi F(n)$$
$$s^*(n) = \pi BW(n) - j2\pi F(n)$$
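The real part of s(n) is πBW(n) > 0, so each pole pair sits at s = -s(n) and s = -s*(n), in the left half of the s-plane. Each factor of Equation 7 also equals 1 at s = 0. The sketch below evaluates a single factor along s = j2πf and measures its half-power (-3 dB) width numerically. For a sharply tuned formant, that width should come out close to BW(n). The function names and the 0.1 Hz scan step are illustrative choices.

```python
import math


def pole_pair(f_hz, bw_hz):
    """s(n) and s*(n) from the nth formant frequency and bandwidth."""
    s_n = math.pi * bw_hz + 2j * math.pi * f_hz
    return s_n, s_n.conjugate()


def section_gain(f, f_hz, bw_hz):
    """|s(n)s*(n) / ([s + s(n)][s + s*(n)])| evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f
    s_n, s_nc = pole_pair(f_hz, bw_hz)
    return abs(s_n * s_nc / ((s + s_n) * (s + s_nc)))


def half_power_width(f_hz, bw_hz, step=0.1):
    """Scan 0..2F and measure the -3 dB width of one pole-pair section."""
    freqs = [i * step for i in range(int(2 * f_hz / step) + 1)]
    gains = [section_gain(f, f_hz, bw_hz) for f in freqs]
    threshold = max(gains) / math.sqrt(2.0)
    above = [f for f, g in zip(freqs, gains) if g >= threshold]
    return above[-1] - above[0]


print(section_gain(0.0, 1500.0, 100.0))   # unity at dc
print(half_power_width(1500.0, 100.0))    # close to BW = 100 Hz
```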

The two formulations, Equations 6 and 7, are exactly equivalent representations of the transfer function for an ideal vocal tract configuration. That configuration corresponds to a uniform tube closed at the glottis and having all formant bandwidths equal to, e.g., 100 Hz. The two formulations yield indistinguishable vocal tract transfer functions below 5 kHz. However, in a practical synthesizer, the infinite product of poles can only be approximated, e.g., by building five electronic resonators and a higher-pole correction network (Fant, 1959).
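The equivalence claim can be checked numerically for the ideal case described above. This minimal sketch makes several assumptions:

- The sampling rate is 10 kHz.
- The tube is uniform, with formants at odd multiples of 500 Hz (roughly a 17.5 cm tract).
- All bandwidths are 100 Hz.
- Equation 2 has the usual resonator form A = 1 - B - C, B = 2exp(-πBW·T)cos(2πF·T), C = -exp(-2πBW·T).

The infinite product of Equation 7 is truncated after 5,000 pole pairs, an arbitrary choice, so the two magnitudes should agree only up to that truncation error.

```python
import cmath
import math

FS = 10000.0  # assumed sampling rate, Hz
BW = 100.0    # all formant bandwidths equal, as in the ideal case


def digital_mag(f, n_formants=5):
    """|T(f)| from Equation 6: five digital resonators at 500, 1500, ..., 4500 Hz."""
    t = 1.0 / FS
    z_inv = cmath.exp(-2j * math.pi * f * t)
    h = 1.0 + 0j
    for n in range(1, n_formants + 1):
        f_n = (2 * n - 1) * 500.0
        c = -math.exp(-2.0 * math.pi * BW * t)
        b = 2.0 * math.exp(-math.pi * BW * t) * math.cos(2.0 * math.pi * f_n * t)
        h *= (1.0 - b - c) / (1.0 - b * z_inv - c * z_inv ** 2)
    return abs(h)


def analog_mag(f, n_terms=5000):
    """|T(f)| from Equation 7, truncated after n_terms pole pairs of the same tube."""
    s = 2j * math.pi * f
    h = 1.0 + 0j
    for n in range(1, n_terms + 1):
        f_n = (2 * n - 1) * 500.0
        s_n = math.pi * BW + 2j * math.pi * f_n
        s_nc = s_n.conjugate()
        h *= (s_n * s_nc) / ((s + s_n) * (s + s_nc))
    return abs(h)


for f in (0.0, 500.0, 1000.0, 2500.0, 4000.0, 4900.0):
    d, a = digital_mag(f), analog_mag(f)
    print(f"{f:6.0f} Hz  digital {20 * math.log10(d):7.2f} dB"
          f"  analog {20 * math.log10(a):7.2f} dB")
```

Below 5 kHz the printed values should agree to within the small truncation error of the analog product. Adding more pole pairs shrinks that remaining gap.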
