
9 The prosodic component

9.1 Overview

The sentence representation produced by the phonological component PHONO2
serves as input to the prosodic component PROSOD, which is described in this
chapter. An example of the input to the prosodic component and the output
generated by the prosodic rules is shown in Figure 9-1. The output consists of a
string of phonetic segments, with each segment assigned a stress feature and a
duration in msec. The fundamental frequency targets that appear in the
PROSOD output listing are generated by an obsolete algorithm and are discarded
by F0TARG, which then generates the proper F0 targets.

9.2 Segmental durations

In a review of the factors that influence segmental durations in spoken English
sentences (Klatt, 1976b and references cited therein), it was concluded that only a few
of the many rule-governed durational changes are large enough to be perceptually
discriminable. The goal of the rule system described below and in Klatt (1979b) is
to characterize these perceptually important first-order effects.

Under the durational definitions that have been adopted, the duration of a stop
is its closure interval (any burst and aspiration at release are assumed to be a
part of the following segment). For fricatives, the duration corresponds to the
interval of visible frication noise (or to changes in the voicing source if no
frication is visible). For
sonorant sequences, the segmental boundary is defined to be the half-way point in
the formant transition for that formant having the greatest extent of transition.
These definitions lead to a convenient and largely reproducible measurement
procedure, but the physiological and perceptual validity of these boundaries has
not been established.
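
The boundary definition for sonorant sequences can be illustrated with a minimal
sketch. The data structure, the function name, and the formant values are
hypothetical and serve only as an illustration; they are not taken from the
MITalk system. The sketch also assumes that each formant transition is linear, so
that the temporal midpoint coincides with the point at which the formant has
covered half of its frequency change.

```python
from dataclasses import dataclass


@dataclass
class FormantTransition:
    """One formant's movement between two adjacent sonorants (hypothetical)."""
    name: str          # formant label, e.g. "F2"
    t_start_ms: float  # time at which the transition begins
    t_end_ms: float    # time at which the transition ends
    f_start_hz: float  # formant frequency at the start of the transition
    f_end_hz: float    # formant frequency at the end of the transition

    @property
    def extent_hz(self) -> float:
        """Size of the frequency change, ignoring its direction."""
        return abs(self.f_end_hz - self.f_start_hz)


def sonorant_boundary_ms(transitions: list[FormantTransition]) -> float:
    """Return the half-way point of the formant with the largest transition."""
    if not transitions:
        raise ValueError("at least one formant transition is required")
    largest = max(transitions, key=lambda tr: tr.extent_hz)
    # Assumption: a linear transition, so the temporal midpoint is also the
    # point at which half of the frequency change has been completed.
    return (largest.t_start_ms + largest.t_end_ms) / 2.0


# Invented values for illustration only.
transitions = [
    FormantTransition("F1", 100.0, 140.0, 300.0, 600.0),    # extent 300 Hz
    FormantTransition("F2", 90.0, 170.0, 2200.0, 1200.0),   # extent 1000 Hz
    FormantTransition("F3", 110.0, 150.0, 2900.0, 2500.0),  # extent 400 Hz
]
boundary = sonorant_boundary_ms(transitions)
print(boundary)  # 130.0
```

Here F2 has the greatest extent of transition, so the boundary is placed at the
midpoint of the F2 transition, regardless of where the F1 and F3 transitions
begin and end.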

Each segment is assigned a duration by a set of rules presented in detail
below. The rules are intended to match observed durations for a single speaker
(DHK) reading paragraph-length materials. The rules operate within the
framework of a model of durational behavior which states that (1) each rule tries
to effect a percentage increase or decrease in the duration of the segment, but
(2) segments cannot be compressed shorter than a certain minimum duration
(Klatt, 1973). The model is summarized by the formula:
