
8

The phonological component

8.1 Overview

The phonological component PHONOL accepts input from the text analysis
routines (described in Chapters 2-6) and produces an output that is sent to the
prosodic component PROSOD (to be described in Chapter 9). PHONOL is
divided into two modules, PHONO1 and PHONO2. PHONO1 uses information
from the PARSER to specify the syntactic markers that influence the spoken
output. PHONO2 contains a set of segmental recoding rules that are activated to
select an appropriate allophone for each phoneme and to simplify certain
unstressed phonetic sequences. Rules for pausing are included in both PHONO1 and
PHONO2. Pauses of various durations are inserted at sentence boundaries, at clause
boundaries, and at the locations of certain punctuation marks, such as commas.
Some additional pauses are introduced in longer phrases and at slow speaking rates so
that the talker does not seem to have an inhuman supply of breath.
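The pausing behavior described above can be illustrated with a minimal sketch, assuming a word list in which each word is already tagged with the kind of boundary that follows it. The names `insert_pauses`, `PAUSE_MS`, `BREATH_PAUSE_MS`, and `BASE_MAX_PHRASE_WORDS`, the pause symbol format, and all durations are hypothetical; the chapter does not give values here, and the actual PHONO1/PHONO2 rules choose extra pause locations linguistically rather than after a fixed word count.

```python
from typing import List, Optional, Tuple

# Hypothetical pause durations in milliseconds (not taken from MITalk).
PAUSE_MS = {"sentence": 600, "clause": 300, "comma": 200}
BREATH_PAUSE_MS = 150        # hypothetical extra "breath" pause in long phrases
BASE_MAX_PHRASE_WORDS = 8    # hypothetical phrase-length limit at normal rate


def insert_pauses(words: List[Tuple[str, Optional[str]]],
                  rate: float = 1.0) -> List[str]:
    """Return the words with pause symbols such as '<pause 300>' inserted.

    words: (word, boundary) pairs, where boundary names the boundary that
           follows the word ("sentence", "clause", "comma") or is None.
    rate:  speaking rate relative to normal; values below 1.0 are slower.
    """
    # At slower rates each word lasts longer, so allow fewer words
    # before an extra breath pause is inserted.
    max_words = max(1, int(BASE_MAX_PHRASE_WORDS * rate))
    out: List[str] = []
    run = 0  # words emitted since the last pause of any kind
    for word, boundary in words:
        out.append(word)
        run += 1
        if boundary in PAUSE_MS:
            # Boundary- and punctuation-driven pause.
            out.append(f"<pause {PAUSE_MS[boundary]}>")
            run = 0
        elif run >= max_words:
            # Phrase has grown too long without a pause: add a breath pause.
            out.append(f"<pause {BREATH_PAUSE_MS}>")
            run = 0
    return out


print(insert_pauses([("hello", "comma"), ("world", "sentence")]))
# ['hello', '<pause 200>', 'world', '<pause 600>']
```

Halving the rate halves the phrase-length limit in this sketch, so a ten-word stretch with no punctuation receives two breath pauses at `rate=0.5` but only one at the normal rate.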

8.1.1 Synthesis-by-rule

If a subset of the MITalk system is to be used as a speech-synthesis-by-rule
program by deleting the analysis modules, the preferred first module would be
PHONO2 or PROSOD. The input to the system would then be an abstract
representation containing phonemes, lexical stress symbols, and syntactic structure
symbols for each sentence to be synthesized. Applications for this mode of speech
generation by computer include cases where an abstract syntactic and phonemic
representation for each sentence is known or can be computed. Speech quality will
be better than in the text-to-speech case because analysis errors can be avoided, but
considerable linguistic sophistication is required of users. Storage requirements
for sentences are minimal: on the order of 100 bits per sentence.
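Entering the system at a later module, as in the synthesis-by-rule mode, amounts to running an ordered pipeline from a chosen stage onward. The sketch below assumes each module is a function from its input representation to its output; `STAGES`, `run_from`, and the single `"ANALYSIS"` placeholder (standing in for all of the Chapter 2-6 text-analysis modules) are hypothetical, and the stages after PROSOD are omitted.

```python
from typing import Any, Callable, Dict, List

# Module order as described in this chapter. "ANALYSIS" is a hypothetical
# placeholder for the text-analysis modules; stages after PROSOD are omitted.
STAGES: List[str] = ["ANALYSIS", "PHONO1", "PHONO2", "PROSOD"]


def run_from(entry: str, data: Any,
             modules: Dict[str, Callable[[Any], Any]]) -> Any:
    """Run every stage from `entry` onward, skipping all earlier stages.

    For synthesis by rule, `entry` would be "PHONO2" or "PROSOD" and `data`
    the abstract phonemic, stress, and syntactic representation.
    """
    if entry not in STAGES:
        raise ValueError(f"unknown stage: {entry!r}")
    for name in STAGES[STAGES.index(entry):]:
        data = modules[name](data)
    return data


# Usage: stub modules that only record which stages ran.
trace_modules = {name: (lambda d, n=name: d + [n]) for name in STAGES}
print(run_from("PHONO2", [], trace_modules))  # ['PHONO2', 'PROSOD']
```

Keeping the stage order in one list means the full text-to-speech path and the synthesis-by-rule path share the same downstream modules; only the entry point changes.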

8.2 Input representation for a sentence

The input to PHONO1 consists of a phonemic pronunciation for each word (i.e., as
spoken in isolation), its lexical stress pattern, and syntactic information concerning
part of speech and phrasal structure. The output from PHONO1 consists of a
single string of symbols for each sentence.
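As a rough illustration of how per-word information can be flattened into one symbol string per sentence, the sketch below assumes ARPAbet-style phoneme names, one stress digit per phoneme, and invented bracket, part-of-speech, word-boundary, and terminator markers. None of these symbols are the inventory of Table 8-1; `Word` and `flatten` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    phonemes: List[str]  # isolation pronunciation (hypothetical symbols)
    stress: List[int]    # stress level per phoneme, 0 = unstressed (hypothetical)
    pos: str             # part-of-speech label (hypothetical)


def flatten(phrases: List[Tuple[str, List[Word]]]) -> str:
    """Flatten phrase-bracketed words into a single string for one sentence."""
    parts: List[str] = []
    for label, words in phrases:
        parts.append(f"[{label}")              # open phrase bracket
        for w in words:
            if len(w.phonemes) != len(w.stress):
                raise ValueError("need one stress value per phoneme")
            parts.append(f"<{w.pos}>")         # part-of-speech marker
            for ph, s in zip(w.phonemes, w.stress):
                parts.append(f"{ph}{s}" if s else ph)  # attach nonzero stress
            parts.append("#")                  # word boundary
        parts.append("]")                      # close phrase bracket
    parts.append(".")                          # sentence terminator
    return " ".join(parts)


sentence = [
    ("NP", [Word(["DH", "AX"], [0, 0], "DET"),
            Word(["K", "AE", "T"], [0, 1, 0], "NOUN")]),
    ("VP", [Word(["S", "AE", "T"], [0, 1, 0], "VERB")]),
]
print(flatten(sentence))
# [NP <DET> DH AX # <NOUN> K AE1 T # ] [VP <VERB> S AE1 T # ] .
```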
The symbol inventory used in PHONO1 and PHONO2 is shown in Table 8-1,
