Some measures of intelligibility and comprehension

are spelled out completely by this module. For example, “19th” was pronounced
as “one-nine-T-H” on output. In words such as “19th” or “100-yard”, the al-
phabetic and numeric sections are separable and could be pronounced. However,
in a true alphanumeric such as “103S” or “a3c”, it is correct to spell out all of the
symbols.
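
The distinction drawn above can be illustrated with a minimal sketch. It is not the module's actual logic, which is not given here; the function name `classify_token`, the two patterns, and the category labels are assumptions made purely for illustration. The sketch treats a digit string followed by an ordinal suffix, or joined to a word by a hyphen, as separable, and falls back to spelling out every symbol otherwise.

```python
import re

# Hypothetical patterns (assumptions, not taken from the system described):
# digits followed by an ordinal suffix, e.g. "19th"
ORDINAL = re.compile(r"^(\d+)(st|nd|rd|th)$", re.IGNORECASE)
# digits joined to an alphabetic word by a hyphen, e.g. "100-yard"
HYPHENATED = re.compile(r"^(\d+)-([A-Za-z]+)$")


def classify_token(token):
    """Return (kind, parts) for a token containing digits and letters.

    Separable tokens yield their numeric and alphabetic sections;
    anything else is treated as a true alphanumeric and split into
    individual symbols to be spelled out.
    """
    match = ORDINAL.match(token)
    if match:
        return ("ordinal", [match.group(1), match.group(2)])
    match = HYPHENATED.match(token)
    if match:
        return ("number+word", [match.group(1), match.group(2)])
    # Fallback: spell out every symbol, as for "103S" or "a3c".
    return ("spell", list(token))


if __name__ == "__main__":
    for tok in ["19th", "100-yard", "103S", "a3c"]:
        print(tok, classify_token(tok))
```

Under these assumed patterns, “19th” and “100-yard” are split into a numeric and an alphabetic section, while “103S” and “a3c” are returned symbol by symbol.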

A number of errors were also detected in the module DECOMP, which is
responsible for decomposing words into morphs by reference to the morph lexicon.
In several cases, the wrong morphs were identified, resulting in perceptible seg-
mental errors in the speech output. In other cases, the correct morphs were ob-
tained, but the stress assignment of the constituent morphs was different for the
morphs in isolation than for the morphs when concatenated in a polymorphemic
word. We also identified several words that should have been in the lexicon since
their pronunciation could not be handled by the existing spelling-to-sound rules.

Several errors in the operation of the spelling-to-sound rules were also
detected. These errors resulted in the wrong pronunciation, which was quite
noticeable in listening. For example, the second syllable of the word “Britain” was
pronounced like the second syllable in the word “maintain”.

In a number of other cases, we were able to identify problems in the operation
of the parser, particularly in recognizing the correct part of speech. For example,
the word “close” can be either an adjective or verb, each with a different pronun-
ciation. Several problems were also observed with the word “affect”, which can be
either a noun or a verb. In each of these cases, the part of speech was incorrectly
identified by the parser, resulting in the wrong choice in pronunciation on output.
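
The dependence of pronunciation on part of speech can be sketched as a simple lookup. This is an illustration only, not the system's lexicon format: the table `HOMOGRAPHS`, the function `pronounce`, and the IPA-style transcriptions are assumptions introduced here. It shows how a wrong part-of-speech label from the parser selects the wrong pronunciation directly.

```python
# Hypothetical homograph table (an assumption for illustration):
# each word maps a part of speech to an IPA-style transcription.
HOMOGRAPHS = {
    "close": {"adjective": "kloʊs", "verb": "kloʊz"},
    "affect": {"noun": "ˈæfɛkt", "verb": "əˈfɛkt"},
}


def pronounce(word, part_of_speech):
    """Look up the pronunciation of a homograph for a given part of speech.

    Raises KeyError if the word or the part of speech is not in the table.
    """
    return HOMOGRAPHS[word][part_of_speech]


if __name__ == "__main__":
    # If the parser labels the adjective "close" as a verb,
    # the final consonant is voiced when it should not be.
    print(pronounce("close", "adjective"))
    print(pronounce("close", "verb"))
```

In this sketch a parser error changes only the key used for the lookup, which is enough to alter the final consonant of “close” or the stress pattern of “affect”.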

Finally, there were several cases, especially with the Haskins anomalous sen-
tences, in which the parser incorrectly assigned the verb (which could also be a
noun) to the previous noun phrase. This error is not surprising since the parser has
a basic preference for noun phrases anyway, when a choice is available. However,
this often produced inappropriate sentence stress resulting from incorrect pitch and
segmental durations. In some cases, these differences could be readily observed,
whereas in others, the effects were substantially more difficult to detect even with
careful and repeated listening. These observations are consistent with an earlier
perceptual study of the durational rules carried out by Carlson et al. (1979). They
found that a deletion of a phrase boundary produced only negligible effects on
listeners’ evaluations of the naturalness of synthetic speech.

Some of the errors described above are considered to be relatively minor and
can be corrected rather easily by the simple addition of polymorphemic entries in
the morph lexicon. Since this evaluation was completed, a “pre-parser” has been
