
4

The phrase-level parser

4.1 Overview

The parser for the text-to-speech system is designed to satisfy a unique set of
constraints. It must be able to handle arbitrary text quickly, but does not need to
derive semantic information. Many parsers attempt to build a deep structure parse
from the input sentence so that semantic information may be derived for such uses
as question-answering systems. The text-to-speech parser supplies a surface
structure parse, providing information for algorithms which produce prosodic effects in
the output speech. In addition, some clause boundaries are set according to rules
described in Chapter 8. These phrase-level and clause-level structures provide
much of the syntactic information needed by the present prosodic algorithms.

It is well known that parsing systems which parse unrestricted text often
produce numerous ambiguous or failed parses. Although it is always possible to
choose arbitrarily among ambiguous parsings, a failed parse is unacceptable in the
text-to-speech system. When one examines ambiguous results from full sentence-level
parsers, one finds that the bottom level of nodes (i.e., the phrase nodes) is
often invariant among the competing interpretations; the ambiguities arise from
possible groupings of these nodes at the clause level, especially for parsers which
build binary trees. One also finds that for many failed parses, much of the
structure at the phrase level has been correctly determined. The phrase-level parser
takes advantage of this reliability, producing as many phrase nodes as possible for
use by the MITalk prosodic component.
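
The idea of producing as many phrase nodes as possible, without ever failing, can be illustrated with a small sketch. It is not the MITalk implementation: it uses plain finite-state networks rather than a full ATN with registers and tests, it assumes the words already carry part-of-speech tags, and the tag names (DET, ADJ, NOUN, PRON, AUX, VERB, ADV), the network shapes, and the function names are all hypothetical. Words that fit no network are simply passed through unattached, so the parse as a whole cannot fail.

```python
from typing import Dict, List, Set, Tuple

Network = Dict[str, Dict[str, str]]  # state -> {tag: next state}

# Hypothetical toy networks; the real noun-group and verb-group grammars are richer.
NOUN_GROUP: Network = {
    "start": {"DET": "det", "ADJ": "adj", "NOUN": "noun", "PRON": "pron"},
    "det": {"ADJ": "adj", "NOUN": "noun"},
    "adj": {"ADJ": "adj", "NOUN": "noun"},
    "noun": {"NOUN": "noun"},
    "pron": {},
}
VERB_GROUP: Network = {
    "start": {"AUX": "aux", "VERB": "verb"},
    "aux": {"AUX": "aux", "VERB": "verb"},
    "verb": {},
}
GRAMMARS: List[Tuple[str, Network, Set[str]]] = [
    ("noun group", NOUN_GROUP, {"noun", "pron"}),
    ("verb group", VERB_GROUP, {"verb"}),
]


def longest_match(network: Network, accepting: Set[str],
                  tags: List[str], start: int) -> int:
    """Return how many tags from `start` form the longest accepted phrase (0 if none)."""
    state = "start"
    pos = start
    end = start  # position just past the last accepting state reached
    while pos < len(tags) and tags[pos] in network[state]:
        state = network[state][tags[pos]]
        pos += 1
        if state in accepting:
            end = pos
    return end - start


def parse_phrases(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Scan left to right, emitting a phrase node wherever a network matches."""
    tags = [tag for _, tag in tagged]
    nodes: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tagged):
        for label, network, accepting in GRAMMARS:
            n = longest_match(network, accepting, tags, i)
            if n:
                nodes.append((label, [word for word, _ in tagged[i:i + n]]))
                i += n
                break
        else:
            # No network matched here: leave the word unattached and move on.
            nodes.append(("unattached", [tagged[i][0]]))
            i += 1
    return nodes


sentence = [("the", "DET"), ("old", "ADJ"), ("parser", "NOUN"),
            ("has", "AUX"), ("handled", "VERB"), ("it", "PRON"),
            ("quickly", "ADV")]
print(parse_phrases(sentence))
```

For the sample sentence the sketch yields a noun group ("the old parser"), a verb group ("has handled"), a second noun group ("it"), and one unattached word ("quickly"). No attempt is made to group these nodes into clauses, which is where the ambiguity discussed above would arise.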

The phrase-level parser uses comparatively few resources and runs in real
time. This is quite unusual for parsers which handle unrestricted text, but is
necessary for a text-to-speech system. It would not be possible in such a practical
system to allocate the resources needed for recursion in the grammar and for
backtracking control structures. Since extensive backtracking occurs above the
phrase level for the most part, the combinatorial explosion associated with this
strategy is avoided by restriction to phrase-level parsing.

Phrase recognition is accomplished via an ATN (augmented transition
network) interpreter (Woods, 1970) and the grammars for noun groups and verb
groups. A “noun group” (NGR), as used in this grammar, means either a pronoun
