The phrase-level parser
(e.g. him or several), a pronoun with modification (e.g. almost anything green),
an integer with or without modification (e.g. five or nearly a hundred thousand),
a noun phrase up to and including the head noun (e.g. every third car or his own
red and black car), or any of the above preceded by a preposition. A “verb
group” (VGR) consists of a verb phrase without direct or indirect objects (e.g.
could almost see, might not have been moving, had been very yellow). Another
type of group, the “verbal” (VBL), is also recognized by the verb group network; it
is either an infinitive phrase (e.g. to walk slowly, to be broken) or a participial
phrase (e.g. walking slowly, have almost given).
4.2 Input
The input file from DECOMP has been described in Chapter 3. It contains the
morph spelling, morph pronunciation, morph type, and parts of speech and features
for each homograph of each morph in the analysis of the word. A parts-of-speech
set for the entire word is also supplied.
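As a rough illustration of the input record just described, the fields could be held in a structure like the following Python sketch. The class and field names, the "ROOT" morph type label, and the placeholder pronunciation string are all assumptions made for illustration; the actual DECOMP file layout is the one described in Chapter 3.

```python
from dataclasses import dataclass, field


@dataclass
class Homograph:
    """One reading of a morph (hypothetical layout, not the DECOMP format)."""
    parts_of_speech: list[str]
    features: dict[str, str] = field(default_factory=dict)


@dataclass
class Morph:
    spelling: str
    pronunciation: str          # placeholder string; real notation not shown here
    morph_type: str             # label such as "ROOT" is an assumed example value
    homographs: list[Homograph]


@dataclass
class WordAnalysis:
    morphs: list[Morph]
    parts_of_speech: set[str]   # parts-of-speech set for the entire word


# Illustrative record for a one-morph word.
car = WordAnalysis(
    morphs=[
        Morph(
            spelling="car",
            pronunciation="<pronunciation placeholder>",
            morph_type="ROOT",
            homographs=[Homograph(["NOUN"], {"NUM": "SING"})],
        )
    ],
    parts_of_speech={"NOUN"},
)
```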
4.3 Output
The output of the parser is a series of nodes representing either a parsed constituent
(i.e. a phrase), or a word (or punctuation mark) which was not included in a phrase
by the parser. Each node representing a phrase contains the words covered by that
phrase in the order in which they appear in the text. The output file contains the
following information:
1. For each node, the number of words covered by the node, the part of
speech (type of constituent) of the node, and a property list are given.
The property list is a set of attribute-value pairs.
2. Each word is accompanied by its spelling and a part-of-speech set.
Only one part of speech is given for those words covered by a node.
3. For each part of speech, a property list is given.
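The three items above can be pictured with a small Python sketch. This is only one possible in-memory model, not the parser's actual file format: the names PhraseNode, Word, and text_of are hypothetical, and the part-of-speech labels "VERB" and "ADV" are placeholders (only NOUN and VGR are taken from the text).

```python
from dataclasses import dataclass
from typing import Union

# A property list is a set of attribute-value pairs.
PropertyList = dict[str, str]


@dataclass
class Word:
    spelling: str
    # Each part of speech carries its own property list.
    parts_of_speech: dict[str, PropertyList]


@dataclass
class PhraseNode:
    constituent_type: str       # e.g. "VGR" for a verb group
    properties: PropertyList
    words: list[Word]           # in the order they appear in the text

    def __post_init__(self) -> None:
        # Words covered by a node are given exactly one part of speech.
        for w in self.words:
            if len(w.parts_of_speech) != 1:
                raise ValueError(f"{w.spelling!r} needs exactly one part of speech")

    @property
    def word_count(self) -> int:
        return len(self.words)


# The output is a series of phrase nodes and words left outside any phrase.
OutputItem = Union[PhraseNode, Word]


def text_of(items: list[OutputItem]) -> str:
    """Recover the covered text, in order, from a series of output items."""
    spellings: list[str] = []
    for item in items:
        if isinstance(item, PhraseNode):
            spellings.extend(w.spelling for w in item.words)
        else:
            spellings.append(item.spelling)
    return " ".join(spellings)


# Illustrative output: one verb group followed by a word outside any phrase.
vgr = PhraseNode(
    constituent_type="VGR",
    properties={},
    words=[
        Word("could", {"VERB": {}}),   # placeholder label
        Word("almost", {"ADV": {}}),   # placeholder label
        Word("see", {"VERB": {}}),     # placeholder label
    ],
)
output = [vgr, Word("cars", {"NOUN": {"NUM": "PL"}})]
```

Here vgr.word_count gives the number of words covered by the node, and text_of(output) walks the series in text order. Treating an unparsed word as a bare Word rather than a one-word node is a simplification made for this sketch.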
4.4 Parts of speech
4.4.1 The standard parts of speech in the lexicon
The following are the parts of speech of open class words and words which do not
have any special syntactic or prosodic features. Those names in uppercase are the
parts of speech, attributes, and attribute values as they are listed in the source
version of the lexicon. A word itself may have any number of parts of speech. The
designations TR and FL are abbreviations for “true” and “false”, respectively.
NOUN (NUM SING) = singular noun
NOUN (NUM PL) = plural noun