The phrase-level parser
(e.g. him or several), a pronoun with modification (e.g. almost anything green),
an integer with or without modification (e.g. five or nearly a hundred thousand),
a noun phrase up to and including the head noun (e.g. every third car or his own
red and black car), or any of the above preceded by a preposition. A “verb
group” (VGR) consists of a verb phrase without direct or indirect objects (e.g.
could almost see, might not have been moving, had been very yellow). Another
type of group, the “verbal” (VBL), is also recognized by the verb group network; it
is either an infinitive phrase (e.g. to walk slowly, to be broken) or a participial
phrase (e.g. walking slowly, have almost given).
4.2 Input
The input file from DECOMP has been described in Chapter 3. It contains the
morph spelling, morph pronunciation, morph type, and parts of speech and features
for each homograph of each morph in the analysis of the word. A parts-of-speech
set for the entire word is also supplied.
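To make the record layout concrete, the Python sketch below models one word of DECOMP output as described above: each morph in the analysis carries a list of homographs, each homograph has a spelling, pronunciation, morph type, and parts of speech with features, and the word as a whole carries a parts-of-speech set. The class and field names, the morph-type labels, and the pronunciation strings are hypothetical placeholders; the actual file layout is the one described in Chapter 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MorphHomograph:
    """One homograph of one morph (hypothetical field names)."""
    spelling: str
    pronunciation: str  # placeholder notation, not the system's phoneme codes
    morph_type: str     # placeholder label such as "ROOT" or "SUFFIX"
    # part of speech -> its features, e.g. {"NOUN": {"NUM": "SING"}}
    parts_of_speech: dict[str, dict[str, str]] = field(default_factory=dict)


@dataclass
class WordAnalysis:
    """A word's morph analysis plus the parts-of-speech set for the whole word."""
    spelling: str
    morphs: list[list[MorphHomograph]]  # one list of homographs per morph
    word_pos_set: set[str]


def homograph_count(word: WordAnalysis) -> int:
    """Total number of homograph records supplied for the word."""
    return sum(len(homographs) for homographs in word.morphs)


# Hypothetical record for "cars" analysed as "car" + "s".
cars = WordAnalysis(
    spelling="cars",
    morphs=[
        [MorphHomograph("car", "kar", "ROOT", {"NOUN": {"NUM": "SING"}})],
        [MorphHomograph("s", "z", "SUFFIX")],
    ],
    word_pos_set={"NOUN"},
)
print(homograph_count(cars))  # 2
```

Keeping homographs grouped per morph preserves the ambiguity that the parser must later resolve, rather than committing to one reading at input time.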
4.3 Output
The output of the parser is a series of nodes, each representing either a parsed
constituent (i.e. a phrase) or a word (or punctuation mark) that was not included
in a phrase by the parser. Each node representing a phrase contains the words covered
by that phrase in the order in which they appear in the text. The output file contains
the following information:
1. For each node, the number of words covered by the node, the part of
speech (type of constituent) of the node, and a property list are given.
The property list is a set of attribute-value pairs.
2. Each word is accompanied by its spelling and a part-of-speech set.
Only one part of speech is given for words covered by a node.
3. For each part of speech, a property list is given.
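The three kinds of information listed above can be sketched as a small data model. This is a hypothetical Python rendering, not the parser's actual file format: `Word`, `PhraseNode`, and `words_in_order` are invented names, the word count is derived from the covered words rather than stored, and every tag in the example other than VGR is a placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

PropertyList = dict[str, str]  # attribute -> value


@dataclass
class Word:
    """A word or punctuation mark with its spelling and part-of-speech set."""
    spelling: str
    # part of speech -> property list for that part of speech
    pos: dict[str, PropertyList]


@dataclass
class PhraseNode:
    """A parsed constituent covering one or more words in text order."""
    pos: str  # type of constituent, e.g. "VGR"
    properties: PropertyList
    words: list[Word]

    def __post_init__(self) -> None:
        # Words covered by a node carry exactly one part of speech.
        for w in self.words:
            if len(w.pos) != 1:
                raise ValueError(f"{w.spelling!r} must have exactly one part of speech")

    @property
    def word_count(self) -> int:
        return len(self.words)


OutputItem = Union[PhraseNode, Word]


def words_in_order(items: list[OutputItem]) -> list[str]:
    """Recover the word sequence from a series of output items."""
    spellings: list[str] = []
    for item in items:
        if isinstance(item, PhraseNode):
            spellings.extend(w.spelling for w in item.words)
        else:
            spellings.append(item.spelling)
    return spellings


# Placeholder tags (MODAL, ADV, VERB, PUNCT) are for illustration only.
items: list[OutputItem] = [
    PhraseNode("VGR", {}, [
        Word("could", {"MODAL": {}}),
        Word("almost", {"ADV": {}}),
        Word("see", {"VERB": {}}),
    ]),
    Word(".", {"PUNCT": {}}),
]
print(items[0].word_count, words_in_order(items))
```

Words left outside any phrase keep their full part-of-speech set, while the check in `PhraseNode` enforces the single part of speech required for covered words.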
4.4 Parts of speech
4.4.1 The standard parts of speech in the lexicon
The following are the parts of speech of open-class words and of words that do not
have any special syntactic or prosodic features. Names in uppercase are the parts
of speech, attributes, and attribute values as they are listed in the source version
of the lexicon. A word may have any number of parts of speech. The designations
TR and FL are abbreviations for “true” and “false”, respectively.
NOUN (NUM SING) = singular noun
NOUN (NUM PL) = plural noun
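A lexicon entry such as NOUN (NUM SING) can be read into a part-of-speech name and its attribute-value pairs. The Python sketch below assumes that attributes and values alternate inside a single pair of parentheses, which matches the two entries shown but may not cover every form used in the lexicon; TR and FL are mapped to True and False as described above. The function name `parse_pos` is hypothetical.

```python
from __future__ import annotations

import re

# POS name, optionally followed by one parenthesized group of attribute/value tokens.
_ENTRY = re.compile(r"^\s*([A-Z][A-Z0-9-]*)\s*(?:\(([^()]*)\))?\s*$")
_BOOLEANS = {"TR": True, "FL": False}


def parse_pos(entry: str) -> tuple[str, dict[str, str | bool]]:
    """Split an entry such as 'NOUN (NUM SING)' into ('NOUN', {'NUM': 'SING'})."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognized part-of-speech entry: {entry!r}")
    pos, body = match.group(1), match.group(2) or ""
    tokens = body.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"unpaired attribute or value in {entry!r}")
    # Pair tokens as attribute, value, attribute, value, ...
    attributes: dict[str, str | bool] = {}
    for attribute, value in zip(tokens[0::2], tokens[1::2]):
        attributes[attribute] = _BOOLEANS.get(value, value)
    return pos, attributes


for line in ["NOUN (NUM SING)", "NOUN (NUM PL)"]:
    print(parse_pos(line))
```

Converting TR and FL at load time lets later stages test boolean attributes directly instead of comparing strings.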