Text preprocessing

   e. Semicolon
   f. Colon
   g. Apostrophe
   h. Single and double quotes
   i. Ellipsis (...)
   j. Percent sign
   k. Ampersand
   l. Parentheses
   m. Brackets
   n. Dashes
   o. Hyphens

10. Symbols not recognizable by computer (and hence not recognized by
    FORMAT), including:

   a. Italics
   b. Boldface
   c. Underlining
   d. Superscripts and subscripts
   e. Dieresis/umlaut (¨)
   f. Cedilla (ç)
   g. Various forms of special notation

2.2 Input

FORMAT accepts as input the original unrestricted English text to be analyzed.
This text is a sequence of lines of letters and symbols expressed in a
computer-readable form (in all implementations of MITalk, the ASCII character
set is used). The actual characters recognized are:

   1. Uppercase and lowercase letters
   2. Numeric digits
   3. Period (or decimal point), question mark, and exclamation point
   4. Comma, semicolon, and colon
   5. Apostrophe
   6. Single and double quote marks
   7. Parentheses, brackets, and braces
   8. Percent sign, dollar sign, and ampersand
   9. Slash

Any character that is not recognized by FORMAT causes a warning message and is
treated as a space.
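
The rule for unrecognized characters can be illustrated with a minimal sketch. It is not the MITalk implementation: the names `RECOGNIZED`, `WHITESPACE`, and `filter_line` are hypothetical, the character set is transcribed only from the nine classes listed above, and passing whitespace through unchanged is an assumption, since the text does not say how FORMAT handles spaces and line ends.

```python
import string
import warnings

# Hypothetical inventory built from the nine classes listed in the text.
RECOGNIZED = set(
    string.ascii_letters   # 1. uppercase and lowercase letters
    + string.digits        # 2. numeric digits
    + ".?!"                # 3. period, question mark, exclamation point
    + ",;:"                # 4. comma, semicolon, colon
    + "'"                  # 5. apostrophe (same ASCII code as the single quote)
    + '"'                  # 6. double quote mark
    + "()[]{}"             # 7. parentheses, brackets, braces
    + "%$&"                # 8. percent sign, dollar sign, ampersand
    + "/"                  # 9. slash
)

# Assumption: whitespace is passed through untouched.
WHITESPACE = set(" \t\n")


def filter_line(line: str) -> str:
    """Return `line` with every unrecognized character replaced by a space.

    A warning is issued for each replaced character, mirroring the
    behavior described in the text.
    """
    out = []
    for ch in line:
        if ch in RECOGNIZED or ch in WHITESPACE:
            out.append(ch)
        else:
            warnings.warn(f"unrecognized character {ch!r} treated as a space")
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    print(filter_line("50% off @ noon"))
```

Here the `@` is not in the listed set, so it is replaced by a space and a warning is emitted, while `%` passes through unchanged. Under this literal reading of the list, a hyphen would also be treated as unrecognized.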