Text preprocessing
e. Semicolon
f. Colon
g. Apostrophe
h. Single and double quotes
i. Ellipsis (...)
j. Percent sign
k. Ampersand
l. Parentheses
m. Brackets
n. Dashes
o. Hyphens
10. Symbols not recognizable by computer (and hence not recognized by
FORMAT), including:
a. Italics
b. Boldface
c. Underlining
d. Superscripts and subscripts
e. Dieresis/umlaut (¨)
f. Cedilla (ç)
g. Various forms of special notation
2.2 Input
FORMAT accepts as input the original unrestricted English text to be analyzed.
This text is a sequence of lines of letters and symbols expressed in a computer-
readable form (in all implementations of MITalk, the ASCII character set is used).
The actual characters recognized are:
1. Uppercase and lowercase letters
2. Numeric digits
3. Period (or decimal point), question mark, and exclamation point
4. Comma, semicolon, and colon
5. Apostrophe
6. Single and double quote marks
7. Parentheses, brackets, and braces
8. Percent sign, dollar sign, and ampersand
9. Slash
Any character which is not recognized by FORMAT causes a warning message
and is treated as a space.
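
The input rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MITalk implementation: the names RECOGNIZED, SEPARATORS, and filter_input are hypothetical, the warning wording is invented, and blanks, tabs, and line breaks are assumed to pass through as separators (the text does not say how they are handled). The recognized set follows items 1 through 9 literally, so a hyphen, which does not appear in that list, is warned about here. In ASCII the apostrophe and the single quote mark are the same character.

```python
import string

# Characters listed as recognized by FORMAT (items 1-9 above).
RECOGNIZED = set(
    string.ascii_letters   # 1. uppercase and lowercase letters
    + string.digits        # 2. numeric digits
    + ".?!"                # 3. period, question mark, exclamation point
    + ",;:"                # 4. comma, semicolon, colon
    + "'"                  # 5. apostrophe (same ASCII code as single quote)
    + "'\""                # 6. single and double quote marks
    + "()[]{}"             # 7. parentheses, brackets, braces
    + "%$&"                # 8. percent sign, dollar sign, ampersand
    + "/"                  # 9. slash
)

# Assumption: blanks, tabs, and line breaks are ordinary separators
# and do not trigger a warning.
SEPARATORS = set(" \t\n")


def filter_input(text):
    """Return (cleaned_text, warnings).

    Every character outside RECOGNIZED and SEPARATORS is replaced by a
    space and reported in the warnings list (hypothetical message format).
    """
    cleaned = []
    warnings = []
    for offset, ch in enumerate(text):
        if ch in RECOGNIZED or ch in SEPARATORS:
            cleaned.append(ch)
        else:
            warnings.append(
                f"unrecognized character {ch!r} at offset {offset}; "
                "treated as a space"
            )
            cleaned.append(" ")
    return "".join(cleaned), warnings


if __name__ == "__main__":
    cleaned, warnings = filter_input("Cost: 5% ~ ok")
    print(repr(cleaned))
    for message in warnings:
        print("warning:", message)
```

Replacing an unrecognized character with a space, rather than deleting it, keeps the length of the text unchanged, so the offsets reported in the warnings still point at the right place in the original input.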