Text preprocessing

2.4.2 Words, abbreviations, and special symbols

FORMAT recognizes a word as an alphabetic string delimited by a punctuation or
whitespace character (the newline character which separates lines is considered to
be whitespace). If a word is followed by a period, then FORMAT looks in a table
of abbreviations to see if a translation is specified for that word. Table 2-1 shows
the abbreviation table currently in use. If a translation is found, then the translated
word(s) are output in place of the original abbreviation.

Table 2-1: Abbreviation translations performed by FORMAT

Ms - MIZ
Mr - MISTER
Mrs - MIZZES
Dr - DOCTOR
Num - NUMBER
Jan - JANUARY
Feb - FEBRUARY
Mar - MARCH
Apr - APRIL
Aug - AUGUST
Sept - SEPTEMBER
Oct - OCTOBER
Nov - NOVEMBER
Dec - DECEMBER
etc - ET CETERA
Jr - JUNIOR
Prof - PROFESSOR
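
The abbreviation lookup described above can be sketched in Python as follows. This is an illustration only, not FORMAT's actual code: the names ABBREVIATIONS, WORD_RE, and expand_abbreviations are hypothetical, and the sketch assumes that table entries are matched exactly as printed and that the period is consumed along with the abbreviation, neither of which the text specifies.

```python
import re

# Table 2-1 as listed in the text. Matching keys exactly as printed
# (case-sensitive) is an assumption of this sketch.
ABBREVIATIONS = {
    "Ms": "MIZ", "Mr": "MISTER", "Mrs": "MIZZES", "Dr": "DOCTOR",
    "Num": "NUMBER", "Jan": "JANUARY", "Feb": "FEBRUARY", "Mar": "MARCH",
    "Apr": "APRIL", "Aug": "AUGUST", "Sept": "SEPTEMBER", "Oct": "OCTOBER",
    "Nov": "NOVEMBER", "Dec": "DECEMBER", "etc": "ET CETERA",
    "Jr": "JUNIOR", "Prof": "PROFESSOR",
}

# An alphabetic string that does not continue a longer alphanumeric run,
# with an optional trailing period captured separately.
WORD_RE = re.compile(r"(?<![A-Za-z0-9])([A-Za-z]+)(\.?)")


def expand_abbreviations(text: str) -> str:
    """Replace 'word.' with its table translation when one exists."""

    def replace(match: "re.Match[str]") -> str:
        word, period = match.group(1), match.group(2)
        if period and word in ABBREVIATIONS:
            # Assumption: the period is dropped together with the abbreviation.
            return ABBREVIATIONS[word]
        return match.group(0)  # no period or no table entry: leave unchanged

    return WORD_RE.sub(replace, text)
```

For example, expand_abbreviations("Dr. Smith arrived in Jan. with Mr Jones") yields "DOCTOR Smith arrived in JANUARY with Mr Jones". "Mr" is left alone because no period follows it, so no table lookup takes place.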

A word that is in capital letters, or which contains digits as well as letters, is
considered to be a symbol and is translated by pronouncing each character
separately (e.g. for USA and MIT). When a letter is to be pronounced, it is
represented by a special noun morph which has the proper pronunciation for the
letter (e.g. A - LETTER-A). A word that is in lowercase, or which has only the
first letter capitalized, is simply converted to uppercase and output.
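
A minimal Python sketch of the symbol rule above, under stated assumptions. The function name translate_word is hypothetical. The text does not say how digits inside a symbol are represented, so they are passed through unchanged here. A one-letter capitalized word such as "I" satisfies both rules, and this sketch treats it as an ordinary word. Mixed-case words that the text does not cover are also simply uppercased.

```python
from typing import List


def translate_word(word: str) -> List[str]:
    """Return the output units for one word, following the rules in the text."""
    has_letter = any(c.isalpha() for c in word)
    has_digit = any(c.isdigit() for c in word)
    # Assumption: a single capital letter counts as a word, not a symbol.
    all_caps = len(word) > 1 and word.isalpha() and word.isupper()

    if all_caps or (has_letter and has_digit):
        # Symbol: one unit per character. Letters become LETTER-x morphs;
        # digits are passed through unchanged (placeholder assumption).
        return [f"LETTER-{c.upper()}" if c.isalpha() else c for c in word]

    # Lowercase or first-letter-capitalized word: convert to uppercase.
    return [word.upper()]
```

Under these assumptions, translate_word("USA") gives ["LETTER-U", "LETTER-S", "LETTER-A"], while translate_word("Boston") gives ["BOSTON"].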

2.4.3 Apostrophes and single quotation marks

If an apostrophe is embedded in a word, then the entire word is output as a unit.