Text preprocessing
2.4.2 Words, abbreviations, and special symbols
FORMAT recognizes a word as an alphabetic string delimited by a punctuation or
whitespace character (the newline character which separates lines is considered to
be whitespace). If a word is followed by a period, then FORMAT looks in a table
of abbreviations to see if a translation is specified for that word. Table 2-1 shows
the abbreviation table currently in use. If a translation is found, then the translated
word(s) are output in place of the original abbreviation.
Table 2-1: Abbreviation translations performed by FORMAT
Ms - MIZ
Mr - MISTER
Mrs - MIZZES
Dr - DOCTOR
Num - NUMBER
Jan - JANUARY
Feb - FEBRUARY
Mar - MARCH
Apr - APRIL
Aug - AUGUST
Sept - SEPTEMBER
Oct - OCTOBER
Nov - NOVEMBER
Dec - DECEMBER
etc - ET CETERA
Jr - JUNIOR
Prof - PROFESSOR
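The lookup described above can be sketched in Python as follows. This is a minimal illustration, not FORMAT's actual code: the names ABBREVIATIONS, WORD_RE, and expand_abbreviations are hypothetical, and the sketch assumes that the lookup is case-sensitive and that the period following a translated abbreviation is consumed, neither of which is stated in the text.

```python
import re

# Table 2-1 as a dictionary; keys are the abbreviations without the period.
ABBREVIATIONS = {
    "Ms": "MIZ", "Mr": "MISTER", "Mrs": "MIZZES", "Dr": "DOCTOR",
    "Num": "NUMBER", "Jan": "JANUARY", "Feb": "FEBRUARY", "Mar": "MARCH",
    "Apr": "APRIL", "Aug": "AUGUST", "Sept": "SEPTEMBER", "Oct": "OCTOBER",
    "Nov": "NOVEMBER", "Dec": "DECEMBER", "etc": "ET CETERA",
    "Jr": "JUNIOR", "Prof": "PROFESSOR",
}

# A word is a run of alphabetic characters; a directly following period,
# if any, is captured separately so we know whether to consult the table.
WORD_RE = re.compile(r"([A-Za-z]+)(\.?)")


def expand_abbreviations(text):
    """Replace each 'word.' found in the table by its translation."""
    def replace(match):
        word, period = match.group(1), match.group(2)
        # Only words followed by a period are looked up.
        if period and word in ABBREVIATIONS:
            # Assumption: the period is dropped along with the abbreviation.
            return ABBREVIATIONS[word]
        return match.group(0)  # leave everything else unchanged
    return WORD_RE.sub(replace, text)


print(expand_abbreviations("Dr. Smith arrives Jan. 5"))
```

A word that is not followed by a period (e.g. "Mr Smith") is never looked up, which matches the rule that the table is consulted only when a period follows the word.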
A word that is in capital letters, or which contains digits as well as letters, is
considered to be a symbol and is translated by pronouncing each character
separately (e.g. for USA and MIT). When a letter is to be pronounced, it is
represented by a special noun morph which has the proper pronunciation for the
letter (e.g. A—LETTER-A). A word that is in lowercase, or which has only the
first letter capitalized, is simply converted to uppercase and output.
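The distinction between symbols and ordinary words can be sketched as below. The function names are hypothetical, and two points are assumptions not covered by the text: digits are given a placeholder DIGIT-n name (only the LETTER-A form for letters is given above), and mixed-case words such as "McDonald" fall through to the uppercase rule.

```python
def spell_char(ch):
    """Name a single character of a symbol."""
    if ch.isalpha():
        return "LETTER-" + ch.upper()   # form given in the text, e.g. LETTER-A
    return "DIGIT-" + ch                # placeholder; not specified in the text


def translate_token(token):
    """Return the list of output words for one alphanumeric token."""
    is_all_caps = token.isalpha() and token.isupper()
    has_letters_and_digits = (any(c.isalpha() for c in token)
                              and any(c.isdigit() for c in token))
    if is_all_caps or has_letters_and_digits:
        # Symbol: pronounce each character separately.
        return [spell_char(c) for c in token]
    # Lowercase or initial-capital word: convert to uppercase and output.
    return [token.upper()]


print(translate_token("USA"))     # spelled letter by letter
print(translate_token("Boston"))  # output as a single uppercase word
```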
2.4.3 Apostrophes and single quotation marks
If an apostrophe is embedded in a word, then the entire word is output as a unit.
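One way to keep a word with an embedded apostrophe together is a pattern that allows an apostrophe only between letters. This is a sketch under that assumption; APOSTROPHE_WORD_RE and split_words are hypothetical names, and the handling of leading or trailing single quotation marks is not described in this excerpt, so the sketch simply leaves them out of the word.

```python
import re

# Letters, optionally followed by one or more apostrophe-plus-letters groups,
# so that "don't" is matched as a single word.
APOSTROPHE_WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def split_words(text):
    """Return the words of text, keeping embedded apostrophes inside words."""
    return APOSTROPHE_WORD_RE.findall(text)


print(split_words("don't stop 'now'"))
```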