From text to speech: The MITalk system
The size of individual words and sentences is limited, but the limits are set
high enough to include all reasonable cases. Words are allowed 40 characters
each, and the maximum number of words per sentence is 200. If the limit of 40
characters per word is exceeded, the word is truncated and a message indicating
the problem and the number of allowable characters per word is printed for the
user.
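The word-length limit can be illustrated with a minimal Python sketch. The
function name clip_word and the wording of the message are assumptions made for
illustration; the text states only the two limits and that a message is printed.
The handling of a sentence longer than 200 words is not described, so the sketch
records that limit as a constant and does nothing further with it.

```python
MAX_WORD_CHARS = 40        # per-word limit stated in the text
MAX_SENTENCE_WORDS = 200   # per-sentence limit stated in the text (not enforced here)


def clip_word(word, limit=MAX_WORD_CHARS):
    """Return `word` cut to `limit` characters, reporting any truncation."""
    if len(word) > limit:
        # Hypothetical message wording: the text says only that the problem
        # and the allowable number of characters are reported to the user.
        print(f"word too long; truncated to the {limit} characters allowed per word")
        return word[:limit]
    return word
```

A word within the limit is returned unchanged; a 45-character word comes back
as its first 40 characters.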
2.3 Output
The output of FORMAT is a sequence of words and punctuation marks.
FORMAT scans each input line from left to right and converts each recognized
construct (word, number, symbol, etc.) into an appropriate word or sequence of
words. Since case is not significant in the later modules of MITalk, each word is
written in all uppercase letters.
An example of input and output is shown here in Figure 2-1. (Input text is in
boldface.)
Mr. Jones gets 35.3%.
FORMAT: MISTER
FORMAT: JONES
FORMAT: GETS
FORMAT: THIRTY
FORMAT: FIVE
FORMAT: POINT
FORMAT: THREE
FORMAT: PERCENT
FORMAT: .
FORMAT: .
Figure 2-1: Example of FORMAT processing
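The translation in Figure 2-1 can be approximated with a short Python sketch.
This is not the MITalk implementation: the one-entry abbreviation table, the
regular expression, and the names small_number and format_sentence are all
assumptions, and numbers are handled only in the range 0 to 99. The sketch
emits a single sentence-final period; the second period in the figure appears
to be the extra pause described in Section 2.4.1, which is not added here.

```python
import re

ONES = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN",
        "EIGHT", "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN",
        "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
        "EIGHTY", "NINETY"]
ABBREVIATIONS = {"MR.": "MISTER"}  # hypothetical one-entry table
# One or two digits, an optional decimal fraction, an optional percent sign.
NUMBER = re.compile(r"(\d{1,2})(?:\.(\d+))?(%?)$")


def small_number(n):
    """Spell out an integer in the range 0..99 as a list of words."""
    if n < 20:
        return [ONES[n]]
    words = [TENS[n // 10]]
    if n % 10:
        words.append(ONES[n % 10])
    return words


def format_sentence(text):
    """Convert one line of text to uppercase words and punctuation marks."""
    out = []
    for token in text.split():
        upper = token.upper()  # case is not significant downstream
        if upper in ABBREVIATIONS:
            out.append(ABBREVIATIONS[upper])
            continue
        end = ""
        if upper[-1] in ".?!":
            # Split off a sentence-final punctuation mark.
            upper, end = upper[:-1], upper[-1]
        match = NUMBER.match(upper)
        if match:
            whole, fraction, percent = match.groups()
            out.extend(small_number(int(whole)))
            if fraction:
                out.append("POINT")
                out.extend(ONES[int(digit)] for digit in fraction)
            if percent:
                out.append("PERCENT")
        elif upper:
            out.append(upper)
        if end:
            out.append(end)
    return out
```

Under these assumptions, format_sentence("Mr. Jones gets 35.3%.") yields the
words MISTER, JONES, GETS, THIRTY, FIVE, POINT, THREE, PERCENT, followed by one
period.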
2.4 Formatting operations
The various translations performed by FORMAT are described in detail below.
2.4.1 Paragraphs and sentences
Whitespace (i.e., spaces and/or tabs) at the beginning of a line followed by a
capitalized word is taken to denote the beginning of a paragraph. FORMAT
translates this whitespace into a period (.), which is later translated into a
pause.
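The paragraph rule can be sketched in Python as follows. The names
starts_paragraph and mark_paragraphs are hypothetical, and "capitalized word"
is approximated here as a word whose first character is an uppercase ASCII
letter.

```python
import re

# Leading spaces and/or tabs followed immediately by an uppercase letter.
PARAGRAPH_START = re.compile(r"[ \t]+[A-Z]")


def starts_paragraph(line):
    """Return True if the line is indented and begins with a capitalized word."""
    return PARAGRAPH_START.match(line) is not None


def mark_paragraphs(lines):
    """Yield each line's words, preceded by "." when the line opens a paragraph."""
    for line in lines:
        if starts_paragraph(line):
            yield "."  # the period stands in for the paragraph pause
        yield from line.split()
```

An indented line such as "    The cat sat" produces a period before its words,
while an unindented line, or an indented line starting with a lowercase word,
produces only its words.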
An additional pause is inserted after each sentence longer than five words
(and after each group of short sentences that together run longer than five
words). As with the paragraph beginning, this pause is effected by adding an
extra period after the sentence. This emulates a human speaker pausing for
breath every so often.
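One way to read the pause rule is as a running word count that is reset each
time an extra period is emitted, so that several short sentences accumulate
until they pass the threshold. The Python sketch below follows that reading;
the counting scheme and the name add_breath_pauses are assumptions, not a
description of how FORMAT is actually coded.

```python
TERMINATORS = {".", "?", "!"}
PAUSE_THRESHOLD = 5  # "longer than five words", per the text


def add_breath_pauses(tokens):
    """Insert an extra "." after a sentence once more than five words have
    gone by since the last inserted pause (assumed counting scheme)."""
    out = []
    words_since_pause = 0
    for token in tokens:
        out.append(token)
        if token in TERMINATORS:
            if words_since_pause > PAUSE_THRESHOLD:
                out.append(".")        # the extra period becomes a pause
                words_since_pause = 0
            # Otherwise the count carries over to the next short sentence.
        else:
            words_since_pause += 1
    return out
```

Applied to the eight words of Figure 2-1 followed by one period, this returns
the same sequence with a second period appended, which matches the figure. A
two-token input such as HELLO followed by a period is returned unchanged.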
The end of a sentence is delimited by a period, question mark, or exclamation
point. Not all periods denote the end of a sentence, however. If a period ends an
abbreviation, then it is taken as an end-of-sentence marker only if it is at the
end of a line and is followed by whitespace and a capitalized word. A period
inside a numeric string is considered to be a decimal point.
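The three cases for a period can be sketched in Python as below. The
abbreviation set is a hypothetical sample, and the sketch reads the
abbreviation rule literally: the period must be the last character on its line,
and the text after the line break must begin with a capitalized word, the line
break itself counting as whitespace. Other readings of the rule are possible.

```python
import re

ABBREVIATIONS = {"MR.", "DR."}  # hypothetical sample, not the MITalk table
# A line break, optional further whitespace, then an uppercase letter.
LINE_BREAK_THEN_CAPITAL = re.compile(r"\n\s*[A-Z]")


def period_ends_sentence(text, i):
    """Decide whether the period at text[i] marks the end of a sentence."""
    if text[i] != ".":
        raise ValueError("text[i] must be a period")
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isdigit() and after.isdigit():
        return False  # period inside a numeric string: decimal point
    # Find the start of the whitespace-delimited word that ends at the period.
    start = i
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    if text[start:i + 1].upper() in ABBREVIATIONS:
        # An abbreviation's period counts only at the end of a line that is
        # followed by a capitalized word.
        return LINE_BREAK_THEN_CAPITAL.match(text, i + 1) is not None
    return True
```

For the input of Figure 2-1, the period in "Mr." and the period in "35.3" are
both rejected, and only the final period is accepted as ending the sentence.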