From text to speech: The MITalk system

The size of individual words and sentences is limited, but set at a high value
to include all reasonable cases. Words are allowed 40 characters each, and the
maximum number of words per sentence is 200. If the limit of 40 characters per
word is exceeded, the word is truncated and a message indicating the problem and
number of allowable characters per word is printed for the user.

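The word-length limit described above can be sketched as follows. This is a minimal illustration, not the original FORMAT code: the names `MAX_WORD_CHARS`, `MAX_SENTENCE_WORDS`, and `truncate_word` are hypothetical, and the wording of the warning is invented, since the text says only that a message reporting the problem and the allowable length is printed. The text does not say what happens when the 200-word limit is exceeded, so that case is not modeled.

```python
import sys

MAX_WORD_CHARS = 40       # per-word limit stated in the text
MAX_SENTENCE_WORDS = 200  # per-sentence limit stated in the text (not enforced here)


def truncate_word(word, limit=MAX_WORD_CHARS, log=sys.stderr):
    """Return `word` cut to at most `limit` characters, warning if it was cut."""
    if len(word) <= limit:
        return word
    # Hypothetical message wording; the original message text is not given.
    print(f"word too long, truncated to {limit} characters: {word[:limit]}",
          file=log)
    return word[:limit]
```

A word of 45 characters comes back as its first 40 characters, with one warning line written to `log`; shorter words pass through unchanged.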
2.3 Output

The output of FORMAT is a sequence of words and punctuation marks.
FORMAT scans each input line from left to right and converts each recognized
construct (word, number, symbol, etc.) into an appropriate word or sequence of
words. Since case is not significant in the later modules of MITalk, each word is
written in all uppercase letters.

An example of input and output is shown here in Figure 2-1. (Input text is in
boldface.)

Mr. Jones gets 35.3%.
FORMAT: MISTER
FORMAT: JONES
FORMAT: GETS
FORMAT: THIRTY
FORMAT: FIVE
FORMAT: POINT
FORMAT: THREE
FORMAT: PERCENT
FORMAT: .
FORMAT: .

Figure 2-1: Example of FORMAT processing

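The left-to-right conversion shown in Figure 2-1 can be approximated with a small tokenizer. This is a sketch under stated assumptions, not the MITalk implementation: the tables `ABBREVIATIONS` and `SYMBOLS`, the regular expression, and the function names are all hypothetical, and numbers are handled only for whole parts from 0 to 99. The sketch emits a single final period; the second period in Figure 2-1 is not modeled here.

```python
import re

# Hypothetical lookup tables; the real FORMAT tables are not given in the text.
ABBREVIATIONS = {"MR.": ["MISTER"]}
SYMBOLS = {"%": ["PERCENT"]}
ONES = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN",
        "EIGHT", "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN",
        "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
        "EIGHTY", "NINETY"]

# Number (with optional decimal part), word (with optional trailing period),
# or any other single non-blank character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+\.?|\S")


def two_digit_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    return [TENS[tens]] + ([ONES[ones]] if ones else [])


def number_words(token):
    """Spell out a number such as '35.3'; fraction digits are read one by one."""
    whole, _, frac = token.partition(".")
    words = two_digit_words(int(whole))
    if frac:
        words.append("POINT")
        words.extend(ONES[int(d)] for d in frac)
    return words


def format_line(line):
    """Convert one input line into a list of uppercase words and punctuation."""
    out = []
    for tok in TOKEN.findall(line):
        upper = tok.upper()
        if upper in ABBREVIATIONS:
            out.extend(ABBREVIATIONS[upper])
        elif tok[0].isdigit():
            out.extend(number_words(tok))
        elif tok in SYMBOLS:
            out.extend(SYMBOLS[tok])
        elif len(upper) > 1 and upper.endswith("."):
            out.extend([upper[:-1], "."])  # ordinary word followed by a period
        else:
            out.append(upper)
    return out
```

Calling `format_line("Mr. Jones gets 35.3%.")` returns `['MISTER', 'JONES', 'GETS', 'THIRTY', 'FIVE', 'POINT', 'THREE', 'PERCENT', '.']`.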
2.4 Formatting operations

The various translations performed by FORMAT are described in detail below.

2.4.1 Paragraphs and sentences

Whitespace (i.e. spaces and/or tabs) at the beginning of a line followed by a
capitalized word is taken to denote the beginning of a paragraph. FORMAT
translates this whitespace into a period (.) which later gets translated into a
pause.

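The paragraph rule can be expressed as a single pattern match. The sketch below is illustrative only: `PARAGRAPH_START` and `mark_paragraph` are hypothetical names, and "capitalized word" is assumed to mean a word whose first character is an uppercase ASCII letter.

```python
import re

# Leading spaces and/or tabs that are immediately followed by a capital letter.
PARAGRAPH_START = re.compile(r"^[ \t]+(?=[A-Z])")


def mark_paragraph(line):
    """Replace paragraph-opening indentation with a period token and a space."""
    return PARAGRAPH_START.sub(". ", line, count=1)
```

An indented line such as `"    The output"` becomes `". The output"`, while an indented line that starts with a lowercase word, or a line with no indentation, is returned unchanged.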
An additional pause is inserted after each sentence longer than five words
(also after each group of short sentences longer than five words). As with the
paragraph beginning, this pause is effected by adding an extra period after the
sentence. This emulates a human speaker pausing for breath every so often.

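One way to read the pause rule is as a running word count that is reset whenever an extra period is emitted. The sketch below follows that reading, which is an assumption: the text does not say whether FORMAT counts input words or output words, and `add_breath_pauses` and `threshold` are hypothetical names. Sentences are given as lists of output words, each ending in its terminator.

```python
def add_breath_pauses(sentences, threshold=5):
    """Append an extra '.' once more than `threshold` words have accumulated.

    `sentences` is a list of word lists, each ending in '.', '?' or '!'.
    """
    out = []
    pending = 0  # words seen since the last extra pause
    for sentence in sentences:
        out.extend(sentence)
        pending += len(sentence) - 1  # the terminator is not counted as a word
        if pending > threshold:
            out.append(".")  # extra period, later translated into a pause
            pending = 0
    return out
```

Under this reading, the eight output words of Figure 2-1 exceed the threshold, so the sentence ends with two periods, as the figure shows. Two one-word sentences followed by a five-word sentence receive a single extra period after the third sentence.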
The end of a sentence is delimited by a period, question mark, or exclamation
point. Not all periods denote the end of a sentence, however. If a period ends an
abbreviation, then it is only taken as an end-of-sentence marker if it is at the end of
a line and if it is followed by whitespace and a capitalized word. A period inside a
numeric string is considered to be a decimal point, of course.
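The end-of-sentence test can be sketched as a small decision function. Several details are assumptions: the abbreviation list is a hypothetical sample, the function name is invented, and the abbreviation rule is read literally, meaning the period must close its line and the next non-blank text must begin with a capital letter.

```python
import re

ABBREVIATIONS = {"MR", "MRS", "DR"}  # hypothetical sample; the real list is not given

# Literal reading of the abbreviation rule: only blanks remain on the line,
# and the following text starts with a capital letter.
EOL_THEN_CAPITAL = re.compile(r"[ \t]*\n\s*[A-Z]")


def is_sentence_end(text, i):
    """Decide whether the punctuation mark at text[i] ends a sentence."""
    ch = text[i]
    if ch in "?!":
        return True
    if ch != ".":
        return False
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isdigit() and after.isdigit():
        return False  # decimal point inside a numeric string
    word = re.search(r"[A-Za-z]+\Z", text[:i])
    if word and word.group().upper() in ABBREVIATIONS:
        return EOL_THEN_CAPITAL.match(text, i + 1) is not None
    return True
```

For the input of Figure 2-1, the period after `Mr` and the period inside `35.3` are rejected, and only the final period is accepted as the end of the sentence.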