From text to speech: The MITalk system

The apostrophe is also included in the word if it appears after the last letter in the word and that last letter is an s. An apostrophe in any other position is considered to be a single quotation mark and is output as a punctuation character.

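The apostrophe rule can be sketched as a small tokenizer. This is a minimal illustration, not FORMAT's code: the name `tokenize`, the `("word", ...)` / `("punct", ...)` tags, and the treatment of an apostrophe between two letters (kept inside the word, as in don't) are assumptions, since the embedded-apostrophe rule lies outside this excerpt. The sketch handles only letters, apostrophes, and whitespace; every other character becomes a one-character punctuation token.

```python
import re

# ASSUMPTION: an apostrophe between two letters (don't) belongs to the word;
# that rule is described outside this excerpt.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into ("word", ...) and ("punct", ...) tokens."""
    tokens: list[tuple[str, str]] = []
    i = 0
    while i < len(text):
        m = WORD.match(text, i)
        if m:
            word = m.group()
            i = m.end()
            # An apostrophe right after a word-final s stays in the word (boys').
            if i < len(text) and text[i] == "'" and word[-1] in "sS":
                word += "'"
                i += 1
            tokens.append(("word", word))
        elif text[i].isspace():
            i += 1
        else:
            # Any other apostrophe is a single quotation mark; it, like every
            # other non-letter character here, is emitted as punctuation.
            tokens.append(("punct", text[i]))
            i += 1
    return tokens
```

For example, `tokenize("the boys' car")` keeps `boys'` as one word, while both apostrophes in `'hello'` come out as punctuation.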
2.4.4 Hyphens and dashes

If a dash character is embedded between two words, it is considered to be a hyphen separating the elements of a compound word. In the current implementation, the hyphen is deleted and the compound is treated as two separate words (e.g. two-layer → TWO LAYER). This solution prevents the correct stress pattern from being placed on a hyphenated compound, but, on the other hand, it avoids the incorrect decompositions that might result from simply concatenating the two roots at this point.

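A minimal sketch of the compound-hyphen rule, assuming a regular expression is an acceptable stand-in for FORMAT's scanner (the names `EMBEDDED_HYPHEN` and `split_compound` are hypothetical). A dash counts as embedded only when a word character sits on both sides, so runs of dashes and dashes at the end of a line are left for the rules that follow.

```python
import re

# A single dash with a word character (letter, digit, or underscore) on both sides.
EMBEDDED_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")


def split_compound(text: str) -> str:
    """Replace compound hyphens with spaces: two-layer -> two layer."""
    return EMBEDDED_HYPHEN.sub(" ", text)
```

Because the lookbehind and lookahead both demand a word character, `a--b` is left unchanged here and is handled instead by the rule for strings of dashes.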
If a dash appears at the end of the last word on a line, the dash is considered to be a word-splitting hyphen. In this case, FORMAT deletes the hyphen from the end of the current word and appends the first word of the next line to that word. This rule reassembles words that were divided at the end of a line on a syllable boundary.

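The line-end rule might be sketched as follows, assuming the input arrives as a list of lines and the output is a flat word list; `reassemble` is a hypothetical name. A lone dash at the end of a line is not joined, because it is an isolated dash rather than part of a word. Keeping the dash when the text ends mid-word is also an assumption, since the text does not cover that case.

```python
def reassemble(lines: list[str]) -> list[str]:
    """Rejoin words split by a hyphen at the end of a line."""
    words: list[str] = []
    pending = ""  # fragment of a word split across a line break
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if pending:
            # Append the first word of this line to the waiting fragment.
            tokens[0] = pending + tokens[0]
            pending = ""
        last = tokens[-1]
        # A dash ending the last word on a line is a word-splitting hyphen.
        if len(last) > 1 and last.endswith("-"):
            pending = tokens.pop()[:-1]  # delete the hyphen, wait for the rest
        words.extend(tokens)
    if pending:
        words.append(pending + "-")  # no next line: leave the word unchanged
    return words
```

For example, `reassemble(["the dash is con-", "sidered to be"])` yields the words `the dash is considered to be`.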
An isolated dash is output as a punctuation character and eventually becomes a pause. A string of dashes (isolated or embedded) is converted to a single dash and output as punctuation.

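Collapsing dash runs can be sketched with one regular expression (the names `DASH_RUN` and `collapse_dashes` are hypothetical). Setting the resulting dash off with spaces is an assumption about how a later stage separates punctuation tokens; single embedded hyphens are deliberately left alone.

```python
import re

# Two or more consecutive dashes, together with any surrounding whitespace.
DASH_RUN = re.compile(r"\s*-{2,}\s*")


def collapse_dashes(text: str) -> str:
    """Reduce every run of dashes to a single dash set off by spaces."""
    return DASH_RUN.sub(" - ", text).strip()
```

Both `yes--no` and `wait --- now` end up with one free-standing dash, which later stages can treat as punctuation.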
2.4.5 Special symbols

A percent sign (%) is replaced by the words PER CENT. An ampersand (&) is replaced by the word AND.

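The symbol substitutions amount to a lookup table. In this sketch the names `SYMBOL_WORDS` and `expand_symbols` are hypothetical. Padding each replacement with spaces and then normalizing whitespace is an assumption that makes R&D come out as three words. Because that normalization also removes line breaks, a pipeline built this way would run this step after line-based steps such as hyphen reassembly.

```python
SYMBOL_WORDS = {"%": "PER CENT", "&": "AND"}


def expand_symbols(text: str) -> str:
    """Replace % and & with their spoken words."""
    out = []
    for ch in text:
        if ch in SYMBOL_WORDS:
            out.append(f" {SYMBOL_WORDS[ch]} ")  # pad so the words stand alone
        else:
            out.append(ch)
    # Collapse the extra spaces introduced by padding.
    return " ".join("".join(out).split())
```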
2.4.6 Numerals

FORMAT recognizes a number as a string of digits with optional commas and/or a period (decimal point). There are two ways of pronouncing numbers: each digit in sequence (e.g. 75 → SEVEN FIVE) and in decimal form (e.g. 75 → SEVENTY FIVE). FORMAT selects the appropriate pronunciation based on the form and context of the number.

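Number recognition and the digit-by-digit reading can be sketched as below. The pattern is one interpretation of "digits with optional commas and/or a period". Here a comma or period counts only when a digit follows it, so a sentence-final period is not absorbed into the number. The text does not say how FORMAT uses context to choose between the two readings, so the sketch does not attempt that choice. `NUMBER`, `DIGIT_NAMES`, and `digit_by_digit` are hypothetical names.

```python
import re

# Digits, optionally grouped by commas, optionally followed by a decimal
# fraction. A comma or period counts only if a digit follows it.
NUMBER = re.compile(r"[0-9]+(?:,[0-9]+)*(?:\.[0-9]+)?")

DIGIT_NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]


def digit_by_digit(number: str) -> str:
    """Name each digit in sequence, ignoring commas and the decimal point."""
    return " ".join(DIGIT_NAMES[int(d)] for d in number if d in "0123456789")
```

For example, `NUMBER.findall("In 1,250.75 and 75.")` finds `1,250.75` and `75`, and `digit_by_digit("75")` gives SEVEN FIVE.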
2.4.6.1 Integers, commas, and decimal points

A complete number consists of a set of comma-separated digit triads (the integer portion), optionally followed by a decimal point and fraction digits. The integer portion is read by pronouncing each triad from left to right and appending the appropriate multiplying word to each triad (e.g. BILLION, MILLION, THOUSAND, or nothing for the rightmost triad).

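The integer-portion procedure can be sketched as follows. From the text come the split into triads, the left-to-right order, the multiplying words, and the HUNDRED rule. The tens-and-units words are ordinary English number names. They are an assumption here, because the remaining triad rules fall outside this excerpt. Regrouping digits that lack commas into triads from the right is another assumption, and the fraction digits are not handled. All function names are hypothetical.

```python
ONES = ["", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
        "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
        "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY",
        "SIXTY", "SEVENTY", "EIGHTY", "NINETY"]
# Multiplying word for each triad position, counted from the right.
MULTIPLIERS = ["", "THOUSAND", "MILLION", "BILLION"]


def say_triad(triad: str) -> list[str]:
    hundreds, tens, units = (int(d) for d in triad.zfill(3))
    words = []
    if hundreds:  # rule from the text: nonzero left digit, then HUNDRED
        words += [ONES[hundreds], "HUNDRED"]
    # ASSUMPTION: ordinary English names for the last two digits.
    rest = 10 * tens + units
    if rest >= 20:
        words.append(TENS[tens])
        if units:
            words.append(ONES[units])
    elif rest:
        words.append(ONES[rest])
    return words


def triads_of(integer_part: str) -> list[str]:
    """Split "1,250" or "1250" into ["1", "250"], grouping from the right."""
    digits = integer_part.replace(",", "")
    head = len(digits) % 3 or 3
    return [digits[:head]] + [digits[i:i + 3] for i in range(head, len(digits), 3)]


def say_integer(integer_part: str) -> str:
    triads = triads_of(integer_part)
    if len(triads) > len(MULTIPLIERS):
        raise ValueError("too many triads for this sketch")
    words = []
    for position, triad in enumerate(triads):
        scale = MULTIPLIERS[len(triads) - 1 - position]
        triad_words = say_triad(triad)
        if triad_words:  # an all-zero triad contributes nothing
            words += triad_words
            if scale:
                words.append(scale)
    return " ".join(words) or "ZERO"
```

For example, `say_integer("1,250")` gives ONE THOUSAND TWO HUNDRED FIFTY. Skipping all-zero triads keeps 1,000,250 from producing a spurious THOUSAND.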
A triad is pronounced as follows:

• If the left digit is nonzero, it is pronounced and followed by the word HUNDRED.
