From text to speech: The MITalk system
The apostrophe is also included in the word if it appears after the last letter in the
word and that last letter is an s. An apostrophe in any other position is considered
to be a single quotation mark and is output as a punctuation character.
2.4.4 Hyphens and dashes
If a dash character is embedded between two words, it is considered to be a hyphen
separating compound word elements. In the current implementation, the hyphen is
deleted and the compound is treated as two separate words (e.g. two-layer →
TWO LAYER). This solution prevents the correct stress pattern from being
placed on a hyphenated compound, but, on the other hand, it prevents incorrect
decompositions which might result from simply concatenating the two roots at this
point.
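
The embedded-hyphen rule can be illustrated with a short Python sketch. This is not the FORMAT implementation; the name `split_compounds` is hypothetical, and the upper-casing simply mirrors the way the example output is printed above.

```python
import re

# Assumption: "embedded" means exactly one hyphen with a letter on each side.
EMBEDDED_HYPHEN = re.compile(r"(?<=[A-Za-z])-(?=[A-Za-z])")

def split_compounds(text: str) -> str:
    """Replace each embedded hyphen with a space, then upper-case the result."""
    return EMBEDDED_HYPHEN.sub(" ", text).upper()

print(split_compounds("two-layer"))  # TWO LAYER
```

Runs of dashes and isolated dashes are left untouched by this pattern, since they fall under separate rules.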
If a dash appears at the end of the last word on a line, then the dash is
considered to be a word-splitting hyphen. In this case, FORMAT deletes the hyphen
from the end of the current word and appends to that word the first word on the
next line. This rule reassembles words which are divided at the end of a line on a
syllable boundary.
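
A minimal Python sketch of the line-end rule follows. The function name `rejoin_split_words` and the list-of-lines input are assumptions made for illustration; the source does not say how FORMAT buffers its input.

```python
def rejoin_split_words(lines: list[str]) -> list[str]:
    """Join a word that ends a line with '-' to the first word of the next line."""
    out: list[str] = []
    carry = ""  # word fragment waiting for its continuation
    for line in lines:
        words = line.split()
        if carry and words:
            words[0] = carry + words[0]
            carry = ""
        last = words[-1] if words else ""
        # A lone "-" or a run such as "--" is punctuation, not a split word.
        if len(last) > 1 and last.endswith("-") and not last.endswith("--"):
            carry = words.pop()[:-1]
        out.append(" ".join(words))
    if carry:
        # No following line: put the fragment back unchanged.
        out[-1] = (out[-1] + " " + carry + "-").strip()
    return out

print(rejoin_split_words(["the dash is con-", "sidered to be a hyphen"]))
# ['the dash is', 'considered to be a hyphen']
```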
An isolated dash is output as a punctuation character and eventually becomes
a pause. A string of dashes (isolated or embedded) is converted to a single dash
and output as punctuation.
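
The treatment of dash strings might be sketched as below. The helper `collapse_dash_runs` is hypothetical, and representing the punctuation dash as a space-delimited token is an assumption about the output form.

```python
import re

DASH_RUN = re.compile(r"-{2,}")  # two or more consecutive dashes

def collapse_dash_runs(text: str) -> str:
    """Turn every run of dashes into one isolated dash token."""
    collapsed = DASH_RUN.sub(" - ", text)
    return " ".join(collapsed.split())  # normalize spacing around the token

print(collapse_dash_runs("wait--what"))     # wait - what
print(collapse_dash_runs("stop --- now"))   # stop - now
```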
2.4.5 Special symbols
A percent sign (%) is replaced by the words PER CENT. An ampersand (&) is
replaced by the word AND.
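
As an illustration, the two substitutions can be written as a small lookup table in Python. The names `SYMBOL_WORDS` and `expand_symbols` are hypothetical, and padding the replacement with spaces is an assumption so that the words do not fuse with neighboring text.

```python
SYMBOL_WORDS = {"%": "PER CENT", "&": "AND"}

def expand_symbols(text: str) -> str:
    """Replace each special symbol with its spoken words."""
    for symbol, words in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {words} ")
    return " ".join(text.split())  # collapse the extra spaces

print(expand_symbols("50% & rising"))  # 50 PER CENT AND rising
```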
2.4.6 Numerals
FORMAT recognizes a number as a string of digits with optional commas and/or a
period (decimal point). There are two ways of pronouncing numbers: each digit in
sequence (e.g. 75 → SEVEN FIVE), and in decimal form (e.g. 75 → SEVENTY
FIVE). FORMAT selects the appropriate type of pronunciation based on the form
and context of the number.
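
The digit-by-digit reading is the simpler of the two and can be sketched in a few lines of Python. The name `speak_digits` is hypothetical, and the word used for 0 is an assumption, since the text above does not give it.

```python
DIGIT_WORDS = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]  # "ZERO" is assumed

def speak_digits(number: str) -> str:
    """Pronounce each digit in sequence, skipping commas and periods."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in number if ch in "0123456789")

print(speak_digits("75"))  # SEVEN FIVE
```

How FORMAT chooses between this reading and the decimal reading depends on form and context, which this sketch does not attempt to model.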
2.4.6.1 Integers, commas, and decimal points
A complete number consists of a set of comma-separated digit triads (the integer
portion), optionally followed by a decimal point and fraction digits. The integer
portion is pronounced by pronouncing each triad from left to right and appending
the appropriate multiplying word to each triad (e.g. BILLION, MILLION, THOUSAND,
or nothing for the rightmost triad).
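
The pairing of triads with multiplying words can be sketched as follows. This is only an illustration under stated assumptions: `label_triads` is a hypothetical name, each triad is left as a digit string rather than pronounced, and the handling of all-zero triads is not addressed here.

```python
MULTIPLIERS = ["", "THOUSAND", "MILLION", "BILLION"]  # rightmost triad first

def label_triads(integer_part: str) -> list[tuple[str, str]]:
    """Pair each comma-separated triad with its multiplying word, left to right."""
    triads = integer_part.split(",")
    if len(triads) > len(MULTIPLIERS):
        raise ValueError("number too large for this sketch")
    labels = MULTIPLIERS[:len(triads)][::-1]  # e.g. MILLION, THOUSAND, ""
    return list(zip(triads, labels))

print(label_triads("1,234,567"))
# [('1', 'MILLION'), ('234', 'THOUSAND'), ('567', '')]
```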
A triad is pronounced as follows:
• If the left digit is nonzero, then it is pronounced followed by the word
  HUNDRED.