commit 059f7d8baab5aca59a09c107f8d6e30b120afee4
Author: Tait Hoyem
Date:   Sun Mar 17 11:49:04 2024 -0600

    Initial commit

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..cbe5ad1
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,437 @@
+Attribution-NonCommercial-ShareAlike 4.0 International
+
+=======================================================================
+
+Creative Commons Corporation ("Creative Commons") is not a law firm and
+does not provide legal services or legal advice. Distribution of
+Creative Commons public licenses does not create a lawyer-client or
+other relationship. Creative Commons makes its licenses and related
+information available on an "as-is" basis. Creative Commons gives no
+warranties regarding its licenses, any material licensed under their
+terms and conditions, or any related information. Creative Commons
+disclaims all liability for damages resulting from their use to the
+fullest extent possible.
+
+Using Creative Commons Public Licenses
+
+Creative Commons public licenses provide a standard set of terms and
+conditions that creators and other rights holders may use to share
+original works of authorship and other material subject to copyright
+and certain other rights specified in the public license below. The
+following considerations are for informational purposes only, are not
+exhaustive, and do not form part of our licenses.
+
+     Considerations for licensors: Our public licenses are
+     intended for use by those authorized to give the public
+     permission to use material in ways otherwise restricted by
+     copyright and certain other rights. Our licenses are
+     irrevocable. Licensors should read and understand the terms
+     and conditions of the license they choose before applying it.
+     Licensors should also secure all rights necessary before
+     applying our licenses so that the public can reuse the
+     material as expected. Licensors should clearly mark any
+     material not subject to the license. This includes other CC-
+     licensed material, or material used under an exception or
+     limitation to copyright. More considerations for licensors:
+     wiki.creativecommons.org/Considerations_for_licensors
+
+     Considerations for the public: By using one of our public
+     licenses, a licensor grants the public permission to use the
+     licensed material under specified terms and conditions. If
+     the licensor's permission is not necessary for any reason--for
+     example, because of any applicable exception or limitation to
+     copyright--then that use is not regulated by the license. Our
+     licenses grant only permissions under copyright and certain
+     other rights that a licensor has authority to grant. Use of
+     the licensed material may still be restricted for other
+     reasons, including because others have copyright or other
+     rights in the material. A licensor may make special requests,
+     such as asking that all changes be marked or described.
+     Although not required by our licenses, you are encouraged to
+     respect those requests where reasonable. More considerations
+     for the public:
+     wiki.creativecommons.org/Considerations_for_licensees
+
+=======================================================================
+
+Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
+Public License
+
+By exercising the Licensed Rights (defined below), You accept and agree
+to be bound by the terms and conditions of this Creative Commons
+Attribution-NonCommercial-ShareAlike 4.0 International Public License
+("Public License").
+To the extent this Public License may be
+interpreted as a contract, You are granted the Licensed Rights in
+consideration of Your acceptance of these terms and conditions, and the
+Licensor grants You such rights in consideration of benefits the
+Licensor receives from making the Licensed Material available under
+these terms and conditions.
+
+
+Section 1 -- Definitions.
+
+  a. Adapted Material means material subject to Copyright and Similar
+     Rights that is derived from or based upon the Licensed Material
+     and in which the Licensed Material is translated, altered,
+     arranged, transformed, or otherwise modified in a manner requiring
+     permission under the Copyright and Similar Rights held by the
+     Licensor. For purposes of this Public License, where the Licensed
+     Material is a musical work, performance, or sound recording,
+     Adapted Material is always produced where the Licensed Material is
+     synched in timed relation with a moving image.
+
+  b. Adapter's License means the license You apply to Your Copyright
+     and Similar Rights in Your contributions to Adapted Material in
+     accordance with the terms and conditions of this Public License.
+
+  c. BY-NC-SA Compatible License means a license listed at
+     creativecommons.org/compatiblelicenses, approved by Creative
+     Commons as essentially the equivalent of this Public License.
+
+  d. Copyright and Similar Rights means copyright and/or similar rights
+     closely related to copyright including, without limitation,
+     performance, broadcast, sound recording, and Sui Generis Database
+     Rights, without regard to how the rights are labeled or
+     categorized. For purposes of this Public License, the rights
+     specified in Section 2(b)(1)-(2) are not Copyright and Similar
+     Rights.
+
+  e. Effective Technological Measures means those measures that, in the
+     absence of proper authority, may not be circumvented under laws
+     fulfilling obligations under Article 11 of the WIPO Copyright
+     Treaty adopted on December 20, 1996, and/or similar international
+     agreements.
+
+  f. Exceptions and Limitations means fair use, fair dealing, and/or
+     any other exception or limitation to Copyright and Similar Rights
+     that applies to Your use of the Licensed Material.
+
+  g. License Elements means the license attributes listed in the name
+     of a Creative Commons Public License. The License Elements of this
+     Public License are Attribution, NonCommercial, and ShareAlike.
+
+  h. Licensed Material means the artistic or literary work, database,
+     or other material to which the Licensor applied this Public
+     License.
+
+  i. Licensed Rights means the rights granted to You subject to the
+     terms and conditions of this Public License, which are limited to
+     all Copyright and Similar Rights that apply to Your use of the
+     Licensed Material and that the Licensor has authority to license.
+
+  j. Licensor means the individual(s) or entity(ies) granting rights
+     under this Public License.
+
+  k. NonCommercial means not primarily intended for or directed towards
+     commercial advantage or monetary compensation. For purposes of
+     this Public License, the exchange of the Licensed Material for
+     other material subject to Copyright and Similar Rights by digital
+     file-sharing or similar means is NonCommercial provided there is
+     no payment of monetary compensation in connection with the
+     exchange.
+
+  l. Share means to provide material to the public by any means or
+     process that requires permission under the Licensed Rights, such
+     as reproduction, public display, public performance, distribution,
+     dissemination, communication, or importation, and to make material
+     available to the public including in ways that members of the
+     public may access the material from a place and at a time
+     individually chosen by them.
+
+  m. Sui Generis Database Rights means rights other than copyright
+     resulting from Directive 96/9/EC of the European Parliament and of
+     the Council of 11 March 1996 on the legal protection of databases,
+     as amended and/or succeeded, as well as other essentially
+     equivalent rights anywhere in the world.
+
+  n. You means the individual or entity exercising the Licensed Rights
+     under this Public License. Your has a corresponding meaning.
+
+
+Section 2 -- Scope.
+
+  a. License grant.
+
+       1. Subject to the terms and conditions of this Public License,
+          the Licensor hereby grants You a worldwide, royalty-free,
+          non-sublicensable, non-exclusive, irrevocable license to
+          exercise the Licensed Rights in the Licensed Material to:
+
+            a. reproduce and Share the Licensed Material, in whole or
+               in part, for NonCommercial purposes only; and
+
+            b. produce, reproduce, and Share Adapted Material for
+               NonCommercial purposes only.
+
+       2. Exceptions and Limitations. For the avoidance of doubt, where
+          Exceptions and Limitations apply to Your use, this Public
+          License does not apply, and You do not need to comply with
+          its terms and conditions.
+
+       3. Term. The term of this Public License is specified in Section
+          6(a).
+
+       4. Media and formats; technical modifications allowed. The
+          Licensor authorizes You to exercise the Licensed Rights in
+          all media and formats whether now known or hereafter created,
+          and to make technical modifications necessary to do so. The
+          Licensor waives and/or agrees not to assert any right or
+          authority to forbid You from making technical modifications
+          necessary to exercise the Licensed Rights, including
+          technical modifications necessary to circumvent Effective
+          Technological Measures. For purposes of this Public License,
+          simply making modifications authorized by this Section 2(a)
+          (4) never produces Adapted Material.
+
+       5. Downstream recipients.
+
+            a. Offer from the Licensor -- Licensed Material. Every
+               recipient of the Licensed Material automatically
+               receives an offer from the Licensor to exercise the
+               Licensed Rights under the terms and conditions of this
+               Public License.
+
+            b. Additional offer from the Licensor -- Adapted Material.
+               Every recipient of Adapted Material from You
+               automatically receives an offer from the Licensor to
+               exercise the Licensed Rights in the Adapted Material
+               under the conditions of the Adapter's License You apply.
+
+            c. No downstream restrictions. You may not offer or impose
+               any additional or different terms or conditions on, or
+               apply any Effective Technological Measures to, the
+               Licensed Material if doing so restricts exercise of the
+               Licensed Rights by any recipient of the Licensed
+               Material.
+
+       6. No endorsement. Nothing in this Public License constitutes or
+          may be construed as permission to assert or imply that You
+          are, or that Your use of the Licensed Material is, connected
+          with, or sponsored, endorsed, or granted official status by,
+          the Licensor or others designated to receive attribution as
+          provided in Section 3(a)(1)(A)(i).
+
+  b. Other rights.
+
+       1. Moral rights, such as the right of integrity, are not
+          licensed under this Public License, nor are publicity,
+          privacy, and/or other similar personality rights; however, to
+          the extent possible, the Licensor waives and/or agrees not to
+          assert any such rights held by the Licensor to the limited
+          extent necessary to allow You to exercise the Licensed
+          Rights, but not otherwise.
+
+       2. Patent and trademark rights are not licensed under this
+          Public License.
+
+       3. To the extent possible, the Licensor waives any right to
+          collect royalties from You for the exercise of the Licensed
+          Rights, whether directly or through a collecting society
+          under any voluntary or waivable statutory or compulsory
+          licensing scheme. In all other cases the Licensor expressly
+          reserves any right to collect such royalties, including when
+          the Licensed Material is used other than for NonCommercial
+          purposes.
+
+
+Section 3 -- License Conditions.
+
+Your exercise of the Licensed Rights is expressly made subject to the
+following conditions.
+
+  a. Attribution.
+
+       1. If You Share the Licensed Material (including in modified
+          form), You must:
+
+            a. retain the following if it is supplied by the Licensor
+               with the Licensed Material:
+
+                 i. identification of the creator(s) of the Licensed
+                    Material and any others designated to receive
+                    attribution, in any reasonable manner requested by
+                    the Licensor (including by pseudonym if
+                    designated);
+
+                ii. a copyright notice;
+
+               iii. a notice that refers to this Public License;
+
+                iv. a notice that refers to the disclaimer of
+                    warranties;
+
+                 v. a URI or hyperlink to the Licensed Material to the
+                    extent reasonably practicable;
+
+            b. indicate if You modified the Licensed Material and
+               retain an indication of any previous modifications; and
+
+            c. indicate the Licensed Material is licensed under this
+               Public License, and include the text of, or the URI or
+               hyperlink to, this Public License.
+
+       2. You may satisfy the conditions in Section 3(a)(1) in any
+          reasonable manner based on the medium, means, and context in
+          which You Share the Licensed Material. For example, it may be
+          reasonable to satisfy the conditions by providing a URI or
+          hyperlink to a resource that includes the required
+          information.
+       3. If requested by the Licensor, You must remove any of the
+          information required by Section 3(a)(1)(A) to the extent
+          reasonably practicable.
+
+  b. ShareAlike.
+
+     In addition to the conditions in Section 3(a), if You Share
+     Adapted Material You produce, the following conditions also apply.
+
+       1. The Adapter's License You apply must be a Creative Commons
+          license with the same License Elements, this version or
+          later, or a BY-NC-SA Compatible License.
+
+       2. You must include the text of, or the URI or hyperlink to, the
+          Adapter's License You apply. You may satisfy this condition
+          in any reasonable manner based on the medium, means, and
+          context in which You Share Adapted Material.
+
+       3. You may not offer or impose any additional or different terms
+          or conditions on, or apply any Effective Technological
+          Measures to, Adapted Material that restrict exercise of the
+          rights granted under the Adapter's License You apply.
+
+
+Section 4 -- Sui Generis Database Rights.
+
+Where the Licensed Rights include Sui Generis Database Rights that
+apply to Your use of the Licensed Material:
+
+  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+     to extract, reuse, reproduce, and Share all or a substantial
+     portion of the contents of the database for NonCommercial purposes
+     only;
+
+  b. if You include all or a substantial portion of the database
+     contents in a database in which You have Sui Generis Database
+     Rights, then the database in which You have Sui Generis Database
+     Rights (but not its individual contents) is Adapted Material,
+     including for purposes of Section 3(b); and
+
+  c. You must comply with the conditions in Section 3(a) if You Share
+     all or a substantial portion of the contents of the database.
+
+For the avoidance of doubt, this Section 4 supplements and does not
+replace Your obligations under this Public License where the Licensed
+Rights include other Copyright and Similar Rights.
+
+
+Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+  c. The disclaimer of warranties and limitation of liability provided
+     above shall be interpreted in a manner that, to the extent
+     possible, most closely approximates an absolute disclaimer and
+     waiver of all liability.
+
+
+Section 6 -- Term and Termination.
+
+  a. This Public License applies for the term of the Copyright and
+     Similar Rights licensed here. However, if You fail to comply with
+     this Public License, then Your rights under this Public License
+     terminate automatically.
+
+  b. Where Your right to use the Licensed Material has terminated under
+     Section 6(a), it reinstates:
+
+       1. automatically as of the date the violation is cured, provided
+          it is cured within 30 days of Your discovery of the
+          violation; or
+
+       2. upon express reinstatement by the Licensor.
+
+     For the avoidance of doubt, this Section 6(b) does not affect any
+     right the Licensor may have to seek remedies for Your violations
+     of this Public License.
+
+  c. For the avoidance of doubt, the Licensor may also offer the
+     Licensed Material under separate terms or conditions or stop
+     distributing the Licensed Material at any time; however, doing so
+     will not terminate this Public License.
+
+  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+     License.
+
+
+Section 7 -- Other Terms and Conditions.
+
+  a. The Licensor shall not be bound by any additional or different
+     terms or conditions communicated by You unless expressly agreed.
+
+  b. Any arrangements, understandings, or agreements regarding the
+     Licensed Material not stated herein are separate from and
+     independent of the terms and conditions of this Public License.
+
+
+Section 8 -- Interpretation.
+
+  a. For the avoidance of doubt, this Public License does not, and
+     shall not be interpreted to, reduce, limit, restrict, or impose
+     conditions on any use of the Licensed Material that could lawfully
+     be made without permission under this Public License.
+
+  b. To the extent possible, if any provision of this Public License is
+     deemed unenforceable, it shall be automatically reformed to the
+     minimum extent necessary to make it enforceable. If the provision
+     cannot be reformed, it shall be severed from this Public License
+     without affecting the enforceability of the remaining terms and
+     conditions.
+
+  c. No term or condition of this Public License will be waived and no
+     failure to comply consented to unless expressly agreed to by the
+     Licensor.
+
+  d. Nothing in this Public License constitutes or may be interpreted
+     as a limitation upon, or waiver of, any privileges and immunities
+     that apply to the Licensor or You, including from the legal
+     processes of any jurisdiction or authority.
+
+=======================================================================
+
+Creative Commons is not a party to its public
+licenses. Notwithstanding, Creative Commons may elect to apply one of
+its public licenses to material it publishes and in those instances
+will be considered the “Licensor.” The text of the Creative Commons
+public licenses is dedicated to the public domain under the CC0 Public
+Domain Dedication. Except for the limited purpose of indicating that
+material is shared under a Creative Commons public license or as
+otherwise permitted by the Creative Commons policies published at
+creativecommons.org/policies, Creative Commons does not authorize the
+use of the trademark "Creative Commons" or any other trademark or logo
+of Creative Commons without its prior written consent including,
+without limitation, in connection with any unauthorized modifications
+to any of its public licenses or any other arrangements,
+understandings, or agreements concerning use of licensed material. For
+the avoidance of doubt, this paragraph does not form part of the
+public licenses.
+
+Creative Commons may be contacted at creativecommons.org.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..b71698b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,10 @@
+# From Text to Speech: The MITalk System
+
+- A (gradual) reproduction of the book _From Text to Speech: The MITalk System_.
+  - The book is not available in EPUB format.
+  - All available PDFs are scanned images of the printed pages.
+  - This is a reproduction so that blind individuals may have something to study related to text-to-speech systems.
+- Exemption from copyright in Canada: [Copyright Act, Section III, "Persons with Perceptual Disabilities"](https://laws-lois.justice.gc.ca/eng/acts/C-42/page-10.html#h-103789)
+- License: CC BY-NC-SA 4.0 (see LICENSE)
+- Used `tesseract` to convert all `.jpg` and `.png` files in `pages-images/` into `.txt` files in the `pages-txt/` directory (see the sketch after this diff).
+- Original images extracted from the PDF are kept here for reference; they will be deleted upon completion.
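The OCR step described in the README can be reproduced with a short script along the following lines. This is a minimal sketch under stated assumptions, not the recorded command history: it presumes the stock `tesseract` CLI is on the PATH with default options, and that the page images were already extracted from the source PDF (for example with poppler's `pdfimages`) into `pages-images/` as in this commit.

```python
# Minimal sketch of the batch OCR step (assumed invocation; the exact
# tesseract options used for this repository are not recorded).
from pathlib import Path
import subprocess

SRC = Path("pages-images")  # scanned page images added in this commit
DST = Path("pages-txt")     # OCR output, one .txt file per page
DST.mkdir(exist_ok=True)

for image in sorted(SRC.iterdir()):
    if image.suffix.lower() not in {".png", ".jpg"}:
        continue
    # `tesseract <input> <outputbase>` writes <outputbase>.txt, so keeping
    # the page number as the basename maps 000.png to pages-txt/000.txt.
    subprocess.run(["tesseract", str(image), str(DST / image.stem)],
                   check=True)
```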
[Binary content: 230 scanned page images added as pages-images/000.jpg
through pages-images/229.jpg, one file per book page (PNG except pages
000, 086, 127, and 229, which are JPEG). The 230 corresponding diff
stanzas, each of the form "diff --git a/pages-images/NNN.png
b/pages-images/NNN.png / new file mode 100644 / index 0000000..<hash> /
Binary files /dev/null and b/pages-images/NNN.png differ", are collapsed
here.]
diff --git a/pages-txt/000.txt b/pages-txt/000.txt
new file mode 100644
index 0000000..cd169be
--- /dev/null
+++ b/pages-txt/000.txt
@@ -0,0 +1,17 @@
+Cambridge Studies
+
+in Speech Science
+and Communication
+
+From text
+to speech:
+The MITalk
+system
+
+Jonathan Allen,
+M. Sharon Hunnicutt
+and Dennis Klatt
+
+With Robert C. Armstrong
+and David B. Pisoni
+
diff --git a/pages-txt/001.txt b/pages-txt/001.txt
new file mode 100644
index 0000000..1c78a6b
--- /dev/null
+++ b/pages-txt/001.txt
@@ -0,0 +1,7 @@
+Cambridge Studies in Speech Science and Communication
+
+Advisory Editorial Board J. Laver (Executive editor) A. J. Fourcin J. Gilbert
+M. Haggard P. Ladefoged B. Lindblom J. C. Marshall
+
+From text to speech
+The MITalk system
diff --git a/pages-txt/002.txt b/pages-txt/002.txt
new file mode 100644
index 0000000..98c8ff2
--- /dev/null
+++ b/pages-txt/002.txt
@@ -0,0 +1,6 @@
+In this series:
+
+The phonetic bases of speaker recognition Francis Nolan
+
+Patterns of sounds Ian Maddieson
+Neurolinguistics and linguistic aphasiology David Caplan
diff --git a/pages-txt/003.txt b/pages-txt/003.txt
new file mode 100644
index 0000000..a5ae8b6
--- /dev/null
+++ b/pages-txt/003.txt
@@ -0,0 +1,25 @@
+From text to speech
+The MITalk system
+
+Jonathan Allen,
+M. Sharon Hunnicutt and
+Dennis Klatt
+
+With Robert C. Armstrong and
+David Pisoni
+
+The right of the
+University of Cambridge
+to print and sell
+all manner of books
+was granted by
+Henry VIII in 1534.
+The University has printed
+
+and published continuously
+
+Cambridge University Press
+
+Cambridge
+London New York New Rochelle
+Melbourne Sydney
diff --git a/pages-txt/004.txt b/pages-txt/004.txt
new file mode 100644
index 0000000..6aaca44
--- /dev/null
+++ b/pages-txt/004.txt
@@ -0,0 +1,40 @@
+Published by the Press Syndicate of the University of Cambridge
+The Pitt Building, Trumpington Street, Cambridge CB2 1RP
+
+32 East 57th Street, New York, NY 10022, USA
+
+10 Stamford Road, Oakleigh, Melbourne 3166, Australia
+
+© Cambridge University Press 1987
+First published 1987
+
+Printed in Great Britain at the University Press, Cambridge
+
+British Library cataloguing in publication data
+
+From Text to speech : MITalk system. -
+
+(Cambridge studies in speech science and communication)
+1. Automatic speech recognition
+
+I. Allen, Jonathan, 1934-
+
+II. Hunnicutt, M. Sharon. III. Klatt, Dennis H.
+621.3819'598 TK7882.S65
+
+Library of Congress cataloguing in publication data
+
+Allen, Jonathan, 1934-
+
+From text to speech.
+
+(Cambridge studies in speech science and communication)
+1. Speech processing systems. 2. Speech synthesis.
+
+I. Hunnicutt, M. Sharon. II. Klatt, Dennis H.
+
+
+TK7882.S65A45 1986 006.5 85-24280
+
+ISBN 0 521 30641 8
diff --git a/pages-txt/005.txt b/pages-txt/005.txt
new file mode 100644
index 0000000..dc4530a
--- /dev/null
+++ b/pages-txt/005.txt
@@ -0,0 +1,88 @@
+Contents
+
+List of contributors xi
+Preface 1
+
+1 Introduction 7
+1.1 Constraints on speech synthesis 7
+1.2 Synthesis techniques 9
+1.3 Functional outline of MITalk 12
+
+I Analysis 15
+
+2 Text preprocessing 16
+2.1 Overview 16
+2.2 Input 17
+2.3 Output 18
+2.4 Formatting operations 18
+
+3 Morphological analysis 23
+3.1 Overview 23
+3.2 Input 27
+3.3 Output 27
+3.4 The algorithm 28
+3.5 An example of a decomposition 35
+3.6 The lexicon 36
+
+4 The phrase-level parser 41
+4.1 Overview 41
+4.2 Input 41
+4.3 Output 43
+4.4 Parts of speech 45
+4.5 The part-of-speech processor
+4.6 The parser algorithm
+4.7 Some examples 51
+
+5 Morphophonemics and stress adjustment 52
+5.1 Overview 52
+5.2 Input 52
+5.3 Output 52
+5.4 Morphophonemic rules 52
+5.5 Stress modification rules 54
+5.6 An example 54
+
+6 Letter-to-sound and lexical stress 57
+6.1 Overview 57
+6.2 Letter-to-sound 57
+6.3 Lexical stress placement 61
+6.4 An example 69
diff --git a/pages-txt/006.txt b/pages-txt/006.txt
new file mode 100644
index 0000000..4d11744
--- /dev/null
+++ b/pages-txt/006.txt
@@ -0,0 +1,110 @@
+Contents
+
+II Synthesis
+
+7 Survey of speech synthesis technology 71
+7.1 Overview 71
+7.2 Background 72
+7.3 Synthesis techniques 73
+7.4 Applications 79
+
+8 The phonological component 81
+8.1 Overview 81
+8.2 Input representation for a sentence 81
+8.3 Comparison between ideal synthesis input and system performance 85
+8.4 Stress rules 86
+8.5 Rules of segmental phonology 87
+8.6 Pauses 88
+8.7 Evaluation of the analysis modules 89
+
+9 The prosodic component 93
+9.1 Overview 93
+9.2 Segmental durations 93
+
+10 The fundamental frequency generator 100
+10.1 Overview 100
+10.2 Input 101
+10.3 Output 102
+10.4 The O’Shaughnessy fundamental frequency algorithm 103
+10.5 Adjustments to the O’Shaughnessy algorithm 107
+10.6 Potential improvements from additional syntactic information 107
+
+11 The phonetic component 108
+11.1 Overview 108
+11.2 “Synthesis-by-analysis” of consonant-vowel syllables 109
+11.3 General rules for the synthesis of phonetic sequences 116
+11.4 Summary 122
+
+12 The Klatt formant synthesizer 123
+12.1 Overview 123
+12.2 Vocal tract transfer functions 139
+12.3 Radiation characteristic 150
+
+13 Some measures of intelligibility and comprehension 151
+13.1 Overview 151
+13.2 Phoneme recognition 152
+13.3 Word recognition in sentences 157
+13.4 Comprehension 161
+13.5 General discussion and conclusions 167
+
+14 Implementation 172
+14.1 Conceptual organization 172
+14.2 Development system 173
+14.3 Performance system 174
+14.4 UNIX implementation 174
+14.5 Using the system 175
+
+vi
diff --git a/pages-txt/007.txt b/pages-txt/007.txt
new file mode 100644
index 0000000..bcca41b
--- /dev/null
+++ b/pages-txt/007.txt
@@ -0,0 +1,16 @@
+Contents
+
+Appendixes
+A Part-of-speech processor 177
+B Klatt symbols 179
+C Context-dependent rules for PHONET 181
+D Sample test trials from the Modified Rhyme Test 202
+E Sample test materials from the Harvard Psychoacoustic Sentences 203
+F Sample test materials from the Haskins Anomalous Sentences 204
+G Sample passage used to test listening comprehension 205
+
+References 207
+
+Index 215
+
+vii
diff --git a/pages-txt/008.txt b/pages-txt/008.txt
new file mode 100644
index 0000000..f794383
--- /dev/null
+++ b/pages-txt/008.txt
@@ -0,0 +1,154 @@
+List of figures
+
+2-1 Example of FORMAT processing 18
+3-1 State transition diagram for the morph sequence FSM 31
+3-2 Decomposition of “scarcity” 37
+4-1 Noun group ATN listing 47
+4-2 Verb group ATN listing 48
+4-3 ATN diagram for verb groups 49
+4-4 ATN diagram for noun groups 50
+4-5 Example of PARSER operation 51
+5-1 Input to and output from SOUND1 55
+6-1 Suffix detection in the word finishing 58
+6-2 Application of letter-to-sound rules to caribou 60
+6-3 Application of letter-to-sound rules to subversion 60
+6-4 Example of letter-to-sound and stress rule operation 69
+7-1 Synthesis blocks of the MITalk system 72
+7-2 An example of the differences between words spoken in isolation and
+words spoken as a continuous utterance 74
+8-1 Example of PHONO1 and PHONO2 processing 82
+9-1 Example of the processing performed by PROSOD 94
+10-1 Example of F0 contours 105
+11-1 Spectrum analysis of a speech waveform 111
+11-2 First and second formant motions in English vowels 112
+11-3 Linear prediction of plosive bursts before vowels 113
+11-4 Frequency of the lowest three formants measured at voicing onset for
+syllables involving BB, DD, and GG 114
+11-5 Synthesis strategy for a CV syllable 115
+11-6 Templates for smoothing adjacent phonetic segment targets 117
+11-7 Constants used to specify the inherent formant and durational
+characteristics of a sonorant 120
+12-1 Interface between synthesizer software and hardware 123
+12-2 Components of the output spectrum of a speech sound 125
+12-3 Parallel and cascade simulation of the vocal tract transfer function 126
+12-4 Cascade/parallel configurations supported by MITalk 127
+12-5 Block diagram and frequency response of a digital resonator 129
+12-6 Block diagram of the cascade/parallel formant synthesizer 131
+12-7 Four periods from voicing waveforms 135
+12-8 Waveform segment and magnitude spectrum of frication noise 137
+12-9 Magnitude of the vocal tract transfer function 141
+12-10 Nasalization of the vowel IH in the syllable “dim” 143
+12-11 Effect of parameter changes on the vocal tract transfer function 146
+12-12 Preemphasized output spectra from cascade and parallel models 148
+12-13 Spectra from two different parallel synthesis configurations 149
+12-14 Transfer function of the radiation characteristic 150
+13-1 Average percent errors across various manner classes 154
+13-2 Distribution of errors and most frequent perceptual confusions 155
+
+viii
diff --git a/pages-txt/009.txt b/pages-txt/009.txt
new file mode 100644
index 0000000..ab3cc8f
--- /dev/null
+++ b/pages-txt/009.txt
@@ -0,0 +1,9 @@
+List of figures
+
+13-3 Percent correct comprehension scores for reading and listening groups 165
+14-1 Sample MITalk session 176
+
+C-1 Pre-aspiration parameter smoothing 189
+C-2 Diphthong transition smoothing 194
+
+ix
diff --git a/pages-txt/010.txt b/pages-txt/010.txt
new file mode 100644
index 0000000..4c69d4c
--- /dev/null
+++ b/pages-txt/010.txt
@@ -0,0 +1,22 @@
+List of tables
+
+2-1 Abbreviation translations performed by FORMAT 19
+3-1 Morph spelling change rules for vocalic suffixes 36
+8-1 Klatt symbols used in the synthesis modules 84
+9-1 Minimum and inherent durations in msec for each segment type 96
+10-1 Relative peak levels of words according to their parts of speech 101
+11-1 Parameter values for the synthesis of selected vowels 119
+11-2 Parameter values for the synthesis of selected components of English
+consonants before front vowels 121
+11-3 Variable control parameters specified in PHONET 122
+12-1 List of control parameters for the software formant synthesizer 132
+13-1 Characteristics of the passages used to measure comprehension 163
+B-1 Klatt symbols for phonetic segments 179
+B-2 Klatt symbols for nonsegmental units 180
+C-1 Parameter targets for nonvocalic segments 186
+C-2 Parameter targets for vocalic segments 187
+C-3 Default values for duration of forward smoothing (Tcf) 188
+C-4 Default values for Bper 188
+C-5 Diphthong transition parameters 194
+C-6 Duration of forward smoothing for obstruents (Tcobst) 196
+C-7 Default plosive burst duration 197
diff --git a/pages-txt/011.txt b/pages-txt/011.txt
new file mode 100644
index 0000000..fa1fec4
--- /dev/null
+++ b/pages-txt/011.txt
@@ -0,0 +1,31 @@
+List of contributors
+
+Jonathan Allen.
+Professor of Electrical Engineering and Computer Science, and Director
+of the Research Laboratory of Electronics, Massachusetts Institute of
+Technology, Cambridge, Massachusetts.
+
+M. Sharon Hunnicutt.
+Speech Transmission Laboratory, Department of Speech Communication
+and Music Acoustics, Royal Institute of Technology, Stockholm,
+Sweden.
+
+Dennis H. Klatt.
+Senior Research Scientist, Department of Electrical Engineering and
+Computer Science, and Research Laboratory of Electronics,
+Massachusetts Institute of Technology, Cambridge, Massachusetts.
+
+Robert C. Armstrong.
+Department of Electrical Engineering and Computer Science, and
+Research Laboratory of Electronics, Massachusetts Institute of
+Technology, Cambridge, Massachusetts.
+
+David B. Pisoni.
+Professor of Psychology, Speech Research Laboratory, Department of
+Psychology, Indiana University, Bloomington, Indiana.
+
+xi
diff --git a/pages-txt/012.txt b/pages-txt/012.txt
new file mode 100644
index 0000000..e69de29
diff --git a/pages-txt/013.txt b/pages-txt/013.txt
new file mode 100644
index 0000000..1d4b602
--- /dev/null
+++ b/pages-txt/013.txt
@@ -0,0 +1,37 @@
+Preface
+
+The MITalk system described in this book is the result of a long effort, stretching
+from the early 1960s to the present. In this preface, a view is given of the work’s
+historical evolution. Within this description, acknowledgements are made of the
+project’s many contributions. In recognizing these contributions, it is best to
+organize them into four groups. First, there is the development of the MITalk system
+itself, its evolution, and the many diverse contributions made to its structure and
+content. Second, there was the 1979 summer course which resulted in a
+comprehensive summary of the work to that date, and also provided the occasion to
+write a set of course notes. Next, there have been continuing efforts (since 1980)
+which included re-writes of the system’s software, and the efforts to organize this
+book which involved substantial new writing and rule formulations, and explicit
+examples directly keyed to the current working system. Finally, there is the
+sponsorship of the program’s many facets over the years.
+
+In the early 1960s, much interest in speech synthesis emerged within the
+Cognitive Information Processing Group at MIT’s Research Laboratory of Electronics.
+This group, led by M. Eden and S. J. Mason, focused on the development of
+sensory aids for the blind. Many approaches were taken, but it was recognized that
+the development of a reading machine for the blind that could scan printed text and
+produce spoken output was a major goal. Research efforts in both character
+recognition and speech synthesis from text were initiated. By 1968, a functional reading
+machine was demonstrated. Once the characters were recognized (using a contour
+scanning algorithm), text-to-speech conversion was accomplished in two phases.
+First, a morph decomposition analysis of all words was performed by using
+techniques developed by F. F. Lee (in his 1965 doctoral thesis). A morph lexicon
+sufficient for these demonstrations was developed. It was anticipated that any
+exceptional words not analyzed into morphs would be pronounced by using spelled
+speech. As a result, these words were heard as a sequence of individually
+pronounced letters. The dictionary provided names of the phonetic segments for
+each morph, and synthesis was performed using the algorithms developed and
+published by Holmes, Mattingly, and Shearme. An analog synthesizer was used to
+
+amplifiers. The demonstration of this system was impressive, although the
+
+1
diff --git a/pages-txt/014.txt b/pages-txt/014.txt
new file mode 100644
index 0000000..6c294d3
--- /dev/null
+++ b/pages-txt/014.txt
@@ -0,0 +1,42 @@
+Preface
+
+vocabulary was restricted, and the output speech quality required extensive
+learning. At that time, the computer implementation used for research consisted of a
+Digital Equipment Corporation PDP-1 used for character recognition and morph
+analysis, which was coupled to MIT Lincoln Laboratory’s TX-0 computer (the
+only one of its kind) for the synthesis algorithms. T. P. Barnwell III and
+E. R. Jensen were responsible for building much of this computational
+environment. This required great effort and coordination, since all coding was performed
+in assembly language.
+
+Following the late 1960s, the character recognition and speech synthesis
+efforts continued independently of one another with the work of B. Blesser,
+M. Eden, and D. E. Troxel focused on the character recognition efforts. J. Allen
+joined the faculty in September, 1968. Goals for a fundamental and comprehensive
+research program aimed at the computation of high-quality speech using
+unrestricted English text as input were formulated. In addition, strong coupling
+continued with the Speech Communication and Linguistics groups within the
+Research Laboratory of Electronics, led by K. N. Stevens and M. Halle,
+respectively.
+
+With the desire to convert unrestricted English text to speech, a new scheme
+was developed for the pronunciation of all possible English words. This required
+elaborate extensions to the morph decomposition process, as well as the
+construction of a comprehensive morph lexicon to serve the entire language. Furthermore,
+spelled speech was rejected as inadequate, and plans for the development of a
+comprehensive set of letter-to-sound rules that would complement the morph
+analysis process were established. In order to build a new morph lexicon, a copy
+of the Brown corpus was obtained and sorted (shortest word first). Initial phonetic
+segment labels were obtained from a computer-readable copy of the Merriam
+Pocket Dictionary. Beginning with a nascent lexicon containing all bound morphs
+and function words, each word from the Brown corpus was successively analyzed.
+
+This led to the interactive addition of new morphs and a great deal of experience
+with morph analysis procedures. This process was accomplished by J. Allen and
+D. A. Finkel, with algorithmic and programming support from E. R. Jensen and
+F. X. Carroll. The process spanned many months, and led to the extension of
+morph analysis routines to include multiple decompositions and attendant selection
+rules. The computational support for this work was a Digital Equipment
+Corporation PDP-9 computer with 24K words of memory and DEC tapes for peripheral
+storage. Readers familiar with this equipment will have some appreciation of the
+sheer magnitude of the effort required to build the morph lexicon and acquire the
diff --git a/pages-txt/015.txt b/pages-txt/015.txt
new file mode 100644
index 0000000..0fbe9d1
--- /dev/null
+++ b/pages-txt/015.txt
@@ -0,0 +1,42 @@
+Preface
+
+necessary data to support the extensions and refinements to the morph analysis
+routines. Subsequent to the initial construction of the lexicon, an elaborate editing
+of all entries was made by M. S. Hunnicutt. This led to substantial improvements
+in the system’s overall performance.
+
+When words could not be found in the morph lexicon, or could not be
+analyzed into morphs from the lexicon, letter-to-sound rules were utilized. Prior to
+the MITalk research, letter-to-sound rules had been proposed to cover the entire
+language. But, with MITalk, it was realized that high-frequency function words
+often violate perspicuous forms of these rules, and that such letter-to-sound rules
+do not span morph boundaries. Based on these observations, a complementary set
+of letter-to-sound rules could be introduced into MITalk, but these rules would not
+be used unless morph analysis failed. Realizing this fact, affix stripping was
+utilized, and the more reliable consonants were converted first, leaving the vowels for
+last. This approach was proposed by J. Allen, and extensive sets of these rules
+were developed by M. S. Hunnicutt working with F. X. Carroll. Several sets of
+these rules were developed and elaborately tested. In addition, in the late ’60s at
+MIT, there was great interest in lexical stress and phonological rules for this
+purpose which were initially developed by M. Halle and N. Chomsky. These rules
+were reformulated and extended to include the effect of affixes. This was the first
+time that lexical stress rules had been used in a text-to-speech system. The
+development of rules for this purpose, along with their unification with the
+letter-to-sound rules, was accomplished by M. S. Hunnicutt. In addition, the text
+preprocessing rules were also provided by M. S. Hunnicutt, as well as the routines
+for morphophonemics and stress adjustment used in conjunction with the morph
+analysis.
+
+In a 1968 doctoral thesis, J. Allen developed a parsing methodology for use in
+a text-to-speech system, with particular emphasis on the computation of necessary
+syntactic markers to specify prosodic correlates. This parsing strategy led to the
+development of a phrase-level parser which avoided the complications of
+clause-level parsing and the problems of syntactic ambiguity at that level, but also led to
+the introduction of inaccuracy due to incomplete clause-level analyses. This
+approach was augmented and extended by P. L. Miller and C. J. Drake, and was
+tested extensively in the context of the morph lexicon and analysis routines.
+
+In light of the phonetic segment labels, stress marks, and syntactic markers
+obtained by the previously mentioned programs, it was necessary to develop a
+prosodic framework for the following phonemic synthesis. A durational
+framework was developed by D. H. Klatt together with R. Carlson and
diff --git a/pages-txt/016.txt b/pages-txt/016.txt
new file mode 100644
index 0000000..e018d9a
--- /dev/null
+++ b/pages-txt/016.txt
@@ -0,0 +1,42 @@
+Preface
+
+B. Granstrom. The latter two researchers devoted a year to this project on leave
+from the Royal Institute of Technology in Stockholm, Sweden. In addition to the
+durational framework, a comprehensive investigation of fundamental frequency
+effects was made by J. Allen, D. O’Shaughnessy, and A. Waibel. O’Shaughnessy’s
+doctoral thesis contains an extensive compendium of these results, and he is
+responsible for the fundamental frequency generation algorithm currently
+implemented in MITalk. A. Waibel contributed a characterization of fundamental
+frequency contours in questions.
+
+Given the prosodic framework mentioned above, phonetic segment labels,
+stress marks, and junctural marks provided by the syntactic analysis, then
+phonemic synthesis routines can be utilized to produce the output speech
+waveform. The MITalk system is based on a phonemic speech synthesis model
+developed by D. H. Klatt. All of the algorithms for the specification of the control
+parameters utilized by this model were developed by him. During the stay of
+R. Carlson and B. Granstrom, further refinements, modifications, and tests were
+performed in the context of the overall MITalk system. At that time, many issues
+concerned with consistency and the integration of the entire system were
+addressed.
+
+In the late 1970s, the computational environment for the research was
+changed from the PDP-9 computer to a DEC-System 20, with output speech
+provided by a PDP-11. A special interface was constructed between the DEC-20
+and the PDP-11, and an all-digital special purpose speech synthesis processor was
+constructed by G. S. Miranker. This processor was capable of exercising the
+phonemic synthesis model in real-time. The DEC-System 20, a large time-shared
+machine, was ideally suited to the modular nature of the MITalk system. It
+permitted many researchers individually and interactively to build the system’s
+overall structure. Beginning in the mid ’70s, a great deal of attention was focused on
+the MITalk system’s overall organization. The problems of coordinating such a
+large system with its many contributors cannot be overemphasized. As a result,
+standard interfaces were established between all modules. Over the years,
+extremely valuable system programming contributions were provided by
+E. R. Jensen, F. X. Carroll, R. S. Goldhor, G. E. Kopec, and Y. Willems.
+
+As the entire system was built in a coordinated manner, and as experience
+with the interaction of all constituent algorithms increased, there was a clear
+necessity for a comprehensive evaluation of the system. Fortunately, D. Pisoni visited
+the Research Laboratory of Electronics and was attracted to the problem of
+perceptual evaluation. He performed a broad review of the testing literature, extended
diff --git a/pages-txt/017.txt b/pages-txt/017.txt
new file mode 100644
index 0000000..87f54e3
--- /dev/null
+++ b/pages-txt/017.txt
@@ -0,0 +1,42 @@
+Preface
+
+and developed new testing methodologies, and provided a systematic assessment
+of MITalk’s output speech quality.
+
+Throughout all of the research, many important individual projects were
+completed which focused on issues in speech analysis and processing, and in
+linguistics. The many participants in these endeavors focused individually on a variety of
+important research issues, but they also shared in the motivation provided by the
+goals of the overall system, as well as in the daily interaction with others involved
+in complementary aspects of the system. This tension between individual research
+and overall system building evolved with MITalk, and provided each contributor
+with a strong sense of satisfaction derived not only from individual efforts, but also
+from the system’s overall achievement.
+
+In the summer of 1979, it was felt that the MITalk system was at a
+sufficiently complete state that a specialized, intensive course devoted to its
+exposition was appropriate. Accordingly, from June 25th through June 29th, a special
+short course was offered. Lectures covered all modules of the MITalk system, and
+laboratory exercises combined with demonstrations provided further contact with
+the system. The individuals involved with the course included J. Allen,
+D. H. Klatt, M. S. Hunnicutt, R. Carlson, B. Granstrom, and D. Pisoni. In
+addition, a set of notes for this course was developed. M. S. Hunnicutt wrote the
+sections of the notes covering text preprocessing, morphological analysis,
+phrase-level parsing, morphophonemics and stress adjustment, letter-to-sound and lexical
+stress, and fundamental frequency contour generation. D. H. Klatt wrote the
+sections on speech synthesis technology and the Klatt formant synthesizer.
+D. H. Klatt, R. Carlson, and B. Granstrom wrote the sections on the phonological
+component, the prosodic component, and the phonetic component. D. Pisoni wrote
+the section on measurement of intelligibility and comprehension directly
+reproduced as Chapter 13 of the present volume. J. Allen provided the
+introduction, a section on implementation, and the summary. These notes have constituted
+the most comprehensive overview of MITalk until the publication of this book.
+
+Since 1979, the MITalk system has been available for license, and has been
+acquired by many industrial firms and universities. Bell Northern Research
+acquired the system for research purposes and recoded it in VAX-VMS PASCAL.
+They have kindly supplied a copy of their version to us. In turn, this version was
+converted to run under Berkeley 4.2 BSD UNIX, using the syntax of Berkeley
+PASCAL, although some routines in the new version are written in C. This latest
+version was accomplished by R. Armstrong, and it has many new features. The
+most notable feature is the overall control structure which easily permits assem-
diff --git a/pages-txt/018.txt b/pages-txt/018.txt
new file mode 100644
index 0000000..99e8ee5
--- /dev/null
+++ b/pages-txt/018.txt
@@ -0,0 +1,38 @@
+Preface
+
+bling subsets of the overall system, and the provision of a variety of displays to
+view the functions of various modules. This system has been used successfully on
+several occasions, and is described in Chapter 14.
+
+With this new UNIX version of MITalk, J. Allen and R. Armstrong have
+undertaken extensive writing and editing which build on the 1979 summer course
+notes in order to construct the current text. In particular, all examples are a direct
+result of the current implementation, and new rule formulations have been added to
+the text by using a generalized notation for phonological rules.
+These rule improvements have been achieved by R. Armstrong. Several new sections have been
+added, and extensive editing has been performed along with an expanded and more
+explicit representation of the actual algorithms and rules used in the system. Thus,
+the present text is the product of the original authors of the 1979 summer course
+notes (mentioned above), plus expansion in detail, examples, and both explicit and
+extensive rule formulations added by J. Allen and R. Armstrong. M. S. Hunnicutt,
+D. H. Klatt, and D. Pisoni have reviewed these changes for accuracy, and the
+extensive formatting necessary to produce the camera-ready copy for this book was
+done by R. Armstrong.
+
+It is a pleasure to acknowledge the several sponsors of this work over the
+years. In the early stages, research was sponsored by the Joint Services Electronics
+Program, as well as the National Institutes of Health. For many years, continuing
+and generous support has been provided by an anonymous individual donor,
+supplying the flexibility necessary to pursue appropriate research directions. The four
+years of concentrated effort which led to the system’s 1979 version was supported
+by the National Science Foundation. It is important to note the donation of a
+hardware pitch detector from MIT’s Lincoln Laboratory, designed and built by
+T. Bially. The detector was instrumental in providing the very large volume of
+pitch contours used as the database to construct fundamental frequency rules.
+
+The MITalk system is the result of an exciting and satisfying project. Much
+important research has been performed as a result of its needs, and the overall
+system is an impressive statement of our knowledge in this field. Certainly, there is
+still more that needs to be done in order to provide highly natural speech in
+discourse environments. But, MITalk’s contributions are likely to play an essential
+role in any of these continuing developments.
diff --git a/pages-txt/019.txt b/pages-txt/019.txt
new file mode 100644
index 0000000..7f455dc
--- /dev/null
+++ b/pages-txt/019.txt
@@ -0,0 +1,34 @@
+Introduction
+
+In this book, we are concerned with describing a successful approach to the
+conversion of unrestricted English text to speech. Before taking up the details of this
+process, however, it is useful to place this task in context. Over the years, there
+has been an increasing need for speech generated from computers. In part, this has
+been due to the intrinsic nature of text, speech, and computing. Certainly speech is
+the fundamental language representation, present in all cultures (whether literate or
+not), so if there is to be any communication means between the computer and its
+human users, then speech provides the most broadly useful modality, except for
+the needs of the deaf. While text (considered as a string of conventional symbols)
+is often considered to be more durable than speech and more reliably preserved,
+this is in many ways a manifestation of relatively early progress in printing
+technology, as opposed to the technology available for storing and manipulating
+speech. Furthermore, text-based interaction with computers requires typing (and
+often reading) skills which many potential users do not possess. So if the
+increasingly ubiquitous computer is to be useful to the largest possible segment of society,
+interaction with it via natural language, and in particular via speech, is certainly
+necessary.
+That is, there is a clear trend over the past 25 years for the computer to
+bend increasingly to the needs of the user, and this accommodation must continue
+if computers are to serve society at large. The present search for expressive
+programming languages which are easy to use and not prone to error can be expected
+to lead in part to natural language interaction as the means best suited to human
+users, with speech as the most desirable mode of expression.
+
+1.1 Constraints on speech synthesis
+
+It is clear, then, that speech communication with computers is both needed and
+desirable. Within the realm of speech output techniques, we can ask what the
+nature of these techniques is, and how they are realized. In order to get a view of the
+spectrum of such procedures, it is useful to consider them as the result of four
+different constraints which determine a design space for all possible speech output
+schemes. Each technique can then be seen as the result of decisions related to the
+impact of each of the four constraint areas.
diff --git a/pages-txt/020.txt b/pages-txt/020.txt
new file mode 100644
index 0000000..6a5763e
--- /dev/null
+++ b/pages-txt/020.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+1.1.1 Task
+
+The application task determines the nature of the speech capability that must be
+provided. When only a small number of utterances is required, and these do not
+have to be varied on line, then recorded speech can be used, but if the task is to
+simulate the human cognitive process of reading aloud, then an entirely different
+range of techniques is needed.
+
+1.1.2 Human vocal apparatus
+
+All systems must produce as output a speech waveform, but it is not an arbitrary
+signal. A great deal of effort has gone into the efficient and insightful
+representation of the speech signal as the result of a signal source in the vocal tract exciting
+the vocal tract “system function”, which acts as a filter to produce the speech
+waveform. The human vocal tract also constrains the speed with which signal
+changes can be made, and is also responsible for much of the coarticulatory
+smoothing or encoding that makes the relation between the underlying phonetic
+transcription and the speech waveform so difficult to characterize.
+
+1.1.3 Language structure
+
+Just as the speech waveform is not arbitrarily derived, the myriad possible speech
+gestures that could be related to a linguistic message are constrained by the nature
+of the particular language structure involved. It has been consistently found that
+those units and structures which linguists use to describe and explain language do
+in fact provide the appropriate base in terms of which the speech waveform can be
+characterized and constructed. Thus, basic phonological laws, stress rules,
+morphological and syntactic structures, and phonotactic constraints all find their use in
+determining the speech output.
+
+1.1.4 Technology
+
+Our ability to model and construct speech output devices is strongly conditioned
+by the current (and past) technology. Speech science has profited greatly from a
+variety of technologies, including x-rays, motion pictures, the sonograph, modern
+filter and sampled-data theory, and most importantly the modern digital computer.
+While early uses of computers were for off-line speech analysis and simulation, the
+advent of increasingly capable integrated circuit technology has made it possible to
+build compact, low-cost, real-time devices of great capability.
+It is this fact, combined with our substantial knowledge of the algorithms needed to generate speech,
+that has propelled the field of speech output from computers into the “real world”
+of practical commercial systems suitable for a wide variety of applications.
diff --git a/pages-txt/021.txt b/pages-txt/021.txt
new file mode 100644
index 0000000..d87ed71
--- /dev/null
+++ b/pages-txt/021.txt
@@ -0,0 +1,44 @@
+Introduction
+
+1.2 Synthesis techniques
+
+With these constraints in mind, we can examine the various approaches to speech
+output from computers. A great many techniques have been developed, but they
+can be naturally grouped in an insightful way. Our purpose here is to create a
+context in which text-to-speech conversion of unrestricted English text using
+synthesis-by-rule can be considered. This comparison will permit us to highlight
+the difference between the various approaches, and to compare system cost and
+performance.
+
+1.2.1 Waveform coding
+
+The simplest strategy would be to merely record (either in digital or analog format)
+the required speech. Depending on the technology used, this approach may
+introduce access time delays, and will be limited in capacity by the recording medium
+available, but the speech will generally be of high quality. No knowledge of the
+human vocal apparatus or language structure is needed; these systems are a
+straightforward match of the task requirements to the available storage technology.
+Since memory size is the major limitation of these schemes, efforts have been
+made to cut down the number of bits per sample used for digital storage. A variety
+of techniques has been used, from simple delta modulation, through adaptive delta
+modulation and adaptive differential PCM, to adaptive predictive coding which
+can drop the required bit rate from over 50 Kbit/sec to under 10 Kbit/sec while still
+retaining good quality speech. Simple coder/decoder circuits can be used for
+recording and playback. When the message vocabulary is small and fixed, these
+systems are attractive. But if messages must be concatenated, then it is extremely
+difficult to produce good quality speech because aspects of the speech waveform
+have been “bound” at recording time to the values appropriate for all message
+situations which use the smaller constituent messages.
+
+1.2.2 Parametric representation
+
+In order to further lower the storage requirements, but also to provide needed
+flexibility for concatenation of messages, several schemes have been developed
+which “back up” from the waveform itself to a parametric representation in terms
+of a model for speech production. These parameters may characterize salient
+information in either the time or frequency domain. Thus, for example, the speech
+waveform can be formed by summing up waveforms at several harmonics of the
+pitch weighted by the spectral prominence at that frequency, a set of resonances
+can be excited by noise or glottal waveforms, or the vocal tract shape can be
+simulated along with appropriate acoustic excitation. As compared to waveform
+coding, more computation is now required at playback time to recreate the speech
diff --git a/pages-txt/022.txt b/pages-txt/022.txt
new file mode 100644
index 0000000..f60b2db
--- /dev/null
+++ b/pages-txt/022.txt
@@ -0,0 +1,43 @@
+From text to speech: The MITalk system
+
+waveform, but the storage requirements per message are cut down.
+More importantly, the parametric representation represents an abstraction on the speech
+waveform to a level of representation where the attributes that contribute to speech
+quality (e.g. formant frequencies and bandwidths, pitch, excitation amplitudes) can
+be insightfully manipulated. This allows elementary messages to be concatenated
+in a way that provides for smooth transitions at the boundaries. It also allows for
+changes (e.g. in pitch) well within the individual message units, so that substantial
+changes in prosodic parameters (pitch and timing) can be made. The most popular
+parametric representations in use today are based on formants or linear predictive
+coding (LPC), although vocal tract articulatory models are also used. Message
+units of widely varying sizes are employed, ranging from paragraphs, through
+sentences, phrases, words, syllables, demisyllables, and diphones. As the size of the
+message unit goes down, fewer basic messages are needed for a large message set,
+but more computation is required, and the difficulties of correctly representing the
+coarticulation across message boundaries go up. Clearly, these schemes aim to
+preserve as much of the quality of natural speech as possible, but to permit the
+flexible construction of a large set of messages using elements which require little
+storage. With the current level of knowledge of digital signal processing
+techniques, and the accompanying technology, these schemes have become very
+important for practical applications. It is well to remember, however, that parametric
+representation systems seek to match the task with the available processing and
+memory technology by using a knowledge of models for the human production of
+speech, but little (if any) use is made of the linguistic structure of the language.
+
+1.2.3 Synthesis-by-rule
+
+When message units are concatenated using parametric representations, there is a
+tradeoff between speech quality and the need to vary the parameters to adapt the
+message to varying environments. Researchers have found that many allophonic
+variations of a message unit (e.g. diphone) may be needed to achieve good quality
+speech, and that while the vocabulary of needed units is thus expanding, little basic
+understanding of the role of structural language constraints in determining aspects
+of the speech waveform is obtained. For this reason, the synthesis process has
+been abstracted even further beyond the level of parametric representation to a set
+of rules which seek to compute the needed parameters for the speech production
+model from an input phonetic description. This input representation contains, in
+itself, very little information. Usually the names of the phonetic segments, along
+with stress marks and pitch and timing, are provided. The latter prosodic correlates
+are often computed from segmental and syntactic structure and stress marks, plus
+
+10
diff --git a/pages-txt/023.txt b/pages-txt/023.txt
new file mode 100644
index 0000000..18485f2
--- /dev/null
+++ b/pages-txt/023.txt
@@ -0,0 +1,44 @@
+Introduction
+
+semantic information if available. In this way, synthesis-by-rule techniques can
+utilize a very low bit-rate message description (<100 bits/sec) as input, but
+substantial computation must be used to compute the model parameters and then
+produce the speech waveform. Clearly there is complete freedom to specify the
+model parameters, but of course also the need to control these parameters
+correctly.
+Since the rules are still imperfect, the resulting speech quality is not as
+good as recorded human speech, but recent tests have shown that high
+intelligibility and comprehensibility can be obtained, and when sentence and
+paragraph-level messages must be synthesized, the rule system provides the
+necessary degrees of freedom to produce smooth-flowing good quality speech. It is
+interesting to consider that synthesis-by-rule systems delay the binding of the speech
+parameter set and waveform to the input message by using very deep language
+abstractions, and hence provide a maximum of flexibility, and are thus well suited
+to the needs of converting unrestricted text to speech. The designers of these
+systems must, however, discover the relationship between the underlying linguistic
+specification of the message and the resulting speech signal, a topic which has
+been central to speech science and linguistics for several decades. Thus
+synthesis-by-rule both benefits from and contributes to our general knowledge of speech and
+linguistics, and the steady improvement in speech synthesis-by-rule quality reflects
+this joint progress. While it is believed that current synthetic speech quality is
+acceptable for many applications, it can certainly be expected to continue to improve
+with our increasing knowledge.
+
+1.2.4 Text-to-speech conversion
+
+The synthesis-by-rule techniques described above require a detailed phonetic
+transcription as input. While this input requires very little memory for message
+storage, a frequent requirement is to convert text to speech. When it is desired to
+convert unrestricted English text to speech, the flexibility of synthesis-by-rule is
+needed, so that means must be afforded to convert the input text to the phonetic
+transcription needed by the synthesis-by-rule techniques. It is clear, then, that first
+the text must be analyzed to obtain the phonetic transcription, which is then
+subjected to a synthesis procedure to yield the output speech waveform. The analysis
+of the text is heavily linguistic in nature, involving a determination of the
+underlying phonemic, syllabic, morphemic and syntactic form of the message, plus
+whatever semantic and pragmatic information can be gleaned. Text-to-speech
+conversion can thus be seen as a collection of techniques requiring the successful
+integration of the task constraints with other constraints provided by the nature of the
+human vocal apparatus, the linguistic structure of the language, and the implemen-
+
+11
diff --git a/pages-txt/024.txt b/pages-txt/024.txt
new file mode 100644
index 0000000..e3a832e
--- /dev/null
+++ b/pages-txt/024.txt
@@ -0,0 +1,47 @@
+From text to speech: The MITalk system
+
It is worth emphasizing +that both text and speech are surface manifestations of underlying linguistic form, +and hence that text-to-speech conversion consists first of discovering that under- +lying form, and then utilizing it to form the output speech. + +In the chapters that follow, we will discuss the MITalk text-to-speech system +in detail. The aim of this system is to provide high-quality speech from un- +restricted English text using the fundamental results of speech science, computing, +and linguistics. We aim to do it “right”, in the belief that adherence to basic prin- +ciples will provide more insightful methods, avoid ad hoc “fixes”, and produce the +best possible quality of speech. We will also discuss the range of possible applica- +tions, and the implementation base for both a research system, and a compact, low- +cost module utilizing state-of-the-art integrated circuit technology. First, however, +a brief outline of the parts of the system will be presented. + +1.3 Functional outline of MITalk + +At the highest level, the system consists of an analysis phase, followed by a syn- +thesis phase. Each of these processes is in turn broken down into a cascaded set of +modules. In turn, each module has been described functionally as a set of al- +gorithms operating on well-defined input and output data structures, and each +module is afforded a chapter in the sequel for its exposition. In this introduction, +we summarize briefly the functional content of the modules. + +1.3.1 Analysis of text + +1.3.1.1 Symbols to standard form A preprocessor is used to convert symbol +strings such as “$3.17”, “Mr.”, “M.LT.”, and “1979” to text suitable for linguistic + +analysis by the remainder of the system. + +1.3.1.2 Phonetic transcription For each word, a phonetic transcription is com- +puted. A dictionary of 12,000 morphs (prefixes, roots, and suffixes) is used, which +contains the spelling, pronunciation, and part-of-speech information for each +morph. Most words are analyzed into a string of morphs. In this way, more than + +12 diff --git a/pages-txt/025.txt b/pages-txt/025.txt new file mode 100644 index 0000000..74791c0 --- /dev/null +++ b/pages-txt/025.txt @@ -0,0 +1,53 @@ +Introduction + +95 percent of the input text (consisting of high-frequency, foreign, and polysyllabic +words) can be transcribed to phonetic notation. For rare or new words, plus +misspellings (e.g. “recieve”), letter-to-phonetic segment rules are used. + +1.3.1.3 Lexical stress The effects of suffixes, as well as that of compounding, on +lexical stress are computed, permitting the use of both stress marks in the + +transcription and changes in vowel color. + +1.3.1.4 Phonological recoding Once the initial phonetic transcription is ob- +tained, some recoding is done based on the sentence-level context, including con- +sonant “flapping”, insertion of glottal stops, and selection of alternate pronuncia- +tions of “the”. + +1.3.1.5 Parsing To aid the selection of prosodic correlates, a phrase-level pars- +ing is performed. Also, a part-of-speech determination for each word is computed +to provide input for the parser. + +1.3.1.6 Semantic analysis Only those semantic effects due to particular lexical +items, such as negatives, are found, but these have important effects on pitch. + +1.3.2 Synthesis of speech + +1.3.2.1 Timing Prepausal lengthening, pause duration, and polysyllabic shorten- +ing are determined, plus the basic duration of each segment and the effect of +clusters. 
+
+1.3.2.2 Fundamental frequency A declination line is found, plus pitch rises on
+stressed syllables, continuation rises to signal continued throughout, and a number
+of segmental effects. Contours appropriate to questions are also found.
+
+1.3.2.3 Phonetic targets Given the prosodic framework, phonetic target
+parameters are determined for each phonetic segment, utilizing a “context
+window” five segments wide. There are twenty such parameters that vary with
+time.
+
+1.3.2.4 Continuation smoothing The target values are smoothed to yield a full
+set of parameters every 5 msec.
+
+1.3.2.5 Parameter conversion The phonetic parameters must be converted to
+coefficients that can be used by the digital formant synthesizer.
+
+1.3.2.6 Waveform generation The terminal synthesizer utilizes the coefficients
+(updated every 5 msec) to generate the speech waveform. A special purpose
+hardware synthesizer is used to perform this task in real-time. Speech samples are
+produced at a 10 kHz rate, and then converted to analog form via a D/A converter
+and low-pass filter.
+
+13
diff --git a/pages-txt/026.txt b/pages-txt/026.txt
new file mode 100644
index 0000000..7fb949c
--- /dev/null
+++ b/pages-txt/026.txt
@@ -0,0 +1,15 @@
+From text to speech: The MITalk system
+
+It can be seen that there are many steps from input text to output speech, but
+study of each module can lead to an insightful understanding of the overall
+process. Because of the modular nature of the overall system, changes to
+individual algorithms can be readily accommodated as new ideas are developed.
+Indeed, this has been our habit for quite some time.
+
+In the sequel, we describe the algorithmic base of the system, its
+implementation and evaluation, together with a view to the future. It is certainly hoped that
+this work can serve not only as an important contribution to speech output for
+many computer-based systems, but also as a point of focus for a continuing flow of
+speech and language research.
+
+14
diff --git a/pages-txt/027.txt b/pages-txt/027.txt
new file mode 100644
index 0000000..585326b
--- /dev/null
+++ b/pages-txt/027.txt
@@ -0,0 +1,2 @@
+Analysis
+
diff --git a/pages-txt/028.txt b/pages-txt/028.txt
new file mode 100644
index 0000000..ec1a876
--- /dev/null
+++ b/pages-txt/028.txt
@@ -0,0 +1,40 @@
+2
+
+Text preprocessing
+
+2.1 Overview
+
+Unrestricted text may contain a wide variety of symbols, abbreviations, and
+conventions. In order to convert text to speech, it is necessary to find an appropriate
+expression in words for such symbols as “3”, “%”, and “&”, for abbreviations such
+as “Mr.”, “num.”, “Nov.”, “M.I.T.”, and conventions such as indentation for
+paragraphs. This text processing must be done before any further analysis to
+prevent an abbreviation from being treated as a word followed by an
+“end-of-sentence” marker, and to allow symbols with word equivalents to be replaced by
+strings analyzable by the lexical analysis modules.
+
+FORMAT is the first module of the MITalk system and performs the
+conversion of unrestricted text to a sequence of words and punctuation recognizable by
+the later modules. The following list contains a number of topics and symbol types
+which need to be considered.
+1. Blank space(s)
+2. Paragraphs
+3. Sentence-initial capitals
+4. Other capitals
+5. Abbreviations
+6. Numbers, including:
+a. Integers
+b. Numbers with a decimal point
+c. Dates
+d. Time
+7. Alphanumerics
+8. Formulas
+9. Punctuation, including:
+a. Period
+b. Comma
+c. Question mark
+d. Exclamation point
+
+16
diff --git a/pages-txt/029.txt b/pages-txt/029.txt
new file mode 100644
index 0000000..2b89308
--- /dev/null
+++ b/pages-txt/029.txt
@@ -0,0 +1,67 @@
+Text preprocessing
+
+e. Semicolon
+f. Colon
+g. Apostrophe
+h. Single and double quotes
+i. Ellipsis (...)
+j. Percent sign
+k. Ampersand
+l. Parentheses
+m. Brackets
+n. Dashes
+o. Hyphens
+
+10. Symbols not recognizable by computer (and hence not recognized by
+FORMAT), including:
+
+a. Italics
+b. Boldface
+c. Underlining
+d. Superscripts and subscripts
+e. Dieresis/umlaut ()
+f. Cedilla (ç)
+g. Various forms of special notation
+
+2.2 Input
+
+FORMAT accepts as input the original unrestricted English text to be analyzed.
+This text is a sequence of lines of letters and symbols expressed in a
+computer-readable form (in all implementations of MITalk, the ASCII character set is used).
+The actual letters recognized are:
+
+1. Uppercase and lowercase letters
+2. Numeric digits
+3. Period (or decimal point), question mark, and exclamation point
+4. Comma, semicolon, and colon
+5. Apostrophe
+6. Single and double quote marks
+7. Parentheses, brackets, and braces
+8. Percent sign, dollar sign, and ampersand
+9. Slash
+
+Any character which is not recognized by FORMAT causes a warning message
+and is treated as a space.
+
+17
diff --git a/pages-txt/030.txt b/pages-txt/030.txt
new file mode 100644
index 0000000..416fdab
--- /dev/null
+++ b/pages-txt/030.txt
@@ -0,0 +1,54 @@
+From text to speech: The MITalk system
+
+The size of individual words and sentences is limited, but set at a high value
+to include all reasonable cases. Words are allowed 40 characters each, and the
+maximum number of words per sentence is 200. If the limit of 40 characters per
+word is exceeded, the word is truncated and a message indicating the problem and
+number of allowable characters per word is printed for the user.
+
+2.3 Output
+
+The output of FORMAT is a sequence of words and punctuation marks.
+FORMAT scans each input line from left to right and converts each recognized
+construct (word, number, symbol, etc.) into an appropriate word or sequence of
+words. Since case is not significant in the later modules of MITalk, each word is
+written in all uppercase letters.
+
+An example of input and output is shown here in Figure 2-1. (Input text is in
+boldface.)
+
+Mr. Jones gets 35.3%.
+FORMAT: MISTER
+FORMAT: JONES
+FORMAT: GETS
+FORMAT: THIRTY
+FORMAT: FIVE
+FORMAT: POINT
+FORMAT: THREE
+FORMAT: PERCENT
+FORMAT: .
+FORMAT: .
+
+Figure 2-1: Example of FORMAT processing
+
+2.4 Formatting operations
+The various translations performed by FORMAT are described in detail below.
+
+2.4.1 Paragraphs and sentences
+
+Whitespace (i.e. spaces and/or tabs) at the beginning of a line followed by a
+capitalized word is taken to denote the beginning of a paragraph. FORMAT translates
+this whitespace into a period (.) which later gets translated into a pause.
+
+An additional pause is inserted after each sentence longer than five words
+(also after each group of short sentences longer than five words). As with the
+paragraph beginning, this pause is effected by adding an extra period after the
+sentence. This emulates a human speaker pausing for breath every so often.
+
+The end of a sentence is delimited by a period, question mark, or exclamation
+point. Not all periods denote the end of a sentence, however. If a period ends an
+abbreviation, then it is only taken as an end-of-sentence marker if it is at the end of
+a line and if it is followed by whitespace and a capitalized word. A period inside a
+numeric string is considered to be a decimal point, of course.
+
+18
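
The period disambiguation just described is the most intricate rule in FORMAT, so a small worked example may help. The fragment below is an editorial sketch in C only — the MITalk modules themselves are distributed in PASCAL with some routines in C (see the preface) — and the token-at-a-time interface, the function names, and the abridged abbreviation table are all assumptions made for illustration:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Three-way classification of a trailing period, following the rules
     * of section 2.4.1: only SENTENCE_END later becomes a pause. */
    enum period_kind { SENTENCE_END, ABBREV_STOP, DECIMAL_POINT };

    /* Stand-in for the Table 2-1 lookup of section 2.4.2 (abridged). */
    static int is_abbreviation(const char *w, size_t n)
    {
        static const char *table[] = { "Mr", "Mrs", "Ms", "Dr", "Prof", "etc" };
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strlen(table[i]) == n && strncmp(table[i], w, n) == 0)
                return 1;
        return 0;
    }

    /* 'word' is a token ending in '.'; 'next' is the following token
     * (NULL at end of input); 'ends_line' says the token closed a line. */
    static enum period_kind classify_period(const char *word, const char *next,
                                            int ends_line)
    {
        size_t n = strlen(word);

        /* A period following a digit is part of a numeric string. */
        if (n >= 2 && isdigit((unsigned char)word[n - 2]))
            return DECIMAL_POINT;

        /* An abbreviation period ends a sentence only at the end of a
         * line with a capitalized word following. */
        if (is_abbreviation(word, n - 1)) {
            if (ends_line && next && isupper((unsigned char)next[0]))
                return SENTENCE_END;
            return ABBREV_STOP;
        }

        return SENTENCE_END;        /* an ordinary full stop */
    }

    int main(void)
    {
        printf("%d\n", classify_period("Mr.", "Jones", 0)); /* 1: ABBREV_STOP */
        printf("%d\n", classify_period("cat.", "The", 0));  /* 0: SENTENCE_END */
        return 0;
    }

Returning a three-way classification rather than a yes/no answer mirrors the prose: an abbreviation period and a decimal point flow into different downstream translations.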
diff --git a/pages-txt/031.txt b/pages-txt/031.txt
new file mode 100644
index 0000000..f9484cc
--- /dev/null
+++ b/pages-txt/031.txt
@@ -0,0 +1,49 @@
+Text preprocessing
+
+2.4.2 Words, abbreviations, and special symbols
+
+FORMAT recognizes a word as an alphabetic string delimited by a punctuation or
+whitespace character (the newline character which separates lines is considered to
+be whitespace). If a word is followed by a period, then FORMAT looks in a table
+of abbreviations to see if a translation is specified for that word. Table 2-1 shows
+the abbreviation table currently in use. If a translation is found, then the translated
+word(s) are output in place of the original abbreviation.
+
+Table 2-1: Abbreviation translations performed by FORMAT
+
+Ms - MIZ
+Mr - MISTER
+Mrs - MIZZES
+Dr - DOCTOR
+Num - NUMBER
+Jan - JANUARY
+Feb - FEBRUARY
+Mar - MARCH
+Apr - APRIL
+Aug - AUGUST
+Sept - SEPTEMBER
+Oct - OCTOBER
+Nov - NOVEMBER
+Dec - DECEMBER
+etc - ET CETERA
+Jr - JUNIOR
+Prof - PROFESSOR
+
+A word that is in capital letters, or which contains digits as well as letters, is
+considered to be a symbol and is translated by pronouncing each character
+separately (e.g. for USA and MIT). When a letter is to be pronounced, it is
+represented by a special noun morph which has the proper pronunciation for the
+letter (e.g. A → LETTER-A). A word that is in lowercase, or which has only the
+first letter capitalized, is simply converted to uppercase and output.
+
+2.4.3 Apostrophes and single quotation marks
+If an apostrophe is embedded in a word, then the entire word is output as a unit.
+
+19
diff --git a/pages-txt/032.txt b/pages-txt/032.txt
new file mode 100644
index 0000000..b646a44
--- /dev/null
+++ b/pages-txt/032.txt
@@ -0,0 +1,54 @@
+From text to speech: The MITalk system
+ +2.4.6 Numerals + +FORMAT recognizes a number as a string of digits with optional commas and/or a +period (decimal point). There are two ways of pronouncing numbers: each digit in +sequence (e.g. 75— SEVEN FIVE), and in decimal form (e.g. 75— SEVENTY +FIVE). FORMAT selects the appropriate type of pronunciation based on the form +and context of the number. + +2.4.6.1 Integers, commas, and decimal points A complete number consists of a +set of comma-separated digit triads (the integer portion), optionally followed by a +decimal point and fraction digits. The integer portion is pronounced by pronounc- +ing each triad from left to right and appending the appropriate multiplying word to +each triad (e.g. BILLION, MILLION, THOUSAND, or nothing for the rightmost +triad). + +A triad is pronounced as follows: + +o If the left digit is nonzero, then it is pronounced followed by the word + +HUNDRED. + +20 diff --git a/pages-txt/033.txt b/pages-txt/033.txt new file mode 100644 index 0000000..4e7a346 --- /dev/null +++ b/pages-txt/033.txt @@ -0,0 +1,52 @@ +Text preprocessing + +o If the middle digit is larger than one, then the appropriate “tens” word +is pronounced (e.g. TWENTY, THIRTY, etc.). If the middle digit is +one, then the appropriate “teens” word is selected based on the +rightmost digit (e.g. TEN, ELEVEN, TWELVE, etc.). + +o If the middle digit is not one and the rightmost digit is not zero, then + +the rightmost digit is pronounced. + +If a period separates two numeric strings, then it is translated into the word +POINT and the following numeric string is pronounced digit-by-digit (e.g. .015— +POINT OH ONE FIVE). Note that 0 is pronounced OH in this case. A 0 is +pronounced as ZERO only if it is a one-digit number. For example: + +¢ 715—SEVEN HUNDRED FIFTEEN + +¢ 71.50 - SEVENTY ONE POINT FIVE OH + +¢ 159,106 ~ONE HUNDRED FIFTY NINE THOUSAND ONE +HUNDRED SIX + +2.4.7 Dollars and cents +If a dollar sign ($) precedes a number as described above, then the following +modifications to the pronunciation are made: + +e The word DOLLAR or DOLLARS is inserted after the integer part. +¢ The decimal point is pronounced as AND instead of POINT. +¢ The fraction part is pronounced in two-digit decimal form. + +e The word CENT or CENTS is appended after the fraction part. +For example, $71.50 - SEVENTY ONE DOLLARS AND FIFTY CENTS. + +2.4.8 Years and comma-less numbers +A string of more than three digits without commas is given special treatment. If +the number has four digits, the first of which is 1, then it is considered to be a year +and is pronounced as follows: +¢ The leftmost two digits are pronounced as “teens”. +o If the rightmost two digits are both 0 then they are pronounced as +HUNDRED. If they are 0 followed by a nonzero digit then they are +pronounced individually. Otherwise, the rightmost two digits are + +pronounced in decimal form. + +Digit strings longer than three digits which do not contain commas (and are +not candidates for year pronunciation) are pronounced as individual digits. Strings +of less than four digits which begin with 0 are also pronounced individually. 
+ +Some examples are: + +21 diff --git a/pages-txt/034.txt b/pages-txt/034.txt new file mode 100644 index 0000000..4ccc8a9 --- /dev/null +++ b/pages-txt/034.txt @@ -0,0 +1,12 @@ +From text to speech: The MITalk system + +22 + +• 0159 — OH ONE FIVE NINE + +• 1590 — FIFTEEN NINETY + +• 7150 — SEVEN ONE FIVE OH +• 1906 — NINETEEN OH SIX + +• 1800 — EIGHTEEN HUNDRED diff --git a/pages-txt/035.txt b/pages-txt/035.txt new file mode 100644 index 0000000..6f94e78 --- /dev/null +++ b/pages-txt/035.txt @@ -0,0 +1,38 @@ +3 + +Morphological analysis + +3.1 Overview + +MITalk is designed to convert unrestricted English text into a synthetic speech +waveform. In the initial analysis phase, text character strings are converted to a +narrow phonetic transcription consisting of phonetic symbols and prosodic +markers. While the output unit types are thus specified, the question remains as to +the type of unit to be used with the input character string. Since there is an infinite +number of possible English sentences, it is not possible to store all English sen- +tences and their corresponding phonetic transcriptions in a form suitable for the +synthesis phase of MITalk. The next smaller unit recognizable from the input +string is the word. The number of English words is large, but bounded, so one +might consider use of a word lexicon which would contain the spelling and +phonetic transcription (together with part-of-speech information) for all English +words. Aside from the size of this dictionary, there are several attractive features +of this approach. Some form of dictionary must be used to provide pronunciations +for exceptions to other mechanisms (e.g. rules) used to derive pronunciations. +These arise in part from foreign words that have retained the pronunciation of their +language of origin (e.g. parfait and tortilla). Furthermore, all mechanisms +derived thus far for the conversion of letter strings to phonetic segment labels +provide some errors, and it seems to be inherent in natural languages that no for- +mal means derived for the representation of their structure has covered all ob- +served forms without error. An interesting class of exceptional pronunciation +arises for high-frequency words. Initial th is pronounced as a voiceless fricative in +many words (thin, thesis, thimble) but for very frequent words, such as the short +function words (the, this, there, these, those, etc.), it is pronounced in a voiced +manner. Similarly, f is always pronounced as an unvoiced fricative, except for the +single case of. In words such as shave and behave, the final silent e has the effect +of lengthening or tensing the preceding vowel, but in the frequent word have this +is not the case. Finally, the final s in atlas and canvas is unvoiced, but for the +function words (is, was, has) it is voiced. It thus appears that these high-frequency +words should be placed in an exceptions dictionary if a set of rules is to be used for +converting letter strings to phonetic segment labels. + +23 diff --git a/pages-txt/036.txt b/pages-txt/036.txt new file mode 100644 index 0000000..d5a444d --- /dev/null +++ b/pages-txt/036.txt @@ -0,0 +1,45 @@ +From text to speech: The MITalk system + +From the above discussion, it is clear that some form of exceptions dictionary +is necessary. Given that all systems will provide such a lexicon, there are two +choices that deal with the nonexceptional words. On one extreme, system desig- +ners could attempt to provide a “complete” word dictionary.
Unfortunately, while +the number of words is bounded, new words are constantly invented by productive +processes of compounding (e.g. earthrise and cranapple) and by filling +“accidental gaps” (in the phonological sense) as in brillig. Furthermore, a com- +prehensive word lexicon would have to store all regularly inflected forms, which +places a large burden on the storage required. So a “complete” word lexicon will +not do. This fact has led investigators to consider the other extreme, namely the +provision of a set of letter-to-sound rules that would convert input letter strings to +phonetic segment labels through some sort of scanning and transformation process. +Such rule sets have indeed been constructed (MITalk has an extensive set), and +they are very productive. But difficulties remain. It has been difficult to provide a +high degree of accuracy from these rule sets, leading to increases in the size of the +“exceptions” dictionary. These problems arise in part due to the fact that there is +internal structure in words that must be recognized in order to derive the correct +pronunciation. + +Letter-to-sound rules recognize small structures within words in the form of +consonant and vowel clusters. Syllables provide additional structure, but it has not +been possible to reliably and consistently find syllable boundaries in the letter +string. The minimum syntactic unit of a language, however, is the morpheme, and +it has an important role to play in the determination of pronunciations. It will also +be seen that when morphemes are represented by letter string segments called +“morphs”, they can be effectively used as the basis for determining word pronun- +ciation. MITalk uses a morph lexicon that can be viewed as a bridge between the +two extreme approaches cited above. Together with an effective analysis proce- +dure, this lexicon provides for accurate pronunciations, including exceptions, and +also provides a natural role for letter-to-sound rules which must be present in order + +to convert unrestricted English text to speech. +Roughly speaking, morphs consist of prefixes, roots, and suffixes. An English + +word always has at least one root, but may have additional roots as well as prefixes + +and suffixes. Thus snow is a single morph, but snowplow is a compound of two +morphs, and snowplows has two roots and an inflectional suffix providing the +plural marker; relearn has a prefix as well as a root, and +antidisestablishmentarianism has no fewer than seven recognizable morphs. +These morphs are the atomic constituents of words, and they are relatively stable + +24 diff --git a/pages-txt/037.txt b/pages-txt/037.txt new file mode 100644 index 0000000..6a553a2 --- /dev/null +++ b/pages-txt/037.txt @@ -0,0 +1,43 @@ +Morphological analysis + +in a language. They are often the ingredients of newly coined compound words, +but new morphs are rarely formed. For this reason, they are good candidates for +lexical entries, provided a means can be found to analyze words into their con- +stituent morphs. As will be seen, an effective morph lexicon can have less than +10,000 entries, so that reasonable storage efficiency is provided, particularly in +contemporary integrated circuit technology. It is also important to note that with a +morph lexicon and associated analysis procedure, there is no need to store all of +the regularly inflected forms, as is the case with a whole word lexicon. 
+ +Because morphs are the basic constituents of words, it is important to show +their utility in determining pronunciations. When morphs are joined together, they +often change pronunciation depending on the nature of the morphs involved. +Thus, when the plural form of the singular nouns dog and cat is realized, the final +s is voiced in dogs but unvoiced in cats. This is a form of morphophonemic rule +having to do with the realization of the plural morpheme in various environments. +In order to use these rules, it is necessary to recognize the constituent morphemes +of a word, so it is apparent that there is an important class of pronunciation effects +facilitated through the detection of morphs and their boundaries. MITalk provides +a comprehensive implementation of the morphophonemic rules of English. + +In addition to the importance of morphophonemic rules, morphs serve to +break up a word for purposes of pronunciation. This observation is important for +the proper utilization of letter-to-sound rules. Most sets of letter-to-sound rules +treat each word as an unstructured sequence of letters, and use a scanning window +to find consonant and vowel letter clusters that can be readily converted to +phonetic segment labels. Thus, as we have already seen, th is a letter cluster cor- +responding to a single fricative phonetic segment, as in thesis. But in the word +hothouse, the th cluster is broken up by a morph boundary, and no medial frica- +tive is present. Similarly, the letter cluster sch has a regular pronunciation in +school and scheme, but in the words mischance and discharge the cluster is +broken up by the internal morph boundary. In English, the vowel digraph ea +presents many difficulties for a letter-to-sound algorithm, but in the word +changeable it is clearly broken up. In essence, the morph structure is essential to +provide the correct pronunciation. These cases can of course be treated as excep- +tions, but this will increase the size of the lexicon unnecessarily, and it is also clear +that important generalities will be lost. In the MITalk system, morph analysis is +always attempted before letter-to-sound rules are used, and care is taken to ensure +that letter-to-sound rules are not applied across morph boundaries. Thus, not only +does the use of morphs lead to an efficient and productive lexicon, it also naturally + +25 diff --git a/pages-txt/038.txt b/pages-txt/038.txt new file mode 100644 index 0000000..8920907 --- /dev/null +++ b/pages-txt/038.txt @@ -0,0 +1,42 @@ +From text to speech: The MITalk system + +provides for important pronunciation effects due to morph structure, and sets an +appropriate basis for the formulation of a well-motivated set of letter-to-sound +rules devoid of ad hoc exceptions. + +So far, we have shown how the use of a morph lexicon and accompanying +morph analysis procedures provides a sound solution to the accurate translation of +English word letter strings to sequences of phonetic segment labels. It is important +to realize, however, that morphs are just the surface realization of underlying mor- +phemes, and the distinction between these two units must be maintained. Mor- +phemes are abstract units, and they exist only for purposes of grammatical or dis- +tributional equivalence. Their use recognizes that words have internal structure, +and that the components of this structure are the constituent morphemes of the +word.
Historically, morphemes were introduced to define phonological units where +segmentability was possible, as in the sequence tall, taller, tallest. But there is +nothing in the definition of a morpheme to imply that it must always be an identifi- +able segment of the word of which it is a constituent. The morpheme is not a seg- +ment of a word, and it has no position in a word. It is an abstract unit arising from +linguistic distributional analysis. This can be seen clearly by comparing the words +went and walked. In the latter word, it is easy to see that there are two constituent +morphs, walk and ed, which are in one-to-one correspondence with the underlying +abstract morphemes walk and PAST. But in the case of went, the underlying mor- +phemic analysis provides the two morphemes go and PAST, and it is impossible to +map these in any nonarbitrary way onto the surface letter string went. When seg- +mentation is possible, as is often the case, then morphs can be identified, and +MITalk exploits this fact. For the cases where a root is given a grammatical inflec- +tion, as in went, MITalk provides a special morph type, STRONG, that indicates +the presence of the two underlying morphemes. Clearly went must go in the morph +lexicon, as it is an exception to the normal processes of affixation and compound- +ing. Additionally, the morpheme PLURAL provides ample evidence of the many +ways in which it may be realized on the surface. We have the pairs boy/boys, +thief/thieves, child/children, tooth/teeth, and fish/fish, as well as many borrowed +pairs from other languages such as concerto/concerti, datum/data, index/indices, +and alumnus/alumni. These irregular plurals must be placed in the lexicon, since +MITalk can only deal with morphs that can be found through detection of the +regular and productive word formation processes that are susceptible to segmen- +tation. Many of the analysis procedures of MITalk are based on the underlying +morphemic constituency of a letter string, although only morphs can be exhibited +as letter strings or can occur in the lexicon. + +26 diff --git a/pages-txt/039.txt b/pages-txt/039.txt new file mode 100644 index 0000000..6aa54af --- /dev/null +++ b/pages-txt/039.txt @@ -0,0 +1,47 @@ +Morphological analysis + +The use of morphs in MITalk is unique, and it is responsible for much of the +quality of the phonetic segment label sequences that are used for synthesis. There +is no doubt that they introduce several levels of complication. These include the +necessity of producing a morph lexicon and the need for a morph segmentation al- +gorithm. The concatenation of morphs to form a word often gives rise to spelling +mutations that cause segmentation difficulties, and several “morph coverings” of a +word are often found leading to a need for selection criteria. Nevertheless, the +gains far outweigh the negative costs, and in the following sections, we elaborate +on these robust and effective techniques. + +3.2 Input +In MITalk, morphemic analysis is provided in the DECOMP module. DECOMP’s + +input data stream has the same structure as the output stream from FORMAT +which precedes DECOMP in the MITalk system. Each record in the data stream + +contains either a word or a punctuation mark. Words consist of uppercase letters, +apostrophes, and/or hyphens. Legal punctuation marks are period, exclamation +point, question mark, comma, semicolon, colon, double quotation, single quota- +tion, left and right parentheses, and dash.
DECOMP also accesses a compiled bi- +nary format morph lexicon. + +3.3 Output +The output data stream consists of a sequence of decomposed word records. The + +following information is contained in each record: +1. Word spelling +2. Word part of speech (possibly more than one) +3. For each part of speech, an optional list of part-of-speech features +4. The series of morphs obtained by decomposition +5. For each morph, the following information: +a. Morph spelling +b. Morph type +c. One or two homographs + +d. For each homograph, a pronunciation and part(s) of speech +If no decomposition was found for the word, then the morph list is omitted +and the word is assigned a default set of possible parts of speech. Punctuation +marks receive a special part-of-speech code (either EndPunctuationMark (EPM) +for sentence-ending punctuation or InternalPunctuationMark (IPM) for all others). +Part-of-speech processing will be described in detail in the next chapter where the + +phrase parser is discussed. + +27 diff --git a/pages-txt/040.txt b/pages-txt/040.txt new file mode 100644 index 0000000..a4b42ed --- /dev/null +++ b/pages-txt/040.txt @@ -0,0 +1,57 @@ +From text to speech: The MITalk system + +3.4 The algorithm +The goal of the decomposition process is to obtain a morph covering of a word. +The word “covering” is used to indicate that a simple concatenation of morph +spellings will not, in many cases, provide a correct analysis. It is sometimes the +case, particularly when a vocalic suffix is involved, that spelling changes occur at +morph boundaries. In addition, there may be several distinct coverings of a given +word. + +In light of the observations above, the decomposition algorithm consists of +three major components: + +1. a recursive morph partitioning algorithm, + +2. a set of spelling change rules for use at morph boundaries, and + +3. a set of selectional rules to distinguish between legal and illegal + +morph sequences and to choose the best covering when multiple + +legal coverings exist. +These components are described in detail below. + +3.4.1 Recursive morph decomposition + +The overall control structure of the decomposition procedure is recursive. At each +step in the recursion, the right end of the word is matched against the longest lex- +icon morph possible, then the procedure is recursively invoked on the remaining +“uncovered” portion of the word. If this recursive invocation fails to produce a + +covering, then the original match is discarded and the next longest matching morph +is used. + +Input to the decomposition procedure consists of: + +1. a word or remainder to be covered, + +2. a state flag that indicates which morph types are legal in the current +context, and + +3. a score value that is used to rank multiple decompositions according + +to their likelihood of being correct. +Initially, the entire word is presented as input, the state flag is set to a value +indicating that no morphs have been found yet, and the score is set to zero. A con- + +cise informal description of the procedure follows: +find the longest morph which matches the right end of the current string +WHILE there is a match DO + +IF the matching morph is compatible with the current context (state) +THEN remove the matched letters from the right side of the string, +update the current state and score as a function of the type of +the matched morph.
+ +28 diff --git a/pages-txt/041.txt b/pages-txt/041.txt new file mode 100644 index 0000000..6a7b645 --- /dev/null +++ b/pages-txt/041.txt @@ -0,0 +1,51 @@ +Morphological analysis + +find a set of possible spelling changes1 at the right end of the +remainder, + +attempt a recursive decomposition for each spelling variation, + +save the results of the best-scoring of these variations, + +restore the remainder string, state, and score to their original + +values. +ENDIF, +find the next longest morph which matches the right end of the string. +END WHILE. + +The decision to search from the right end of the word was made early in the +development of the system before the selectional rules were implemented. It was +observed that the best decomposition was found first by stripping off suffixes be- +fore searching for roots and prefixes. When a later algorithm was developed in +which all decompositions were found and a choice made, the strategy was retained. +Since only the decomposition with the best score is kept while searching for other +possible morph coverings, finding the best decomposition early in the search is still +more efficient; potential coverings with worse scores can be discarded as early as +possible. + +3.4.2 Morph types + +Not all sequences of morphs are legal in the English language. For this reason +(and later, for scoring multiple coverings) each morph in the lexicon has a type +code. These morph type codes refine the coarse categories of “prefix”, “suffix”, +and “root” to obtain better performance in finding the correct covering. + +The morph type “FREE ROOT” (or simply “ROOT”) denotes a word which +can appear alone or with suffixes, prefixes, and/or other ROOTs. Typical ROOTs +are: side, cover, and spell. The type “ABSOLUTE” is assigned to words which do +not allow most affixes (suffixes or prefixes). These are words such as the, into, of, +and proper names. (The few affixes permitted are the inflectional suffixes such as +plural and possessive forms.) This type is essential in preventing DECOMP from +attempting to match the morphs a and I in many words. + +Most prefixes have the type “PREFIX” that denotes a prefix which can com- +bine with roots and other prefixes. Examples are: pre, dis, and mis. The remain- +ing prefixes can only occur at the beginning of a word and are classified as + +“INITIAL”. Examples are meta and centi. +Suffixes are classified using two different criteria yielding a total of four + +1Note that unchanged spelling is always one of these possibilities. + +29 diff --git a/pages-txt/042.txt b/pages-txt/042.txt new file mode 100644 index 0000000..64f791e --- /dev/null +++ b/pages-txt/042.txt @@ -0,0 +1,48 @@ +From text to speech: The MITalk system + +morph types. The first criterion is functional and divides suffixes into derivational +(“DERIVATIONAL” or “DERIV”) and inflectional (“INFLECTIONAL” or +“INFL”) types. Derivational suffixes have a major effect on the meaning of a root +and may change the part of speech (e.g. ness, ment, y). Inflectional suffixes +merely change the tense, number, or inflection of the root (e.g. ing, ed, s). This +classification is used primarily by the scoring algorithm. + +The other suffix classification is used solely by the spelling change rules. +This divides suffixes into vocalic and nonvocalic categories depending on whether + +the suffix begins with a vowel or consonant, respectively. The type names are +“VOCALIC” (or “VOC”) and “NONVOCALIC” (or “NONV”).
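[Editor's note] Before turning to the remaining morph types, the recursive covering search of section 3.4.1 can be sketched in a few lines of Python. This is an editorial illustration with a toy lexicon; the real DECOMP also threads the state flag and score through the recursion and tries spelling variants at each boundary.

    # Illustrative sketch of the right-to-left covering search of section 3.4.1.
    TOY_LEXICON = {"snow": "ROOT", "plow": "ROOT", "s": "INFL"}

    def decompose(word, morphs=()):
        """Return one morph covering of word, or None if none exists."""
        if not word:
            return list(morphs)             # the whole word is covered
        # try the longest matching morph at the right end first
        for cut in range(len(word)):
            tail = word[cut:]
            if tail in TOY_LEXICON:
                covering = decompose(word[:cut], (tail, *morphs))
                if covering is not None:
                    return covering         # success with this match
        return None                         # no match fits: back up and retry

    print(decompose("snowplows"))           # ['snow', 'plow', 's']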
+ +The “STRONG” morph type denotes a root which already contains tense or +number information. This type of morph is a combination of root and inflectional +morphemes which are not reflected directly in the morph structure. Examples are +went (go+PAST) and women (woman+PLURAL). + +In addition to free roots, there are two types of bound roots. The “LEFT +FUNCTIONAL ROOT” (or “LF-ROOT”) is a root which must always be followed +by a derivational suffix. An example is absorpt in absorptive and absorption. In +this case (as with many LF-ROOTs), the morph represents a suffix-caused spelling +mutation of a root morpheme which is too complex or idiosyncratic for the spelling +change rules to incorporate (e.g. absorb+ive — absorptive). A “RIGHT FUNC- +TIONAL ROOT” (or “RF-ROOT”) must always be preceded by a prefix. For ex- +ample, mit in permit, transmit, and submit. These morphs generally have some +etymological basis (and are not simply repeated letter patterns). For example: the +root mit is derived from the Latin mittere -- to send; it is just that the root itself +never became part of the English language and its meaning is overlooked by the +average speaker. + +The hyphen (-) has its own morph type “HYPHEN”. This is provided so that +hyphenated words which do not appear directly in the lexicon can be properly +decomposed. + +3.4.3 Legal morph sequences +The detection of legal and illegal morph sequences is performed by a finite state +machine (FSM). + +The grammar recognized by the FSM is summarized in production rules +below:1 + +1These use Wirth’s notation: = for production, [ ] for optional factors (zero or one rep.), { } for +repeated factors (zero to infinite repetition), () for grouping, and | for alternatives. + +30 diff --git a/pages-txt/043.txt b/pages-txt/043.txt new file mode 100644 index 0000000..aa4fcd8 --- /dev/null +++ b/pages-txt/043.txt @@ -0,0 +1,44 @@ +Morphological analysis + +effective-root = ROOT | LF-ROOT DERIV | PREFIX RF-ROOT | +STRONG + +suffix = DERIV | INFL +affixed-word = { PREFIX } effective-root { suffix } + +absolute-word = ABSOLUTE | ABSOLUTE INFL { suffix } | INITIAL +affixed-word + +word = affixed-word | absolute-word + +compound-absolute = absolute-word | absolute-word HYPHEN com- +pound | ABSOLUTE INFL { suffix } compound-affixed + +compound-affixed = affixed-word | affixed-word HYPHEN compound | +affixed-word compound-affixed + +compound = compound-affixed | compound-absolute + +Figure 3-1: State transition diagram for the morph sequence FSM + +Figure 3-1 shows the state transition diagram of the FSM. Each state of the +FSM represents a summary of the type sequence of the morphs which have been +stripped from the word being decomposed. It is this state which is passed as a +parameter to the recursive decomposition procedure. The “right context” +represented by each state is easily expressed and is summarized below. For each + +31 diff --git a/pages-txt/044.txt b/pages-txt/044.txt new file mode 100644 index 0000000..1050a29 --- /dev/null +++ b/pages-txt/044.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system + +state, a picture of the input stream is shown using the metalanguage of the gram- +mar above and with “<>” marking the position in the stream represented by the +state. To the right of the marker is context represented by the state. To the left is +an expression representing the expected structure of the remainder of the word.
+ +F0 word <> {INFL {suffix}} +R0 (affixed-word | LF-ROOT) <> DERIV {suffix} + +R1 (affixed-word | LF-ROOT) <> DERIV effective-root + +M1 PREFIX <> RF-ROOT {suffix} + +L1 {affixed-word | PREFIX | INITIAL} <> effective-root {suffix} + +L0 {affixed-word | PREFIX | INITIAL} <> PREFIX effective-root {suffix} +I0 {word HYPHEN} <> (ABSOLUTE | INITIAL affixed-word) + +3.4.4 Selectional rules and scoring + +When multiple morph coverings are found, selectional rules are needed to choose +the covering most likely to be correct. For example, a means of favoring +form+al+ly (ROOT + DERIV + DERIV) over form+ally (ROOT + ROOT) as the +decomposition of formally is needed. A set of selectional rules was devised by +examining all of the multiple coverings produced by DECOMP during the +development of the morph lexicon. The first result of this study was the discovery +of the so-called “standard form” for a (possibly compound) word stated below as +two productions: + +std-root = (ROOT | LF-ROOT DERIV) + +std-form = {PREFIX} {std-root} (std-root {DERIV} | STRONG) {INFL} +Coverings which match this form are to be preferred above all others. +Among coverings that match the standard form, the following partial order- +ings were found (“>” means that the pattern on the left is more desirable): + +ROOT > anything else +PREFIX+ROOT > ROOT+DERIV > ROOT+INFL > ROOT+ROOT +PREFIX+PREFIX+ROOT > ROOT+ROOT + +ROOT+DERIV+DERIV > ROOT+ROOT + +These rules are implemented by associating a cost with each transition of the +FSM and keeping track of the total cost of the decomposition as morphs are +stripped off the word. This cost is the “score value” mentioned above in the algo- +rithm description. The covering with the lowest total cost is the most desirable. + +32 diff --git a/pages-txt/045.txt b/pages-txt/045.txt new file mode 100644 index 0000000..933b6c2 --- /dev/null +++ b/pages-txt/045.txt @@ -0,0 +1,52 @@ +Morphological analysis + +In Figure 3-1 the transition arcs are labeled with the associated incremental +cost as well as morph type. The specific cost values are not significant, only their +relative values. The values were chosen to cause the FSM to implement the rules +above, then fine-tuned to get the best overall performance. The cost of a standard- +form covering is easily computed and is the sum of the following: + +• 34 units for each PREFIX, + +• 101 units for the first effective-root and 133 units for each additional +effective-root (if the rightmost effective-root is STRONG, add an ex- +tra 64 units to account for the “hidden” inflectional morpheme), + +• 35 units for each DERIV, and + +• 64 units for each INFL. + +The only other notable feature of the scoring is that any transition not part of the +standard form incurs a 512-unit penalty. In order to allow a single ABSOLUTE +root to match a word, the penalty is suppressed for this case and the cost is taken to +be the same as for a single ROOT covering. + +The recursive procedure takes advantage of the cost information to reduce the +number of matching operations. The cost of the best complete covering found be- +fore the current step in the recursion is recorded. As a new morph is matched, the +cost of its associated transition in the FSM is added to the running score. In ad- +dition, the minimum possible cost for the decomposition of the remainder is also +computed. If the sum of this cost and the current cost is not less than the best cost +so far, then the new morph is immediately rejected as being too expensive.
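[Editor's note] The standard-form cost arithmetic can be checked with a small Python sketch built from the unit values just listed; the encoding of a covering as a list of type names is ours, not DECOMP's.

    # Illustrative sketch of the standard-form scoring of section 3.4.4.
    def covering_cost(types):
        """Sum the unit costs for a standard-form morph-type sequence."""
        cost = roots = 0
        for t in types:
            if t in ("ROOT", "STRONG"):
                roots += 1
                cost += 101 if roots == 1 else 133   # first vs. additional root
                if t == "STRONG":
                    cost += 64                       # "hidden" inflectional morpheme
            elif t == "PREFIX":
                cost += 34
            elif t == "DERIV":
                cost += 35
            elif t == "INFL":
                cost += 64
        return cost

    # form+al+ly (ROOT DERIV DERIV) is preferred over form+ally (ROOT ROOT):
    assert covering_cost(["ROOT", "DERIV", "DERIV"]) < covering_cost(["ROOT", "ROOT"])

With these values, scar+city (ROOT ROOT) costs 234 and scarce+ity (ROOT DERIV) costs 136, matching the scores that appear in the trace of Figure 3-2 below.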
+ +3.4.5 Recognizing morphological mutations +After a suffix morph has been removed from a word, it is necessary to investigate +possible spelling changes which may have taken place during composition. Typi- +cal spelling changes (during morph composition) are: + +• y — i (embody+ment — embodiment), + +• consonant doubling before a vocalic suffix (pad+ing — padding), + +and + +• dropping of “silent e” before a vocalic suffix (fire+ing — firing). + +Different morphs have differing behavior in the presence of change-causing +suffixes. Three general categories of morph behavior are provided for in +DECOMP. In the lexicon, each morph has a spelling change code which indicates +whether spelling changes are forbidden, required, or optional when the morph is +combined with a suffix. The “required” category is currently used exclusively for +morphs with consonant endings which are always doubled in the presence of a + +33 diff --git a/pages-txt/046.txt b/pages-txt/046.txt new file mode 100644 index 0000000..ab11957 --- /dev/null +++ b/pages-txt/046.txt @@ -0,0 +1,50 @@ +From text to speech: The MITalk system + +vocalic suffix. The “optional” category is used for all other morphs which permit +spelling changes. Examples are: + +• required - scar+ed — scarred + +• forbidden - alloy+ing — alloying + +• optional - change+able — changeable, change+ing — changing + +The spelling changes performed by DECOMP consist of appending a letter, +or changing or deleting the last letter. When a morph is matched to the left end of +a word, the following procedure is used to determine the set of possible spelling +changes: If the matched morph is not a suffix, then no spelling changes are made +and recursion proceeds normally. If the matched morph is a nonvocalic suffix and +the last remaining letter is an i, then the change i — y is examined. If the matched +morph is a vocalic suffix, then spelling changes are performed by matching a +three-character template against the last two remainder letters and the first letter of +the matched morph. + +If the vocalic suffix is es, then a special check is made to determine whether +the letter e should be considered part of the suffix. If the remainder does not end +in c, ch, g, i, o, s, sh, x, or z, then the es match is immediately rejected. This +causes the morph s to be the next match; the e is thus moved from the suffix to the +remainder. This rule is motivated by the phonetic properties of the plural suffixes +s and es. The lexicon entry for s gives a pronunciation zz while the entry for es +gives 18 2zz. The rule allows DECOMP to make the appropriate decomposition +of tunes — tune+s rather than tunes — tune+es which is found first. The two +vowels, i and o, are permitted to precede es to enable correct decomposition of +words such as heroes and parties, even though pronunciation is not correct in such +cases; morphophonemic rules are used in a later MITalk module to obtain the +proper pronunciation. + +Table 3-1 shows the set of template patterns and their resulting spelling +change actions. The plus sign (+) in the pattern denotes the boundary between the +suffix and the remainder. A dot (.) in the pattern matches any letter. The pattern +xx matches any doubled letter. The first pattern (from top to bottom of the list) +which matches the current remainder/suffix pair controls the set of changes ap- +plied. + +For each possible spelling of the remainder, the following steps are per- +formed: + +1. Make the change. + +2. Recursively decompose the remainder. +3.
If a morph matches the right end of the remainder, check its spelling + +34 diff --git a/pages-txt/047.txt b/pages-txt/047.txt new file mode 100644 index 0000000..77dccde --- /dev/null +++ b/pages-txt/047.txt @@ -0,0 +1,49 @@ +Morphological analysis + +change code to see if it is compatible with the change (or lack of +change). If the morph requires a spelling change, then it is rejected if +a change was not made. If the morph forbids a change and a change + +was made, it is also rejected. +Changes which have the notation “(+)” suppress the checking of the spelling +change code. This allows the correct decomposition to be found for morphs which +normally forbid spelling changes such as free (free+ing — freeing, but free+ed — +freed). Changes which carry the notation “(*)” are made only for derivational suf- +fixes. + +3.5 An example of a decomposition + +Figure 3-2 details the process by which DECOMP arrives at the decomposition +scarcity — scarce+ity. Lines with the label “Decomp:” are produced during +decomposition and document the recursive process. Lines which begin with a +quoted string show the parameter states when a new level of recursion is entered. +The quoted string itself is the current remainder to be covered and the information +in brackets [ ] is the current state of the FSM as described in Section 3.4.3. The +number in angle brackets < > is the current score. Lines which begin with +“Matched” indicate that a morph match has been found. The morph spelling and +type are given followed by the action taken as a result of the match. + +Lines labelled with “DECOMP:” show the data on the output stream from +DECOMP to the next module. This information is described in Section 3.3 above +and the lines are commented in italics. + +Initially, the remainder is the entire word scarcity and the cost is zero. The +longest matching morph is city, which is a root and is legal in the rightmost posi- +tion; hence decomposition proceeds to scar which is also a root. This yields the +legal double-root covering scar+city with a total cost of 234. Next, decomposition +backs up to the remainder scar to see if there are other possible coverings. Both +attempts to cover scar fail, however, since the minimum possible cost for each + +covering would exceed the cost of the one already found. +After the possibilities of scar have been exhausted, recursion backs up to + +scarcity to try the next longest morph after city. This morph is the derivational +suffix ity and leaves a remainder scarce which is successfully covered by a root. +This yields a new low cost; hence scarce+ity supersedes scar+city as the preferred +covering. Spelling changes are attempted on scarce but these fail to yield a cover- +ing. + +Finally, the recursion backs up to scarcity to try the shortest match y. With +the spelling change to scarcite, DECOMP is able to match the root cite but since + +35 diff --git a/pages-txt/048.txt b/pages-txt/048.txt new file mode 100644 index 0000000..f30bba4 --- /dev/null +++ b/pages-txt/048.txt @@ -0,0 +1,42 @@ +From text to speech: The MITalk system + +Table 3-1: Morph spelling change rules for vocalic suffixes + +Pattern  Change       Example +ck+.     none         packing — pack+ing +         ck — c       picnicking — picnic+ing +xx+i     none         telling — tell+ing +         xx — x       padding — pad+ing +         xx — xe      silhouetting — silhouette+ing +xx+.     none         yeller — yell+er +         xx — x       reddest — red+est +e+e      e — ee (+)   freed — free+ed +e+i      none         dyeing — dye+ing
e+.      none         changeable — change+able +i+i      none         skiing — ski+ing +i+e      i — y        noisiest — noisy+est +         i — ie (+)   eeriest — eerie+est +         none         efficient — effici+ent +i+.      i — y        variation — vary+ation +         none         deviate — devi+ate +y+i      none (+)     flying — fly+ing +         y — ye (+)   eying — eye+ing +y+.      none         employer — employ+er +.+i      — e          daring — dare+ing +         none         showing — show+ing +         — y (*)      harmonize — harmony+ize +.+.      — e          observance — observe+ance +         none         sender — send+er + +the cost of this covering cannot be as low as the one already found, DECOMP does +not even bother to match the remaining scar. Since no more legal coverings are +found, scarce+ity becomes the final decomposition. + +3.6 The lexicon + +3.6.1 Development and composition +The present morph lexicon (Hunnicutt, 1976a) contains about 12,000 entries and is +sufficient to analyze at least ten times that number of English words, giving the + +correct morph analysis, pronunciation and part(s) of speech. +36 diff --git a/pages-txt/049.txt b/pages-txt/049.txt new file mode 100644 index 0000000..72f5658 --- /dev/null +++ b/pages-txt/049.txt @@ -0,0 +1,66 @@ +Morphological analysis + +Decomp: "SCARCITY" [state = word <0> inflectional suffix] => +Decomp: Matched "CITY" (root) -- decompose remainder +Decomp: "SCAR" [state = <101> root] => +Decomp: Matched "SCAR" (root) -- decompose remainder +Decomp: "" [state = <234> root] => +Decomp: Matched start of word, final score = 234 +Decomp: Matched "CAR" (root) min. score = 268 -- too expen- +sive! +Decomp: Matched "AR" (derivational suffix) min. score = 234 +-- too expensive! +Decomp: Matched "ITY" (derivational suffix) -- decompose +remainder +Decomp: "SCARCE" [state = root <35> derivational suffix] => +Decomp: Matched "SCARCE" (root) -- decompose remainder +Decomp: "" [state = <136> root] => +Decomp: Matched start of word, final score = 136 +Decomp: "SCARC" [state = root <35> derivational suffix] => +Decomp: Matched "ARC" (root) min. score = 170 -- too expen- +sive! +Decomp: "SCARCY" [state = root <35> derivational suffix] => +Decomp: Matched "Y" (derivational suffix) min. score = 136 -- +too expensive! +Decomp: Matched "Y" (derivational suffix) -- decompose +remainder +Decomp: "SCARCITE" [state = root <35> derivational suffix] => +Decomp: Matched "CITE" (root) min. score = 170 -- too expen- +sive! +Decomp: "SCARCIT" [state = root <35> derivational suffix] => +Decomp: Matched "IT" (absolute) -- illegal! +DECOMP: SCARCITY word spelling +DECOMP: NOUN (NUMBER = SINGULAR) part of speech and features +DECOMP: => decomposition follows +DECOMP: SCARCE [ROOT] : first morph spelling and type +DECOMP: 1SKE*RS (ADJECTIVE) pronunciation and part of speech +DECOMP: ITY [DERIVATIONAL VOCALIC SUFFIX] : second morph +DECOMP: *T-E~ (NOUN) + +Figure 3-2: Decomposition of “scarcity” + +The morph lexicon was obtained by decomposing 50,406 distinct words +found in a corpus of 1,014,232 words of running text into constituent morphs +(Kucera and Francis, 1967). Beginning with a base of one-, two-, and three-letter +words and a decomposition (analysis) algorithm, the lexicon was built up by suc- +cessively adding to the base all n-letter words (starting with n=4) which either: + +1. did not decompose into words of less than n letters, + +2. decomposed into incorrect constituent morphs, +3. had a pronunciation other than that obtained by concatenation of the + +pronunciations of the individual morphs, or +4. had a part of speech which was not derivable from the part-of-speech + +sets of its constituent morphs.
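[Editor's note] A hedged Python sketch of this bootstrapping loop follows. The three wrong_* predicates stand in for judgments that were made by hand during lexicon development, so they are stubs here, and decompose() is any covering procedure (assumed to take the word and the current lexicon).

    # Illustrative sketch of the lexicon build-up of section 3.6.1.
    def wrong_morphs(word, morphs): return False          # hand-checked: criterion 2
    def wrong_pronunciation(word, morphs): return False   # hand-checked: criterion 3
    def wrong_part_of_speech(word, morphs): return False  # hand-checked: criterion 4

    def build_lexicon(words_by_length, base_lexicon, decompose):
        """Grow the morph lexicon over n-letter words, n = 4, 5, 6, ..."""
        lexicon = set(base_lexicon)       # one-, two-, and three-letter words
        for n in sorted(words_by_length):
            for word in words_by_length[n]:
                morphs = decompose(word, lexicon)
                if (morphs is None                          # criterion 1
                        or wrong_morphs(word, morphs)
                        or wrong_pronunciation(word, morphs)
                        or wrong_part_of_speech(word, morphs)):
                    lexicon.add(word)     # the word itself becomes an entry
        return lexicon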
+The first category includes n-letter words consisting of a single morph, words + +whose constituent morphs did not appear in the lexicon, and words in which an + +unrecognized spelling change prevented correct analysis. +Although many spelling changes are recognized by the morphological + +37 diff --git a/pages-txt/050.txt b/pages-txt/050.txt new file mode 100644 index 0000000..3d1dfd4 --- /dev/null +++ b/pages-txt/050.txt @@ -0,0 +1,45 @@ +From text to speech: The MITalk system + +analysis algorithm, some were considered to be either rare or difficult to imple- +ment. Spelling changes which are particularly difficult to recognize are those in +which a letter is either added or omitted. These changes frequently appear to have +been made because of simplified pronunciations. In some cases, a vowel is +dropped, as in administer/administration. In other cases, repeated consecutive +sounds are omitted as in quietude (quiet+itude). Words in which letters are in- +serted may contain an extra sound as in fixture (fix+ure) and armament +(arm+ment), or simply an extra letter as in picnicked (picnic+ed) and stabilize +(stable+ize) in which the spelling change allows retention of the original pronun- +ciation. + +There are about 250 words in the morph lexicon which, if they were not lex- +ical entries, would be analyzed by the algorithm into morphs other than those from +which they are derived. These are the words mentioned in the second category +above. The word colonize, for example, is not derived from colon; cobweb is not +derived from cob; bargain is not derived from bar and gain. + +In some cases of multiple coverings, the selectional rules do not choose the +correct analysis. For example, the word coppery may be analyzed as either +cop+ery or as copper+y. In both cases, the morph types are the same: cop and +copper are free roots, and ery and y are vocalic derivational suffixes. That is, the +number of morphs and their types are exactly the same in the two possible +analyses. When this situation arises, the selectional rules are constrained to choose +the first analysis. Because the algorithm first searches for the longest morph from +the right end of the word, cop+ery is chosen. This analysis is etymologically in- +correct, and the polymorphemic word coppery is, therefore, a lexical entry. + +There are many polymorphemic words in English which differ in pronuncia- +tion from that of their constituent morphs. For this reason, the third category +above is rather large; it includes about 8 percent of the lexical entries. Some +polymorphemic words differ in both pronunciation and stress, the two categories +being highly interrelated. + +The part of speech of a word is very important in text-to-speech processing. +It is used in determining a parse for a sentence which is then used in algorithms +determining fundamental frequency and duration. DECOMP includes a part-of- +speech processor which determines the part of speech of a word based on infor- +mation associated with the component morphs in the lexicon. The procedure will +be described in detail in the next chapter. If the part of speech of a word is not +correctly predicted by its constituent morphs, then the entire word must be placed +in the lexicon. For example, the suffix er is marked as forming adjectives, adverbs + +38 diff --git a/pages-txt/051.txt b/pages-txt/051.txt new file mode 100644 index 0000000..c09b4ea --- /dev/null +++ b/pages-txt/051.txt @@ -0,0 +1,14 @@ +Morphological analysis + +and nouns. 
There are some words, however, which end in er and are both nouns +and verbs or are verbs only. Some examples are batter, checker, chatter, and +flicker, which appear in the lexicon. + +Although many compounds have the part of speech of their rightmost morph, +others do not. Such compounds must be included as lexical entries. A number of +the compounds included for this reason are adjectives such as bygone, borderline, +commonplace, and freehand. Others, such as buttonhole, homestead, and +bottleneck, may be used as either nouns or verbs whereas their rightmost morph +may be a noun but not a verb. + +39 diff --git a/pages-txt/052.txt b/pages-txt/052.txt new file mode 100644 index 0000000..9f23cce --- /dev/null +++ b/pages-txt/052.txt @@ -0,0 +1,44 @@ +4 + +The phrase-level parser + +4.1 Overview + +The parser for the text-to-speech system is designed to satisfy a unique set of con- +straints. It must be able to handle arbitrary text quickly, but does not need to +derive semantic information. Many parsers attempt to build a deep structure parse +from the input sentence so that semantic information may be derived for such uses +as question-answering systems. The text-to-speech parser supplies a surface struc- +ture parse, providing information for algorithms which produce prosodic effects in +the output speech. In addition, some clause boundaries are set according to rules +described in Chapter 8. These phrase-level and clause-level structures provide +much of the syntactic information needed by the present prosodic algorithms. + +It is well known that parsing systems which parse unrestricted text often +produce numerous ambiguous or failed parses. Although it is always possible to +choose arbitrarily among ambiguous parsings, a failed parse is unacceptable in the +text-to-speech system. When one examines ambiguous results from full sentence- +level parsers, one finds that the bottom level of nodes (i.e. the phrase nodes) are +often invariant among the competing interpretations; the ambiguities arise from +possible groupings of these nodes at the clause level, especially for parsers which +build binary trees. One also finds that for many failed parses, much of the struc- +ture at the phrase level has been correctly determined. The phrase-level parser +takes advantage of this reliability, producing as many phrase nodes as possible for +use by the MITalk prosodic component. + +The phrase-level parser uses comparatively few resources and runs in real- +time. This is quite unusual for parsers which handle unrestricted text, but is neces- +sary for a text-to-speech system. It would not be possible in such a practical sys- +tem to allocate the resources needed for recursion in the grammar and for back- +tracking control structures. Since extensive backtracking occurs above the phrase +level for the most part, the combinatorial explosion associated with this strategy is + +avoided by restriction to phrase-level parsing. +Phrase recognition is accomplished via an ATN (augmented transition + +network) interpreter (Woods, 1970) and the grammars for noun groups and verb +groups. A “noun group” (NGR), as used in this grammar, means either a pronoun + +40 diff --git a/pages-txt/053.txt b/pages-txt/053.txt new file mode 100644 index 0000000..69297a5 --- /dev/null +++ b/pages-txt/053.txt @@ -0,0 +1,50 @@ +The phrase-level parser + +(e.g. him or several), a pronoun with modification (e.g. almost anything green), +an integer with or without modification (e.g.
five or nearly a hundred thousand), +a noun phrase up to and including the head noun (e.g. every third car or his own +red and black car), or any of the above preceded by a preposition. A “verb +group” (VGR) consists of a verb phrase without direct or indirect objects (e.g. +could almost see, might not have been moving, had been very yellow). Another +type of group, the “verbal” (VBL) is also recognized by the verb group network; it +is either an infinitive phrase (e.g. to walk slowly, to be broken) or a participial +phrase (e.g. walking slowly, have almost given). + +4.2 Input + +The input file from DECOMP has been described in Chapter 3. It contains the +morph spelling, morph pronunciation, morph type, and parts of speech and features +for each homograph of each morph in the analysis of the word. A parts-of-speech +set for the entire word is also supplied. + +4.3 Output + +The output of the parser is a series of nodes representing either a parsed constituent +(i.e. a phrase), or a word (or punctuation mark) which was not included in a phrase +by the parser. Each node representing a phrase contains the words covered by that +phrase in the order in which they appear in the text. The output file contains the +following information: + +1. For each node, the number of words covered by the node, the part of +speech (type of constituent) of the node, and a property list is given. +The property list is a set of attribute-value pairs. + +2. Each word is accompanied by its spelling and a part-of-speech set. +Only one part of speech is given for those words covered by a node. + +3. For each part of speech, a property list is given. + +4.4 Parts of speech + +4.4.1 The standard parts of speech in the lexicon + +The following are the parts of speech of open class words and words which do not +have any special syntactic or prosodic features. Those names in uppercase are the +parts of speech, attributes, and attribute values as they are listed in the source ver- +sion of the lexicon. A word itself may have any number of parts of speech. The +designations TR and FL are abbreviations for “true” and “false”, respectively. + +NOUN (NUM SING) = singular noun +NOUN (NUM PL) = plural noun + +41 diff --git a/pages-txt/054.txt b/pages-txt/054.txt new file mode 100644 index 0000000..bba84bd --- /dev/null +++ b/pages-txt/054.txt @@ -0,0 +1,40 @@ +From text to speech: The MITalk system + +VERB (INF TR) (PL TR) = infinitive form of verb +VERB (SING TR) (PL TR) = past tense verb +ADJ = adjective +ADV (VMOD TR) (ADJMOD TR) = +adverb which can modify either an adjective or a verb +ADV (ADJMOD TR) = adverb which can modify an adjective +PREP = preposition +CONJ = conjunction +INTG = integer +INTG (NUM SING) = one +INTG (DEF FL) = integer which requires a (e.g. thousand) +VERBING = present participle +VERBEN = past participle +TO = to +SCONJ = sentential conjunction (e.g. whether) +CONTR = contraction (e.g. ’re) +INTERJ = interjection (e.g. oh) + +4.4.2 Special parts of speech + +There are three internal parts of speech for punctuation. One is assigned to the +single punctuation mark COMMA. The other two include a number of punctua- +tion marks. Punctuation which is internal to a sentence [: ; () ’ and "] is assigned +the part of speech IPM (internal punctuation mark). Punctuation which can be +sentence-final is termed EPM (end punctuation mark). + +Some words in the lexicon are recognized as having special syntactic or +prosodic features.
The syntactic features and the distinctions among the various +types of determiners follow from the grammar. + +First, consider the adverbs with property (MEAS TR). This indicates that +these words can occur in such constructions as nearly a hundred ladders. The +property (DETMOD TR) marks adverbs which can modify determiners such as +almost any space. (NEGADV TR) designates an adverb which can appear with +an indefinite article and a count noun as in hardly a salesperson. The property +(NOT TR) on not and never signals certain prosodic effects. + +42 diff --git a/pages-txt/055.txt b/pages-txt/055.txt new file mode 100644 index 0000000..51f68bd --- /dev/null +++ b/pages-txt/055.txt @@ -0,0 +1,49 @@ +The phrase-level parser + +PDET stands for predeterminer and is the part of speech of a word which can +occur before a determiner such as half his land or twice the money. The property +(MEAS TR) applied to PDET indicates that a predeterminer can be present with a +measure adverb as in almost all two hundred divers. (DEF TR) signifies that the +determiner must be definite (all the pain is allowable; all a pain is incorrect), +while (DEF FL) is the opposite (such the problem is incorrect, such a problem is +allowable). (DET TR) marks a predeterminer which must be followed by some +determiner. (OF FL) indicates that a predeterminer cannot be followed by of, as +opposed to both of the gnomes. (TYPE A) marks predeterminers which cause +certain quantifier-related prosodic effects. (QUANT A) designates the same usage +for pronouns. + +The CASE and NUM attributes on the pronouns refer to case and number in +the usual manner. The TYPE is listed for prosodic reasons. (DETMOD TR) in- +dicates that a pronoun can be modified by a determiner-modifying adverb as in +nearly everyone. + +There are four types of determiners. DETW stands for a wh-word determiner. +DETQ signifies a quantified determiner. These are distinguished from the quan- +tifiers (part of speech QUANT) by the fact that they may occur in the same noun +group as an ordinal or integer as in every third Eskimo or any six infants. DET- +MOD has the same meaning here as for pronouns. TYPE and QUANT again are +prosodic indicators. Demonstratives (DEM) and articles (ART) are straightfor- + +ward. +The ordinals (ORD) include next and last as well as the ordinal integers. + +Quarters, thirds, etc. are listed as ORD (NUM PL) because they can occur in +constructions such as three quarters the money, or two thirds the money. The +feature (DEF TR) on these words indicates that the preceding determiner must be +definite. The quantifiers (QUANT) are usually marked for number agreement and +for definiteness agreement with the preceding determiner. + +The modals (MOD) are marked with the attribute AUX which gives prosodic +information. The property (TO TR) indicates that a modal can occur in construc- +tions such as ought to deliver, while (TO BE) designates a word which must ap- +pear with be and to as in was going to abscond. The rest of the BE and HAVE + +words have their usual meaning. + +4.5 The part-of-speech processor +The part-of-speech processor is part of the DECOMP module in the text-to-speech + +system. It computes a part-of-speech set for each word in the input, given the +morph decomposition and the parts of speech of the morphs. 
It is based on Allen’s + +43 diff --git a/pages-txt/056.txt b/pages-txt/056.txt new file mode 100644 index 0000000..5451d92 --- /dev/null +++ b/pages-txt/056.txt @@ -0,0 +1,48 @@ +From text to speech: The MITalk system + +“Preprocessor” (Allen, 1968). The current algorithm goes right-to-left across the +morphs and uses the part of speech of the rightmost morph for a compound, as +well as for cases where there is a suffix. This is justified by two facts: + +1. suffixes (especially the rightmost suffix since it is outermost in the +“nesting” of affixes) determine the part of speech of a word with +regularity (e.g. ...ness is a NOUN); + +2. the part of speech of compounds is very idiosyncratic (in fact, it is +usually determined by semantic rather than syntactic information) +and the best heuristic is to use the part-of-speech set of the rightmost + +root. + +A complete description of the part-of-speech processor is given in Appendix +A. First, the processor checks to see if there was a decomposition. If there is +none, then it calls a routine which assigns the part-of-speech set (NOUN (NUM +SING), VERB (PL TR) (INF TR), ADJ) unless the word ends in ’S in which case +the part-of-speech set is (NOUN (POSS TR), NOUN (NUM SING) (CONTR +TR)). Next, the program determines whether the last morph in the decomposition +is a suffix. If it is not, then the program checks for the part-of-speech determining +prefixes. The prefixes EM, EN, and BE indicate that a word is a VERB, while A +gives the part-of-speech set (ADJ, ADV). (Suffixes have priority over these, as in +befuddlement.) If none of these are present, then the processor assigns the part- +of-speech set of the last morph in the decomposition. + +The rest of the processor is essentially a dispatch on the last suffix. In many +cases, the next to last morph’s part of speech must also be examined. If the last +morph is the suffix ING, the part of speech is specified as VERBING, while ED +indicates that the part-of-speech set is (VERBEN, VERB (SING TR) (PL TR)). If +the last morph is S or ES, a number of checks must be made. If the next to last +morph is not a suffix and there is a verb-producing prefix, then the part of speech +is VERB (SING TR), as in entitles. If the penultimate morph has the part of +speech VERB, then the same part of speech is assigned. If the previous morph is a +NOUN, ADJ, or INTG or is ER or ING, then the part of speech NOUN (NUM +PL) is added to the set. If the next to last morph is an ORD, then the part of speech +is also ORD (NUM PL). Finally, if there is still no part of speech, the processor +assigns NOUN (NUM PL), as in the whys and wherefores. + +If the last suffix is ER, then three checks are made. If the next to last morph +has the part of speech ADV, then the word is a comparative adverb; if it is an ADJ, +then the word is a comparative adjective. If it is a NOUN or a VERB, then the +word is a singular NOUN, as in worker. If the last morph is S’, then the word’s +part of speech is NOUN with the property (POSS TR). + +44 diff --git a/pages-txt/057.txt b/pages-txt/057.txt new file mode 100644 index 0000000..9f6fa31 --- /dev/null +++ b/pages-txt/057.txt @@ -0,0 +1,45 @@ +The phrase-level parser + +For a last suffix of ’S, three checks are performed. If the previous morph is a +NOUN, then the part-of-speech set is (NOUN (POSS TR), NOUN .... (CONTR +TR)), where “....” are the features that the previous morph had (e.g. (NUM PL)).
If +the next to last morph is a PRN, then the part of speech is PRN with the previous +morph’s features and the additional property (CONTR TR). If that morph also has +the property (PRNADJ TR), which includes the pronouns ending in body, one, and +thing, then the part-of-speech set also includes PRN with the prior morph’s fea- +tures and the property (CASE POSS), as in anybody’s. + +The last three cases of the dispatch deal with contractions. If the last morph is +N’T, first the program checks if the previous morph is NEED. The part of speech +of needn’t is MOD, and the features are (AUX A) and (NOT TR). If the next to +last morph has the part of speech BE, HAVE, or MOD, the processor just adds the +property (NOT TR). If the last morph is ’VE and the previous morph is a modal, +then the part of speech is the same as the previous morph with the additional +property (CONTR TR), as in must’ve. Finally, if the last morph is one of the verb +contractions ’VE, ’D, ’LL, and ’RE, the processor checks if the prior morph is the +plural morph S. (The kids’ve been busy. The boys’ll go.) If so, the word’s part +of speech is NOUN with the features (NUM PL) and (CONTR TR). Otherwise, if +the previous morph is a NOUN or PRN, the property (CONTR TR) is added to the +feature set. + +If the last suffix is none of the above, then the part-of-speech set of the word +is the part-of-speech set of that morph. If a word still has no part of speech (e.g. +only’s), then the routine which assigns “default” parts of speech is called, as in the +case of no decomposition. + +4.6 The parser algorithm + +4.6.1 Parsing strategy +The parser reads information from DECOMP on the words in a text one sentence +at a time. It then attempts to find phrases in the sentence. The operation of the +parsing logic can be thought of as having two levels. The global level reflects the +parsing strategy, which has been found to give the best phrases. It is based on +three empirical facts: +1. There are many more noun groups (and prepositional phrases) than +verb groups in running text. +2. The initial portions of noun groups are easier to detect than verb +groups. Verb groups frequently begin with the verb itself which of- + +ten has both NOUN and VERB in its possible part-of-speech set. + +45 diff --git a/pages-txt/058.txt b/pages-txt/058.txt new file mode 100644 index 0000000..b5d0fdb --- /dev/null +++ b/pages-txt/058.txt @@ -0,0 +1,48 @@ +From text to speech: The MITalk system + +3. Nouns are very often compounded into classifier strings (e.g. + +cathode ray tube cleaning fluid). +The local level merely interprets the ATN grammar. + +The global parsing strategy proceeds as follows: it looks for the longest noun +group (a noun phrase up to the head noun, possibly including an initial preposition) +that it can find beginning with the first word in the sentence. If it locates one, then +a “node” representing that constituent is constructed, the “current word” pointer is +advanced to the word after that constituent and the process begins again at that +point. If no noun group is found, the parsing logic attempts to find the longest +verb group starting at the word pointer. If it is successful, then a “node” is built, +the pointer is incremented, and the process begins again. If neither type of group +can be found at a certain point in the sentence, no node is created and the pointer is +simply moved to the next word in the sentence and the process begins again. + +At the local level, the parser uses the ATN to find a constituent.
There are +two pointers, one pointing to the word in the sentence currently being examined +and one pointing to the current state in the net, which begins at the initial state for +noun groups or verb groups. The parser tries each arc leading from the current +state in the order in which they appear in the net. This net is shown in Tables 4-1 +(noun group) and 4-2 (verb group). + +Testing an arc is done as follows: + +1. If the arc label is JUMP or POP, then the exit routine associated with + +that arc is tested. If it is successful, then for JUMP the state pointer +is advanced to the destination state (the word pointer is not +incremented), and the process begins again at that state. For POP, a +node is built if the popped constituent is longer than any found so far, +and the process continues with the next arc leaving the state. (That +is, parsing is exhaustive.) If the tests are unsuccessful, then the par- +ser simply checks the next arc leaving the state. + +2. If the arc label is a part of speech and the current word does not have +this part of speech, the parser continues with the next arc. If the cur- +rent word does have this part of speech, the exit routine is tested. If +successful, the word pointer is incremented and the state pointer is +advanced to the destination state of the arc. If it fails, the next arc is +attempted. If the parser is to test the next arc for some state and no +arcs remain, then the state pointer is reset to the state from which the +arc led which brought the process to the current state, and the process +begins again with the next arc in the new state. + +46 diff --git a/pages-txt/059.txt b/pages-txt/059.txt new file mode 100644 index 0000000..9767d9e --- /dev/null +++ b/pages-txt/059.txt @@ -0,0 +1,154 @@ +NG + +NG-Ving + +NG-Adv + +NG-Than +NG-ThanA + +NG-Pdet + +NG-Pm + +NG-Times + +NG-Det + +NG-Quant + +VERBING NG-Ving IngVbl +TO VG-Inf Vbl + +PREP NG-Adv PP + +TO NG-Adv PP + +JUMP NG-Adv NG + +JUMP NG-Adjl IngAdj +JUMP NG-N1 IngNoun +POP NG-Ving Vbl + +ADV NG-Pdet NA +QUANT NG-Than MorLes +ADJ NG-Adj Adj + +JUMP NG-Pdet OK + +SCONJ NG-ThanA Than + +ART NG-INtg ThanA +JUMP NG-Intg OK + +INTG NG-Times FracNum +PRN NG-Own Pmps +PDET NG-Det PD + +DETW NG-Ord DW +DETQ NG-Ord DQ + +JUMP NG-Det OK + +PRN NG-Pm Pr + +ADV NG-Pm PmAdv +ADJ NG-PA PA +PREP NG-Pdet PmOf +POP NG-Pm PrPop + +ORD NG-Det FracDen +NOUN NG-Det Times + +NOUN NG-Own Poss +PRN NG-Own PPoss +DEM NG-Ord Dem +ART NG-Ord Art + +PREP NG-Pdet PdetOf +JUMP NG-Ord NeedDet + +QUANT NG-Adj Quant +PREP NG-Pdet DetOf +JUMP NG-Intg OK + +Figure 4-1: Noun group ATN listing + +NG-PA +NG-Ord + +NG-Own + +NG-Intg + +NG-Intgl + +NG-IntOrd + +NG-Frac + +NG-Denom +NG-Adj + +NG-Adjl + +NG-N + +NG-N1 + +The phrase-level parser + +POP NG-PA OK + +ORD NG-Quant Ord +JUMP NG-Quant OfTest +ADJ NG-Intg IntMod + +ADJ NG-Ord Own +DETQ NG-Adj Every +JUMP NG-Ord OK +POP NG-Own PopP + +INTG NG-Intgl Intg +JUMP NG-Adj NoMeas + +INTG NG-Intgl IntA +CONIJ NG-Frac And +JUMP NG-IntOrd OK + +ORD NG-Adj IntOrd +JUMP NG-Adj OK + +INTG NG-Denom Numer +ART NG-Denom A + +ORD NG-Adj Denom + +ADV NG-Adj AdvAdj +ADJ NG-Adjl Adj + +VERBEN NG-Adjl Adj +VERBING NG-Adj1 VngAdj +PREP NG-Pdet QuantOf +JUMP NG-N NeedAdj + +POP NG-Adj IntPop + +CONJ NG-Adj AConj +COMMA NG-Adj AComma +JUMP NG-Adj OK + +NOUN NG-Own PossN +NOUN NG-N1 Noun +VERBING NG-N1 ConjVing +INTG NG-N1 Nintg + +PREP NG-Adv NounOf + +POP NG-N PopN + +CONJ NG-N NConj +COMMA NG-N NComma +JUMP NG-N OK + +47 diff --git a/pages-txt/060.txt b/pages-txt/060.txt new file mode 100644 
index 0000000..0154cfe --- /dev/null +++ b/pages-txt/060.txt @@ -0,0 +1,58 @@ +From text to speech: The MITalk system + +VG ADV VG Adv VG-Have ADYV VG-Have Adv +MOD VG-Inf Mod MOD VG-To Got +MOD VG-To ModTo BEEN VG-Part Been +HAVE VG-Have Have JUMP VG-Part Nolng +MOD VG-Have ModCntr JUMP VG-To OK +BE VG-Part Be VGPat MOD VG-To BeMod +VERB VG-Pop Verb . +ADYV VG-Part CopAvj +JUMP VG-Inf Vbl . +JUMP VG-Have Vbl BEING VG-Part Being +VERBING VG-Pop Ving +VG-Inf ADYV VG-Inf Adv VERBEN VG-Pop En +MOD VG-To Get JUMP VG-Cop Cop +HAVE VG-Have HavInf VG-Cop ADI VG-Pop OK +BE VG-Part BInf JUMP VG-Pop CopNoAj +VERB VG-Pop VInf A + +VG-Pop ADV VG-Pop NoPrep + +VG-T ADV VG- +G-To VG-To Adv POP VG-Pop PopV + +TO VG-Inf OK + +Figure 4-2: Verb group ATN listing + +4.6.2 The verb group grammar +The verb group grammar appears in Figure 4-3. This is the simpler of the phrase +grammars. It has fewer arcs, fewer states and alternate paths, fewer exit routines, +and only two POP arcs. Its auxiliary verb structure is very well-defined. Also, +there are no multiple parts of speech for one word, causing two paths to be inves- +tigated. + +Some examples follow of basic verb groups which successfully traverse this + +net: + +sometimes runs usually would have been jumping +is being run have to go + +would have been seen about to be done + +This basic grammar has been extended to include certain modal arcs used in +spoken English. Some examples of these verb groups are: + +get to run does get used +get to go can’t possibly get to see + +Particles have not as yet been treated. At present, in a simple sentence such +as He picked up the books, three noun groups are found: he, picked up, and the +books. In the phrase picked up, picked is assumed to be a past participle being +used as an adjective and up is assumed to be the noun, as in the colloquial expres- +sion It’s a real up. In the sentence He ran out of the room, ran is considered a +verb group, but out is considered a noun group (as in How many outs does the + +48 diff --git a/pages-txt/061.txt b/pages-txt/061.txt new file mode 100644 index 0000000..5a78253 --- /dev/null +++ b/pages-txt/061.txt @@ -0,0 +1,37 @@ +The phrase-level parser + +Figure 4-3: ATN diagram for verb groups + +team have?), and of is included in the prepositional phrase of the room. It might +be possible to correctly parse some of these particle constructions using a feature +on the verb. However, such a feature also allows for incorrect recognition of a +preposition as a particle. + +4.6.3 The noun group grammar + +Figure 4-4 contains the noun group grammar. This is the more complex phrase +grammar. It has many arcs and branches, and many exit tests. There are many +possible sequences to follow in the net; the examples below illustrate some pos- +sible paths and some which are correctly blocked (these are starred). + +almost any book *book + +about two fifths the book *almost him + +the many books *a many books +both these women *both this women +every other book *every other books +any two books *some few books +every few books *a few book + +that few books *that much book +that many books *his much money +a woman’s many books *a book women + +The noun group grammar contains a number of optional arcs or “sidetracks”. 
+
+Examples of cases in which these arcs would be traversed are listed below:
+
+three quarters his own shoes
+more than five shoes three and a half
+
+49
diff --git a/pages-txt/062.txt b/pages-txt/062.txt
new file mode 100644
index 0000000..c526ac9
--- /dev/null
+++ b/pages-txt/062.txt
@@ -0,0 +1,19 @@
+From text to speech: The MITalk system
+
+[ATN diagram rendered as an image in the original; only scattered arc labels
+(VERBING, ADJ, VERBEN, NOUN, INTG, JUMP, CONJ, COMMA, POP) survive in the
+extracted text.]
+
+Figure 4-4: ATN diagram for noun groups
+
+50
diff --git a/pages-txt/063.txt b/pages-txt/063.txt
new file mode 100644
index 0000000..14474b7
--- /dev/null
+++ b/pages-txt/063.txt
@@ -0,0 +1,99 @@
+The phrase-level parser
+
+red, white and blue three and twenty blackbirds
+simple pronoun noun groups, e.g., he.
+
+4.7 Some examples
+
+Figure 4-5 shows an example of the phrase-level parse produced by the parser.
+The text is a portion of a computer-taught course at Stanford. The paragraphs
+parsed here are taken from the end of a section about predicate calculus.
+
+PARSER: NOUN GROUP: MOST OF THE EXERCISES
+PARSER: VERB GROUP: ARE
+PARSER: NOUN GROUP: TRANSLATIONS
+PARSER: UNCLASSIFIED: .
+PARSER: VERB GROUP: THERE ARE
+PARSER: NOUN GROUP: SEVERAL IMPORTANT CHANGES
+PARSER: PREPOSITIONAL PHRASE: IN THE WAY
+PARSER: NOUN GROUP: THE QUANTIFIER RULES
+PARSER: VERB GROUP: WILL WORK
+PARSER: PREPOSITIONAL PHRASE: FOR THE REMAINDER OF THE
+COURSE
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: FIRST
+PARSER: UNCLASSIFIED: ,
+PARSER: NOUN GROUP: THE PROGRAM
+PARSER: VERB GROUP: WILL INFORM
+PARSER: NOUN GROUP: YOU
+PARSER: UNCLASSIFIED: IMMEDIATELY
+PARSER: UNCLASSIFIED: IF
+PARSER: NOUN GROUP: A QUANTIFIER INFERENCE
+PARSER: VERB GROUP: VIOLATES
+PARSER: NOUN GROUP: ANY OF THE RESTRICTIONS
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER: NOUN GROUP: THAT WAY
+PARSER: NOUN GROUP: AN OVERSIGHT
+PARSER: VERB GROUP: WON’T COST
+PARSER: NOUN GROUP: YOU
+PARSER: NOUN GROUP: A LOT OF WORK
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER: NOUN GROUP: YOU
+PARSER: VERB GROUP: CAN'T USE
+PARSER: NOUN GROUP: AMBIGUOUS NAMES
+PARSER: PREPOSITIONAL PHRASE: WITH A SHARP
+PARSER: PREPOSITIONAL PHRASE: IN THEM
+PARSER: UNCLASSIFIED: ANYMORE
+PARSER: UNCLASSIFIED: .’
+PARSER: UNCLASSIFIED: .
+PARSER:
+
+Figure 4-5: Example of PARSER operation
+
+51
diff --git a/pages-txt/064.txt b/pages-txt/064.txt
new file mode 100644
index 0000000..7d661db
--- /dev/null
+++ b/pages-txt/064.txt
@@ -0,0 +1,45 @@
+5
+
+Morphophonemics and stress adjustment
+
+5.1 Overview
+
+It is not always possible to simply concatenate the pronunciations of the con-
+stituent morphs of a word to get its pronunciation. There are sometimes changes in
+pronunciation at morph boundaries. Module SOUND1 checks for contexts in
+which such changes occur, and changes the pronunciation. It also adjusts the lex-
+ical stress for compounds and for words having suffixes requiring special stress
+rules. SOUND1 also performs letter-to-sound conversion for words which were
+not segmented by DECOMP (this function will be described in the following
+chapter). It accepts as input all the word and morph information given by
+DECOMP and the additional phrase part-of-speech information produced by
+PARSER. Output is a set of phonetic segment labels for each word along with the
+phrase information from PARSER.
+
+5.2 Input
+Input to SOUND1 is the output stream from PARSER. The format of this stream
+has been described in Chapter 4. It contains morph pronunciation information
+from DECOMP and phrase and part-of-speech information from PARSER.
+
+5.3 Output
+
+The output stream from SOUND1 consists of a string of phonetic segment labels,
+stress marks, and syllable and morph boundaries for each word. At the end of each
+sentence, the phrase information for that sentence is placed in the output stream
+(this is simply a duplicate of the phrase information from PARSER).
+
+5.4 Morphophonemic rules
+
+The pronunciation for each word which has been segmented by DECOMP is con-
+structed by catenating the pronunciations of its component morphs. The following
+rules are applied to modify the morph pronunciations when necessary.
+
+5.4.1 Plurals, possessives, and contractions with “is”
+
+Words which end in a fricative or affricate close in place of articulation to SS and
+ZZ, i.e., the set of segments SS, ZZ, SH, ZH, CH, and JJ, form their plurals and
+possessives by the concatenation of the segment string IH ZZ, or, in its vowel-
+reduced form, IX ZZ (e.g. busses, churches, garages, marsh’s). After other
+
+52
diff --git a/pages-txt/065.txt b/pages-txt/065.txt
new file mode 100644
index 0000000..6c737e2
--- /dev/null
+++ b/pages-txt/065.txt
@@ -0,0 +1,49 @@
+Morphophonemics and stress adjustment
+
+voiced segments, the plural and possessive morphemes are realized as ZZ (e.g.
+dogs, potatoes). After other unvoiced consonants, it is pronounced SS (e.g.
+backs, cat’s). Nouns and pronouns contracted with the verb is follow the same
+rules as possessives (e.g. the dog is—the dog’s, the cat is—the cat’s). It is in-
+teresting to note that, since the plural or possessive morpheme and the word is
+have the same pronunciation after the set of phonetic segments given special
+treatment above, no contraction is made with is; that is, one would not write
+The church’s across the street
+to mean
+
+The church is across the street.
+
+Presumably, however, someone who does not read or write will not be able to tell
+which form was being used.
+
+5.4.2 Past participles
+
+The analysis for past tense forms is similar. After the segments TT and DD, the
+extra vowel separation is provided to give the pronunciation IH DD or IX DD (e.g.
+mended, minted). After other voiced segments, the pronunciation DD is chosen
+(e.g. whispered, rowed) and after other unvoiced consonants, the pronunciation
+TT is chosen (e.g. hushed).
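+
+The plural/possessive and past-tense rules above are essentially one voicing-
+assimilation pattern with two triggering sets. The Python sketch below restates
+them; it is a minimal illustration under the assumption that words arrive as
+lists of segment labels, and the VOICED set is abbreviated.
+
+    # Hypothetical sketch of the plural/possessive and past-tense rules.
+    SIBILANTS = {"SS", "ZZ", "SH", "ZH", "CH", "JJ"}
+    VOICED = {"ZZ", "ZH", "JJ", "DD", "BB", "GG", "MM", "NN", "LL", "RR",
+              "AA", "AE", "AH", "AX", "EH", "IH", "IY", "OW", "UW"}  # partial set
+
+    def plural_or_possessive(segments):
+        """Append the plural/possessive morpheme to a segment-label list."""
+        last = segments[-1]
+        if last in SIBILANTS:                 # busses, churches, marsh's
+            return segments + ["IH", "ZZ"]    # or IX ZZ when vowel-reduced
+        if last in VOICED:                    # dogs, potatoes
+            return segments + ["ZZ"]
+        return segments + ["SS"]              # backs, cat's
+
+    def past_tense(segments):
+        """Append the past-tense morpheme (-ed) to a segment-label list."""
+        last = segments[-1]
+        if last in ("TT", "DD"):              # mended, minted
+            return segments + ["IH", "DD"]
+        if last in VOICED:                    # whispered, rowed
+            return segments + ["DD"]
+        return segments + ["TT"]              # hushed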
+
+5.4.3 The prefixes pre, re, and de
+
+Although it is not possible to construct a rule covering the correct pronunciation of
+these prefixes in all cases, the rule which was chosen is frequently correct. Before
+free morphs such as gain, the e is given the long vowel sound of IY (e.g. regain).
+The short vowel sound IH is assigned before bound morphs requiring a prefix (e.g.
+prefer). This rule should only apply if there is no stress on the prefix; any word in
+which the prefix is stressed should appear in the lexicon as a separate entry (e.g.
+preference).
+
+5.4.4 Palatalization before suffixes ion and ure
+
+The suffixes ion and ure both cue palatalization of alveolar segments preceding
+them. The affricates they become are dependent upon the segment preceding the
+alveolar consonant. In addition, a change in the pronunciation of the suffix accom-
+panies this palatalization. In this module, these changes are accomplished by
+recognition of letter contexts.
+
+Preceding the suffix ion, the letter t is pronounced CH after n or s and SH
+otherwise (e.g. retention, congestion, completion). The letter s is given the
+pronunciation SH after l or s (e.g. emulsion, compression), the pronunciation ZH
+after r or a vowel (e.g. subversion, adhesion), and CH after n (e.g. suspension).
+
+53
diff --git a/pages-txt/066.txt b/pages-txt/066.txt
new file mode 100644
index 0000000..36be579
--- /dev/null
+++ b/pages-txt/066.txt
@@ -0,0 +1,50 @@
+From text to speech: The MITalk system
+
+There are a few words ending in xion, such as complexion, in which the pronun-
+ciation KK SS, as in complex, is changed to KK SH.
+
+The suffix ion itself, which is pronounced IY - AX NN in some contexts
+(e.g. centurion, accordion), loses the pronunciation of the first vowel, which ap-
+pears to be absorbed into the palatalized consonant, and is pronounced AX NN after
+the affricates (all those examples given above). The segment IY becomes more of
+a glide after l and n, and is given the pronunciation YY AX NN in such words as
+rebellion and dominion.
+
+The palatalization rules for the suffix ure are slightly less dependent upon
+context. The letters t and d are pronounced CH and JJ, respectively (e.g. vesture,
+verdure). The letter s follows the same rules as when it precedes the suffix ion,
+i.e., it is SH after l and s (e.g. pressure), ZH after r or a vowel (e.g. exposure), and
+CH after n, as in tonsure. A rule is also provided for x preceding ure, changing KK
+SS to KK SH, as in flexure.
+
+5.4.5 The suffix ic
+
+Preceding the front vowels represented in the orthography by e, i, and y, the suffix
+ic is changed in pronunciation from IH KK, which contains the velar KK, to the
+more fronted alveolar-containing IH SS (e.g. electricity).
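+
+The ion contexts above are compact enough to restate directly. The Python
+sketch below is an illustrative rendering only, not the module's letter-context
+machinery; the function name and interface are hypothetical.
+
+    # Hypothetical sketch of the palatalization contexts before -ion.
+    def palatalize_before_ion(prev_letter, letter):
+        """Return the segment label for `letter` (t or s) before -ion,
+        given the letter immediately preceding it, or None."""
+        vowels = set("aeiou")
+        if letter == "t":                       # retention vs. completion
+            return "CH" if prev_letter in ("n", "s") else "SH"
+        if letter == "s":
+            if prev_letter in ("l", "s"):       # emulsion, compression
+                return "SH"
+            if prev_letter == "r" or prev_letter in vowels:  # subversion, adhesion
+                return "ZH"
+            if prev_letter == "n":              # suspension
+                return "CH"
+        return None                             # no palatalization cued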
+
+5.5 Stress modification rules
+
+A compound stress rule is applied to words decomposed into more than one root or
+bound root. The primary stress (or 1-stress) is retained on the leftmost root.
+Primary stress on other roots is reduced to secondary (or 2-stress) as in houseboat
+(1-stress on house and 2-stress on boat).
+
+Suffixes which shift the primary stress in a word, such as ee, eer, esce, and
+ation are entered in the morph lexicon with primary stress. The stress on any root
+to which they attach is reduced to secondary (e.g. trainee, auctioneer). This is
+implemented by checking for primary stress on the leftmost derivational suffix and
+reducing the root stress when the suffix stress is found.
+
+5.6 An example
+The input files from PARSER and the output file resulting from SOUND1’s opera-
+tion for the sentence:
+Priscilla noted the houseboat’s reflection.
+
+are shown in Figure 5-1. The symbols * and - denote morph and syllable boun-
+daries, respectively.
+
+The word Priscilla is not found in the lexicon. This fact is noted in the
+PARSER output file by the lack of => notation after its spelling. The pronuncia-
+
+54
diff --git a/pages-txt/067.txt b/pages-txt/067.txt
new file mode 100644
index 0000000..84c08e7
--- /dev/null
+++ b/pages-txt/067.txt
@@ -0,0 +1,95 @@
+Morphophonemics and stress adjustment
+
+PARSER: NOUN GROUP (NUMBER = SINGULAR)
+PARSER: PRISCILLA
+PARSER: NOUN (NUMBER = SINGULAR)
+PARSER: VERB GROUP (SINGULAR = TRUE) (PLURAL = TRUE)
+PARSER: NOTED
+PARSER: VERB (SINGULAR = TRUE) (PLURAL = TRUE)
+PARSER: =>
+PARSER: NOTE [ROOT] :
+PARSER: 1NOAT (VERB, NOUN)
+PARSER: ED [INFLECTIONAL VOCALIC SUFFIX]
+PARSER: D (VERB, PAST PARTICIPLE)
+PARSER: NOUN GROUP (NUMBER = SINGULAR)
+PARSER: THE
+PARSER: ARTICLE (DEFINITE = TRUE)
+PARSER: =>
+PARSER: THE [ABSOLUTE] :
+PARSER: 1TH_* (ARTICLE)
+PARSER: HOUSEBOAT’S
+PARSER: NOUN (POSSESSIVE = TRUE)
+PARSER: =>
+PARSER: HOUSE [ROOT] :
+PARSER: 1HAU’S (NOUN)
+PARSER: 1HAU’Z (VERB)
+PARSER: BOAT [ROOT] :
+PARSER: 1BOMT (NOUN)
+PARSER: S [INFLECTIONAL NONVOCALIC SUFFIX]
+PARSER: IZ (CONTRACTION, NOUN)
+PARSER: REFLECTION
+PARSER: NOUN (NUMBER = SINGULAR)
+PARSER: =>
+PARSER: REFLECT [ROOT] :
+PARSER: RI-1FLEKT (VERB)
+PARSER: ION [DERIVATIONAL VOCALIC SUFFIX] :
+PARSER: *N (NOUN)
+PARSER: UNCLASSIFIED
+PARSER: .
+PARSER: END PUNCTUATION MARK
+
+SOUND1: PP RR "IH - SS 'IH LL - AX
+SOUND1: NN 'OW TT * - IH DD
+SOUND1: DH 'AH
+SOUND1: HH 'AW SS * - BB "OW TT * SS
+SOUND1: RR IH - FF LL 'EH KK SH * - AX NN
+SOUND1: .
+
+Figure 5-1: Input to and output from SOUND1
+
+tion shown in the output is a result of the letter-to-sound and lexical stress applica-
+tion which will be described in the next chapter.
+
+In the lexicon, the pronunciation of the “past” morpheme is given as DD.
+This is the pronunciation used following all voiced segments except DD. The first
+morph in noted is note. Its last segment is TT which is one of the two segments
+requiring the special pronunciation IH DD or IX DD. In the output of SOUND1,
+the DD has been converted to IH DD.
+
+Observing the lexical information for the word house, we see that there are
+two homographs. The pronunciation given first ends in unvoiced SS: this is the
+nominal pronunciation. The second pronunciation, on the following line, ends in
+
+55
diff --git a/pages-txt/068.txt b/pages-txt/068.txt
new file mode 100644
index 0000000..aff9194
--- /dev/null
+++ b/pages-txt/068.txt
@@ -0,0 +1,26 @@
+From text to speech: The MITalk system
+
+the voiced counterpart, ZZ. Like any other word containing a morph with more
+than one homograph, this compound has been inspected in DECOMP to ensure
+that the homograph listed first in the output file is the homograph having the same
+part of speech as the word. Thus, SOUND1, choosing the first pronunciation for
+all morphs, picks the correct nominal pronunciation.
+
+The lexical pronunciation of ’s is IH ZZ. In this case, the pronunciation SS
+must be substituted since the preceding segment is an unvoiced TT.
+
+A third change to houseboat’s is due to the operation of the compound stress
+rule. This rule assigns primary stress to the leftmost root in a word containing
+more than one root, reducing the stress on the rightmost root. Thus, the stress on
+boat is reduced to 2-stress.
+
+Two other changes are demonstrated in the word reflection. Lexical pronun-
+ciation of the prefix re is RR IY. Because flect is a bound root and re carries no
+stress, the pronunciation of re is changed to RR IH. One of the two suffixes that
+cues palatalization is also encountered in this word.
+The segment TT at the end of
+the pronunciation of flect is a member of the set of alveolars that palatalizes in this
+context. In the output file, TT has been changed to SH. It is unnecessary to
+change the pronunciation of ion since it is entered in the lexicon in its desired
+form.
+
+56
diff --git a/pages-txt/069.txt b/pages-txt/069.txt
new file mode 100644
index 0000000..5b6b5db
--- /dev/null
+++ b/pages-txt/069.txt
@@ -0,0 +1,43 @@
+6
+
+Letter-to-sound and lexical stress
+
+6.1 Overview
+
+In order to convert unrestricted text to speech, it is necessary to have a scheme
+which stipulates a pronunciation for words not analyzable by the lexical analysis
+algorithm. This comprehensiveness is provided by the letter-to-sound section of
+SOUND1. The letter strings which it receives are converted into stressed phonetic
+segment label strings (hereafter referred to as segment strings) using two sets of
+ordered phonological rules (Hunnicutt, 1976b). The first set to be applied converts
+letters to phonetic segments, first stripping affixes, then converting consonants,
+and finally converting vowels and affixes. The second set applies an ordered set of
+rules which determine the stress contour of the segment string.
+
+These rules were developed by a process of extensive statistical analysis of
+English words. The form of the rules reflects the fact that pronunciation of vowels
+and vowel digraphs, consonants and consonant clusters, and prefixes and suffixes
+is highly dependent upon context. The method of ordering rules allows converted
+strings which are highly dependable to be used as context for those requiring a
+more complex framework. Detailed studies of allowable suffix combinations, and
+the effect of suffixation on stress and vowel quality, have also provided for more
+reliable results.
+
+This component is integral to SOUND1 described in the previous chapter and
+processes words which were not segmented by DECOMP. Input and output for-
+mats are described in that chapter.
+
+6.2 Letter-to-sound
+
+6.2.1 Operation
+
+The conversion of a letter string to a phonetic segment string in the letter-to-sound
+program proceeds in three stages. In the first stage, prefixes and suffixes are
+detected. Such affixes appear in the list of phonological rules. Each is classified
+according to:
+1. its possible parts of speech,
+2. the possible parts of speech of a suffix preceding it,
+3. its restriction or lack of restriction to word-final position, and
+
+57
diff --git a/pages-txt/070.txt b/pages-txt/070.txt
new file mode 100644
index 0000000..debe722
--- /dev/null
+++ b/pages-txt/070.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+4. its ability to change a preceding y to i or to cause the omission of a
+preceding e.
+Prefixes are given no further specification.
+
+Detection of suffixes proceeds in a right-to-left, longest-match-first fashion.
+When no additional suffixes can be detected, or when a possible suffix is judged
+syntactically incompatible with its right-adjacent suffix by a part-of-speech test
+using the first two classifications above, the process is terminated. Finally,
+prefixes are detected left-to-right, also by longest match first. If at any time the
+removal of an affix would leave no letter in the remainder of the word, the affix is
+not removed.
+
+An example of affix detection and analysis is furnished in Figure 6-1 below.
+Two possible suffixes, ish and ing, are detected.
+The suffix ing terminates either a
+noun or a verb, and is constrained to follow either a noun-forming or a verb-
+forming suffix. The suffix ish, however, is adjectival. Therefore, this possible
+analysis is rejected, and the correct analysis is chosen. If the string ish had been
+selected as a suffix, the root to which it attaches would have been assumed to end
+in e, and would have been pronounced fine.
+
+finishing
+fin+ish+ing possible suffix analysis
+ing: (a) nominal or verbal suffix
+(b) follows nominal or verbal suffix
+ish: (a) adjectival suffix
+parts of speech not compatible
+(b) follows nominal or adjectival suffix
+finish+ing correct analysis
+
+Figure 6-1: Suffix detection in the word finishing
+
+6.2.2 Domain of application
+The domain of application of the second stage rules excludes any previously
+recognized affixes and is assumed to be a single-root morph. This stage is in-
+tended primarily for consonant rules and proceeds from the left of the string to the
+right. Extending the domain to the whole letter string once again for the third
+stage, a phonemic representation is given to affixes, vowels, and vowel digraphs.
+Phonemic representations are produced by a set of ordered rules which con-
+vert a letter string to a phonetic segment string in a given context. Both left and
+right contexts are permitted in the expression of a rule. Any one context may be
+composed of either letters or segments. Combination of these possibilities for both
+left and right contexts allows for four possible context types.
+
+58
diff --git a/pages-txt/071.txt b/pages-txt/071.txt
new file mode 100644
index 0000000..2e914d6
--- /dev/null
+++ b/pages-txt/071.txt
@@ -0,0 +1,46 @@
+Letter-to-sound and lexical stress
+
+6.2.3 Rule ordering
+
+The method of ordering rules allows converted strings which are highly depend-
+able to be used as context for those requiring a more complex framework. Because
+the pronunciation of consonants is least dependent upon context, phonological
+rules for consonants are applied first, i.e., in the second stage. Rules for vowels
+and affixes, requiring more specification of environment, are applied in the third
+and final stage. With the benefit of a previously converted consonant framework
+and the option of including as context any segment to the left of a string under con-
+sideration, the task of converting vowels and affixes is simplified.
+
+Within the two sets of rules for conversion of consonants and vowels, order-
+ing proceeds from longer strings to shorter strings and, for each string, from
+specific context to general context. The rule for pronunciation of cch, then, ap-
+pears before the rules for cc and ch, each of which is ordered before rules for c and
+h. Procedures for the recognition of prefixes and suffixes also require an ordering:
+the prefixes com and con must be ordered before co; any suffix ending with the
+letter s must be recognized before the suffix consisting of that letter only.
+
+As an example of ordering rules for a particular string, consider the vowel a,
+and assume that it is followed by the letter r. This a may be pronounced like the a
+in warp, lariat, or carp, depending upon specification of further context. It is
+pronounced like the a in carp if it is followed by r and another consonant (other
+than r), and if it is preceded by any consonant segment except WW (note quarter,
+wharf).
+Consequently, a rule for a in the context of being preceded by the seg-
+ment WW and followed by the sequence r-consonant is placed in the set of rules.
+Specification of a left context in the rule for the a in carp is subsequently unneces-
+sary. If the a is preceded by a WW, this rule will already have applied. Using this
+method, rules may be stated simply and without redundancy.
+
+6.2.4 Examples of rule application
+In this section, two words will be analyzed according to the phonological rules.
+Intermediate and final output will be provided for each word. The first stage con-
+sists of affix detection; the second stage is primarily composed of rules for the
+pronunciation of consonants in the root; the third stage contains rules for the
+pronunciation of affixes and of vowels in the root. Generalizations of these rules
+and related rules will be included in the discussion. The result of application of
+stress rules (to be discussed later) is given without comment following each
+derivation.
+
+The first example is shown in Figure 6-2. During Stage 1, no affixes are
+detected. Converting consonants in Stage 2, r is pronounced according to the most
+
+59
diff --git a/pages-txt/072.txt b/pages-txt/072.txt
new file mode 100644
index 0000000..9987592
--- /dev/null
+++ b/pages-txt/072.txt
@@ -0,0 +1,65 @@
+From text to speech: The MITalk system
+
+Sound1: Stripped  : C A R I B O U           Stage 1
+Sound1: Consonants: KK ?? RR ?? BB ?? ??    Stage 2
+Sound1: Prefixes  : KK ?? RR ?? BB ?? ??
+Sound1: Vowels    : KK AE RR IH BB UW       Stage 3
+Sound1: Suffixes  : KK AE RR IH BB UW
+SOUND1: KK 'AE RR - IX - BB UW
+
+Figure 6-2: Application of letter-to-sound rules to caribou
+
+general rule in its rule sequence, and b has only one given pronunciation. The
+letter c, because it precedes a, is pronounced KK.
+
+When a precedes r which, in turn, precedes either a vowel or another r within
+the same morph, it usually has the pronunciation AE. The letter i, following its
+most general pronunciation, is assigned the segment IH. Morph-final ou is given
+the pronunciation UW.
+
+6.2.4.1 Generalizations and related rules The letter r is syllabic if preceded by a
+consonant other than r and followed by a morph-final e, e.g., acre, or the inflec-
+tional suffixes s or ed.
+
+The letter c is palatalized in some cases, as in special (preceded by a vowel;
+followed by the letter i and a vowel) and ancient (preceded by the letter n; fol-
+lowed by i-vowel). It is assigned the segment SS later in its rule sequence if it is
+followed by e, i, or y. It may be noted that this is the same context which assigns
+the pronunciation IH SS to the suffix ic. If c is followed by a, o, or u, it is usually
+pronounced KK, as in this example.
+
+When a precedes r, and r is not followed by either a vowel or another r
+within the same morph, a is pronounced AA (e.g. far, cartoon) unless preceded by
+the segment WW (e.g. warble, warp, war, wharf, quarter).
+
+In a word such as macaroon, the a preceding r-vowel is assigned pronuncia-
+tion AE in the phonological rules and is reduced to schwa in the stress rules be-
+cause it is unstressed.
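+
+The ordering discipline just described amounts to a first-match-wins rule
+table. The Python sketch below restates the a-before-r example in that form;
+it is a hypothetical illustration, and the segment chosen for the warp case
+(AO) is an assumption, since the text does not name it.
+
+    # Hypothetical first-match-wins rule table for the letter a before r.
+    import re
+
+    A_BEFORE_R_RULES = [
+        # preceded by the segment WW, followed by r + consonant: warp, quarter
+        (lambda left: bool(left) and left[-1] == "WW",
+         lambda right: re.match(r"r[^aeiour]", right), "AO"),   # assumed label
+        # otherwise followed by r + consonant (consonant not r): carp, far
+        (lambda left: True,
+         lambda right: re.match(r"r[^aeiour]", right), "AA"),
+        # followed by r + vowel (or rr) within the morph: lariat, caribou
+        (lambda left: True,
+         lambda right: re.match(r"r[aeiour]", right), "AE"),
+    ]
+
+    def convert_a(left_segments, right_letters):
+        for left_ok, right_ok, segment in A_BEFORE_R_RULES:
+            if left_ok(left_segments) and right_ok(right_letters):
+                return segment          # first (most specific) rule wins
+        return "AE"                     # most general fallback
+
+    # convert_a(["KK"], "rp")  -> "AA"   (carp)
+    # convert_a(["WW"], "rp")  -> "AO"   (warp, assumed)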
+
+6.2.4.2 Second example
+
+Sound1: Stripped  : S U B < V E R S > I O N               Stage 1
+Sound1: Consonants: ?? ?? ?? < VV ?? RR ZH > ?? ?? ??     Stage 2
+Sound1: Prefixes  : SS AX BB < VV ?? RR ZH > ?? ?? ??
+Sound1: Vowels    : SS AX BB < VV AH RR ZH > ?? ?? ??     Stage 3
+Sound1: Suffixes  : SS AX BB < VV AH RR ZH > AX NN
+SOUND1: SS "AX BB * - VV 'AH RR ZH * - AX NN
+
+Figure 6-3: Application of letter-to-sound rules to subversion
+
+In Figure 6-3, the affixes ion and sub are recognized in Stage 1.
+There is only one pronunciation provided for the consonant v; and r, because
+
+60
diff --git a/pages-txt/073.txt b/pages-txt/073.txt
new file mode 100644
index 0000000..681e5b2
--- /dev/null
+++ b/pages-txt/073.txt
@@ -0,0 +1,50 @@
+Letter-to-sound and lexical stress
+
+it does not fit a specified context for syllabic r, is given the standard pronunciation.
+The letter s is followed by the sequence i-vowel, making it a candidate for
+palatalization. The palatalization rule which applies assigns the segment ZH.
+
+In the final stage of letter-to-phonetic segment conversion, the affixes and
+vowels are considered. The prefix sub has only one possible pronunciation. The
+letter e, because it precedes the sequence r-consonant where the consonant is not
+an r, is given the pronunciation AH. The palatal segment ZH now forms a left con-
+text for the suffix ion, which, being word-final, is pronounced AH NN.
+
+6.2.4.3 Generalizations and related rules Because the suffix s is marked as oc-
+curring in word-final position only, the s preceding ion is not recognized as a suf-
+fix. This step also prevents the er preceding the s from consideration as a possible
+suffix.
+
+When an s preceding the sequence i-vowel in a root or beginning a suffix is
+preceded by either a vowel or an r, it is usually pronounced ZH. Some examples
+are revision, artesian, Persian and dispersion; two exceptions are controversial
+and torsion. When s is preceded by l, and when it occurs as part of the consonant
+cluster ss, the segment preceding the vowel sequence is SH (e.g. emulsion,
+Russian). A third pronunciation is observed when s is preceded by n (e.g.
+transient, comprehension).
+
+The sequence AH RR is later changed to ER.
+
+The sequence ion following a nonpalatalized consonant is pronounced IY AH
+NN (e.g. oblivion, criterion, champion).
+
+The suffix ion may be given other pronunciations if not morph-final. For ex-
+ample, it is pronounced IY AA NN in ganglionic and histrionic.
+
+6.3 Lexical stress placement
+The stress rules which have been implemented are a modification of a set of or-
+dered rules developed by Halle and Keyser (1971). Modifications fall into three
+categories:
+1. adjustments due to the condition that input is completely phonemic,
+2. reduction of the number of stress levels to 1-stress (primary), 2-stress
+(stress less than primary) and 0-stress, and
+3. addition of special suffix-dependent stress categories.
+Additionally, one aspect of the rules has not yet been implemented. Halle’s cyclic
+rules were written to take advantage of known parts of speech. This module was
+placed after PARSER to utilize this knowledge, but does not utilize it as yet.
+Application of the rules proceeds in two phases. The first phase consists of
+the application of three ordered rules which are applied cyclically, first to the root,
+
+61
diff --git a/pages-txt/074.txt b/pages-txt/074.txt
new file mode 100644
index 0000000..0929359
--- /dev/null
+++ b/pages-txt/074.txt
@@ -0,0 +1,49 @@
+From text to speech: The MITalk system
+
+then to the root and leftmost suffix combined. The process continues with one
+more suffix adjoined to the string under consideration before each cycle begins,
+until the end of the word is reached. This cyclic phase is devoted solely to the
+placement of primary stress. Unless otherwise noted, prefixes are considered part
+of the root.
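+
+The cyclic control structure is easy to state compactly; the sketch below is a
+minimal Python rendering, in which apply_cyclic_rules stands in for the three
+ordered cyclic rules of the following sections (a hypothetical name).
+
+    # Hypothetical sketch of the cyclic phase: the ordered cyclic rules are
+    # applied first to the root (with its prefixes), then to the root plus one
+    # more suffix on each successive cycle until the whole word is covered.
+    def cyclic_phase(root, suffixes, apply_cyclic_rules):
+        domain = list(root)             # prefixes count as part of the root
+        apply_cyclic_rules(domain)
+        for suffix in suffixes:         # leftmost suffix first
+            domain += suffix
+            apply_cyclic_rules(domain)  # each cycle sees one more suffix
+        return domain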
+The second, noncyclic phase includes the application to the entire word of or-
+dered rules and reduces all but one of the primary stress marks to secondary or
+zero stress.
+
+The stress marks used here are ' for primary stress and " for secondary
+stress.
+
+In the following sections, stress placement rules will be given both as for-
+mulas and in descriptive (nonsymbolic) form. Each rule which contains more than
+one case is broken down into cases for which brief descriptions and examples are
+given. It is important to note that a particular case applies only if the rules for pre-
+vious cases have not applied, i.e., a maximum of one case per rule is applicable.
+The subrules in each case are mutually exclusive. The rules are listed in the order
+in which they apply and are marked either cyclic or noncyclic.
+
+In this context, syllable means a vowel followed by any number of consonants
+(including none). Weak syllable means a short (or nontense) vowel followed by, at
+most, one consonant before the next vowel. The words vowel and consonant them-
+selves denote the vocalic and nonvocalic phonetic segment labels output from the
+letter-to-sound conversion stage, rather than the letters in the original word. In the
+examples, Klatt symbols are used to represent the segment labels. The short
+vowels are: AA, EH, IH, AO, UH, AH, AX, AE, and IX. Long vowels are: EY, IY,
+AY, OW, UW, OY, and AW.
+
+Each formula is a phonetic segment string pattern matching expression. The
+symbols used in the formulas are defined as follows:
+
+C matches a single consonant. Sub- and superscripts denote lower and upper
+bounds, respectively, on the number of replications of the preceding term
+(usually C). For example, C0 matches any number of consecutive con-
+sonants (including none) while C1^2 matches one or two consonants.
+
+V matches a single vowel.
+
+X and Y match segment strings of any length (including null, unless noted
+otherwise).
+
+Brackets [ ] denote the association of one or more features with a vowel. The fea-
+
+62
diff --git a/pages-txt/075.txt b/pages-txt/075.txt
new file mode 100644
index 0000000..ffb54b2
--- /dev/null
+++ b/pages-txt/075.txt
@@ -0,0 +1,54 @@
+Letter-to-sound and lexical stress
+
+tures used are long, short, stress, 1-stress, 2-stress, and -stress (lacking
+stress). The bracket form matches only a vowel with the associated fea-
+tures.
+
+Parentheses ( ) denote an optional term. When a rule with an optional term is
+tested against a word, matching with the term included is attempted first.
+Unless otherwise noted, the rule can match the word once at most; if the
+rule matches with the optional term present, then no match will be at-
+tempted with the optional term omitted.
+
+Braces { } denote a list of alternative patterns, separated by tall slashes / .
+The overall structure of a rule is:
+
+V → feature / pattern
+
+which translates to:
+A vowel receives feature in the context of pattern
+
+where pattern contains the symbol — in the position where the vowel is to appear.
+The pattern must match the entire word, unless otherwise noted.
+A simple example of a rule follows:
+
+V → [1-stress] / X — C [long]
+
+which means “A vowel receives 1-stress when followed by a consonant and word-
+final long vowel.”
+
+6.3.1 Main Stress Rule (cyclic)
+
+1. V → [1-stress] / X — C0 { [short] C0^1 / V } { [short] C0 / V }
+
+where X must contain all prefixes (i.e. prefixes are never stressed by
+this rule).
+
+a. Assign 1-stress to the vowel in a syllable preceding a weak
+syllable followed by a morph-final syllable containing a short
+vowel and zero or more consonants (e.g. difficult—DD 'IH
+FF IH KK AH LL TT).
+b. Assign 1-stress to the vowel in a syllable preceding a weak
+syllable followed by a morph-final vowel (e.g. oregano—AO
+RR 'EH GG AE NN OW).
+c. Assign 1-stress to the vowel in a syllable preceding a vowel
+
+63
diff --git a/pages-txt/076.txt b/pages-txt/076.txt
new file mode 100644
index 0000000..1aabaab
--- /dev/null
+++ b/pages-txt/076.txt
@@ -0,0 +1,49 @@
+From text to speech: The MITalk system
+
+followed by a morph-final syllable containing a short vowel
+and zero or more consonants (e.g. secretariat—SS EH KK
+RR EH TT 'AE RR IY AE TT).
+
+d. Assign 1-stress to the vowel in a syllable preceding a vowel
+followed by a morph-final vowel (e.g. oratorio—AO RR AE
+TT 'AO RR IY OW).
+
+2. V → [1-stress] / X — C0 { [short] C0 / V }
+
+where X must contain all prefixes.
+
+a. Assign 1-stress to the vowel in a syllable preceding a short
+vowel and zero or more consonants (e.g. edit—'EH DD IH
+TT, bitumen—BB AY TT 'UW MM EH NN).
+
+b. Assign 1-stress to the vowel in a syllable preceding a morph-
+final vowel (e.g. agenda—AE JJ 'EH NN DD AE).
+
+3. V → [1-stress] / X — C0
+
+where X must contain all prefixes.
+a. Assign 1-stress to the vowel in the last syllable (e.g. stand—
+SS TT 'AE NN DD, go—GG 'OW, parole—PP AE RR 'OW
+LL, hurricane—HH AH RR IH KK 'EY NN -- reduced to 2-
+stress by a later rule).
+
+6.3.2 Exceptions to the Main Stress Rule
+A condition has been placed on the Main Stress Rule relating to assignment of
+stress, dependent upon four categories of special suffixes. One category is marked
+to force stress to be placed on either the final or the penultimate syllable of the
+string under consideration. (It should be noted that later rules may change this
+assignment.) This stress placement replaces the Main Stress Rule on the cycle in
+which the special suffix is the rightmost morph. Suffixes in this category include
+IH FF 'AY (-ify), AO RR IY (-ory), and IH FF IH KK (-ific).
+
+The second category of suffixes does not affect stress; the cycle in which such
+a suffix is rightmost in the domain is skipped. Later cycles, however, do include
+the suffix as part of their domain of application. Examples are: DD AA MM (-dom),
+MM EH NN TT (-ment), and LL EH SS (-less).
+
+The third category is a combination of the first two: stress is placed on one of
+the vowels in the suffix and all three cyclic rules are skipped for the current
+domain. Examples are: 'IH RR (-eer), SS 'EH LL FF (-self), and SH 'IH PP
+(-ship).
+
+64
diff --git a/pages-txt/077.txt b/pages-txt/077.txt
new file mode 100644
index 0000000..910dd37
--- /dev/null
+++ b/pages-txt/077.txt
@@ -0,0 +1,47 @@
+Letter-to-sound and lexical stress
+
+The last category replaces the Main Stress Rule with the following when the
+suffix is IH KK (-ic): assign 1-stress to the vowel in the first syllable in the word.
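+
+Ignoring the special-suffix conditions above (and the bare-vowel alternatives
+of the first rule), the three cases of the Main Stress Rule are simply weaker
+and weaker patterns tried in order. The following Python sketch is a
+simplified, hypothetical rendering over lists of Klatt segment labels:
+
+    # Hypothetical simplified Main Stress Rule: return the index of the vowel
+    # receiving 1-stress. Special suffixes and stress marks are ignored.
+    SHORT = {"AA", "EH", "IH", "AO", "UH", "AH", "AX", "AE", "IX"}
+    LONG = {"EY", "IY", "AY", "OW", "UW", "OY", "AW"}
+    VOWELS = SHORT | LONG
+
+    def main_stress_rule(segments):
+        v = [i for i, s in enumerate(segments) if s in VOWELS]
+        final_ok = segments[v[-1]] in SHORT or v[-1] == len(segments) - 1
+        # Case 1: stress the syllable before a weak syllable that precedes a
+        # final short-vowel syllable or final vowel: difficult, oregano.
+        if (len(v) >= 3 and final_ok and segments[v[-2]] in SHORT
+                and v[-1] - v[-2] <= 2):    # weak: at most one consonant
+            return v[-3]
+        # Case 2: stress the syllable before a final short-vowel syllable or
+        # final vowel: edit, bitumen, agenda.
+        if len(v) >= 2 and final_ok:
+            return v[-2]
+        # Case 3: stress the last syllable: stand, go, parole.
+        return v[-1]
+
+    # main_stress_rule(["DD", "IH", "FF", "IH", "KK", "AH", "LL", "TT"]) -> 1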
+
+6.3.3 Stressed Syllable Rule (cyclic)
+
+1. V → [1-stress] / X — C0 { [short] C0^1 / V } V C0 [1-stress] Y
+
+where Y contains no 1-stress and X must contain all prefixes.
+
+a. Assign 1-stress to the vowel in a syllable preceding a weak
+syllable followed by a syllable which is followed by the
+rightmost primary-stressed vowel (e.g. oxygenate—'AA KK
+SS IH JJ EH NN 'EY TT (stressed on first syllable) -- the
+stress on the final syllable is later reduced).
+
+b. Assign 1-stress to the vowel in a syllable preceding a vowel
+which is followed by a syllable followed, in turn, by the
+rightmost primary-stressed vowel (e.g. stereobate—SS TT
+'EH RR IY OW BB 'EY TT (stressed on first syllable) -- the
+stress on the final syllable is later reduced).
+
+2. V → [1-stress] / X — C0 V C0 [1-stress] Y
+
+where Y contains no 1-stress and X must contain all prefixes.
+a. Assign 1-stress to the vowel two syllables to the left of the
+rightmost primary-stressed vowel (e.g. propaganda—PP RR
+'AA PP AE GG 'AE NN DD AE (stressed on first syllable) --
+the stress on this leftmost vowel is later properly reduced).
+
+3. V → [1-stress] / X — C0 [1-stress] Y
+
+where Y contains no 1-stress and X must contain all prefixes.
+a. Assign 1-stress to the vowel one syllable to the left of the
+rightmost primary-stressed vowel (e.g. hormone—HH 'AO
+RR MM 'OW NN -- the stress on the final vowel is later
+reduced).
+
+6.3.4 Alternating Stress Rule (cyclic)
+
+1. V → [1-stress] / X — C0 V V C0 [1-stress] C0
+
+65
diff --git a/pages-txt/078.txt b/pages-txt/078.txt
new file mode 100644
index 0000000..2fb4fb5
--- /dev/null
+++ b/pages-txt/078.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+a. Assign 1-stress to the vowel three syllables to the left of a
+primary-stressed vowel occurring in the last syllable if the
+following syllable contains only a vowel (e.g. heliotrope—
+HH 'IY LL IY OW TT RR 'OW PP -- the stress in the last
+syllable is later reduced).
+
+2. V → [1-stress] / X — C0 V C0 [1-stress] C0
+
+a. Assign 1-stress to the vowel two syllables to the left of a
+primary-stressed vowel occurring in the last syllable (e.g.
+gelatinate—JJ 'EH LL 'AE TT IH NN 'EY TT -- the
+stress in the first syllable is later deleted; stress in the last
+syllable is later reduced).
+
+6.3.5 Destressing Rule (noncyclic)
+
+This rule is the first destressing phase wherein the selected stressed vowels are
+reduced in stress and tenseness. The action (→ -stress) in the rules below in-
+dicates that the stress marking for the selected vowel is removed. In addition, if
+the destressed vowel is long, it is shortened as follows: EY → AE, IY → EH,
+AY → IH, OW → AA, or UW → UH (OY and AW are not modified).
+
+1. V → [-stress] / C0 V C0 — C [stress] Y
+
+where the rule may apply more than once per word.
+a. Shorten and destress any vowel not in the first syllable which
+is followed by a single consonant and a stressed vowel (e.g.
+instrumental—'IH NN SS TT RR (')UW MM 'EH NN TT
+AE LL -- the segment UW is reduced to UH, and later to AX).
+
+2. V → [-stress] / C0 — C [stress] X, where the vowel matched at — is
+not long
+
+where the rule may apply in addition to the previous rule.
+a. Destress a nonlong vowel in the first syllable which is fol-
+lowed by a single consonant and a stressed vowel (e.g.
+gelatinate—JJ (')EH LL 'AE TT 'IH NN 'EY TT).
+
+6.3.6 Compound Stress Rule (noncyclic)
+This rule, as developed by Halle, applies to both compounds and noncompounds.
+The assumption with letter-to-phonetic segment rules is that words are composed
+
+66
diff --git a/pages-txt/079.txt b/pages-txt/079.txt
new file mode 100644
index 0000000..41f45ce
--- /dev/null
+++ b/pages-txt/079.txt
@@ -0,0 +1,43 @@
+Letter-to-sound and lexical stress
+
+of affixes and only one root. Therefore, as this rule applies to words converted by
+letter-to-segment rules in the module, it applies to noncompounds only, and its ef-
+fect is to locate the primary stress which is to be retained. The action (V →
+retain) indicates that 1-stress is to be reduced to 2-stress on all but the matched
+vowel.
+
+1. V → retain / X — Y V C0 IY
+
+where Y does not contain 1-stress.
+a. Retain 1-stress on a vowel if it is followed by at least one
+syllable and a word-final unstressed IY. Reduce all other 1-
+stress to 2-stress (e.g. legendary—LL 'EH JJ EH NN DD
+('→")AE RR IY).
+
+2. V → retain / X — Y V C0
+
+where Y does not contain 1-stress.
+a. Retain 1-stress on a vowel if it is followed by a string of one
+or more syllables without primary stress. Reduce all other 1-
+stress to 2-stress (e.g. hurricane—HH 'AH RR IH KK
+('→")EY NN, gastritis—GG ('→")AE SS TT RR 'AY
+TT IH SS, trinitarian—TT RR ('→")IH NN IH TT 'AE
+RR IY AX NN).
+
+3. V → retain / X — Y
+
+where Y does not contain 1-stress.
+a. Retain 1-stress on the only vowel to which it has been as-
+signed (e.g. stand—SS TT 'AE NN DD, edit—'EH DD IH
+TT, difficult—DD 'IH FF IH KK AH LL TT).
+
+This rule also includes a condition dependent upon two categories of special
+suffixes. Those suffixes discussed with the Main Stress Rule which do not affect
+stress placement are excepted from the domain of the Compound Stress Rule if
+they are either word-final or precede another word-final suffix in the same cate-
+gory. The other category of suffixes is marked for special stress retention (i.e. is
+allowed to be part of the Y pattern even though stressed). These suffixes are: IH
+ZZ AX MM (-ism), IH VV (-ive), and AE TT (vowel-reduced -ate).
+
+67
diff --git a/pages-txt/080.txt b/pages-txt/080.txt
new file mode 100644
index 0000000..7c773d4
--- /dev/null
+++ b/pages-txt/080.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+6.3.7 Strong First Syllable Rule (noncyclic)
+1. V → [2-stress] / C0 — X, where the vowel matched at — is long and
+lacks stress
+
+a. Assign 2-stress to the vowel in the first syllable if it is long
+(e.g. hydrosanitation—HH "AY DD RR OW SS 'AE NN IH
+TT 'EY SH AX NN, dielectric—DD "AY 'EH LL 'EH KK
+TT RR IH KK).
+
+2. V → [2-stress] / C0 — C C X, where the vowel matched at — lacks
+stress
+
+a. Assign 2-stress to the vowel in the first syllable if it is fol-
+lowed by at least two consonants (e.g. circumnavigation—
+SS "AH RR KK AH MM NN 'AE VV IH GG 'EY SH AX NN).
+
+6.3.8 Cursory Rule
+
+1. V → [-stress] / Y [1-stress] C0 — C V X
+
+where Y contains no 1-stress.
+a. The vowel following the primary-stressed vowel, if it is not
+the last vowel in the word, is shortened and its stress removed
+(e.g. infirmary—"IH NN FF 'AH RR MM (")AE RR IY,
+cursory—KK 'AH RR SS (")AO RR IY, curative—KK YY
+'UH RR (")AE TT IH VV).
+This has provision for one class of exceptional suffixes. If the pattern CVX in the
+rule above matches a string of suffixes from the “ignored” category of the Main
+Stress Rule, then the Cursory Rule is suppressed for this case.
+
+6.3.9 Vowel Reduction Rule
+This rule reduces unstressed short vowels to the appropriate schwa. The action (V
+→ reduce) indicates that EH and IH are changed to IX while all other short
+vowels are changed to AX.
+1. V → reduce / X — Y, where the vowel matched at — is short and
+lacks stress
+
+where the rule may match more than once per word.
+a. Reduce EH and IH to IX if not stressed (e.g. ptolemaic—TT
+'AO LL IX MM 'EY IX KK).
+b. Reduce other short unstressed vowels to AX (e.g. curator—
+KK 'UH RR AX TT AX RR).
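+
+The Vowel Reduction Rule is simple enough to state exactly. A minimal Python
+sketch, under the assumption that stress marks are attached to vowel labels as
+a leading ' or " character:
+
+    # Sketch of the Vowel Reduction Rule over a list of segment labels;
+    # stressed vowels carry a leading ' or " mark and so are left alone.
+    SHORT_VOWELS = {"AA", "EH", "IH", "AO", "UH", "AH", "AX", "AE", "IX"}
+
+    def reduce_vowels(segments):
+        out = []
+        for seg in segments:
+            if seg in SHORT_VOWELS:            # unstressed (no ' or " mark)
+                out.append("IX" if seg in ("EH", "IH", "IX") else "AX")
+            else:
+                out.append(seg)
+        return out
+
+    # reduce_vowels(["KK", "'UH", "RR", "AE", "TT", "AO", "RR"])
+    # -> ["KK", "'UH", "RR", "AX", "TT", "AX", "RR"]     (cf. curator above)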
+
+68
diff --git a/pages-txt/081.txt b/pages-txt/081.txt
new file mode 100644
index 0000000..6645368
--- /dev/null
+++ b/pages-txt/081.txt
@@ -0,0 +1,77 @@
+Letter-to-sound and lexical stress
+
+It should be noted that stress may be further reduced in PHONO1 (see Chap-
+ter 8) according to parts of speech and phrasing.
+
+6.4 An example
+
+Figure 6-4 is an example of the application of both the letter-to-sound and the
+stress rules. The results after each of the stages of the letter-to-sound rules, and
+after each of the stress rules, are given. There are two complete cycles of the
+cyclic stress rules, followed by the noncyclic rules. The rules are followed by ap-
+plication of syllabification rules.
+
+Sound1: Stripped  : M U L T I < P A G I N > A T E > E D
+Sound1: Consonants: ?? ?? ?? ?? ?? < PP ?? JJ ?? NN > ?? ?? ?? > ?? ??
+Sound1: Prefixes  : MM AX LL TT IH < PP ?? JJ ?? NN > ?? ?? ?? > ?? ??
+Sound1: Vowels    : MM AX LL TT IH < PP AE JJ IH NN > ?? ?? ?? > ?? ??
+Sound1: Suffixes  : MM AX LL TT IH < PP AE JJ IH NN > EY TT > IH DD
+Sound1: Apply MSR : MM AX LL TT IH < PP 'AE JJ IH NN
+Sound1: Apply SSR : MM AX LL TT IH < PP 'AE JJ IH NN
+Sound1: Apply ASR : MM AX LL TT IH < PP 'AE JJ IH NN
+Sound1: Apply MSR : MM AX LL TT IH < PP 'AE JJ IH NN > 'EY TT
+Sound1: Apply SSR : MM AX LL TT IH < PP 'AE JJ IH NN > 'EY TT
+Sound1: Apply ASR : MM AX LL TT IH < PP 'AE JJ IH NN > 'EY TT
+Sound1: Skipping  : MM AX LL TT IH < PP 'AE JJ IH NN > 'EY TT > IH DD
+Sound1: Destress  : MM AX LL TT IH < PP 'AE JJ IH NN > 'EY TT > IH DD
+Sound1: Compound  : MM AX LL TT IH < PP 'AE JJ IH NN > "EY TT > IH DD
+Sound1: Strong 1st: MM "AX LL TT IH < PP 'AE JJ IH NN > "EY TT > IH DD
+Sound1: Cursory   : MM "AX LL TT IH < PP 'AE JJ IH NN > "EY TT > IH DD
+Sound1: Reduce    : MM "AX LL TT IX < PP 'AE JJ IX NN > "EY TT > IX DD
+SOUND1: MM "AX LL - TT IX * - PP 'AE - JJ IX NN * - "EY TT * - IX DD
+
+Figure 6-4: Example of letter-to-sound and stress rule operation
+
+69
diff --git a/pages-txt/082.txt b/pages-txt/082.txt
new file mode 100644
index 0000000..83c60e7
--- /dev/null
+++ b/pages-txt/082.txt
@@ -0,0 +1,3 @@
+II
+
+Synthesis
diff --git a/pages-txt/083.txt b/pages-txt/083.txt
new file mode 100644
index 0000000..e83a3fc
--- /dev/null
+++ b/pages-txt/083.txt
@@ -0,0 +1,50 @@
+7
+
+Survey of speech synthesis technology
+
+7.1 Overview
+This brief review of speech synthesis technology is concerned primarily with prac-
+tical methods of generating spoken messages by computers or special-purpose
+devices. Basic research directed at modeling articulatory-to-acoustic transfor-
+mations (Flanagan et al., 1975; Flanagan and Ishizaka, 1976) will not be reviewed.
+
+7.1.1 Applications
+Applications for synthetic speech output fall into four broad categories:
+
+1. Single word responses (e.g. Speak-’N-Spell)
+
+2. A limited set of messages within a rigid syntactic framework (e.g.
+telephone number information)
+
+3. Large, fixed vocabulary with general English syntax (e.g. teaching
+machine lessons)
+
+4. Unrestricted text to speech (e.g. a reading machine for the blind)
+
+The degree of generality and difficulty increases considerably from 1 to 4.
+Prerecorded messages work well for single-word response applications, whereas
+an increasing knowledge of the acoustic-phonetic characteristics of speech,
+phonology, and syntax is required for satisfactory synthesis of general English.
+
+7.1.2 Three methods of employing MITalk modules
+
+The entire MITalk text-to-speech system can be used in applications falling in cat-
+egory 4 above, or pieces of the MITalk synthesis routines might be used in other
+applications. For example, if an abstract phonemic and syntactic representation for
+an utterance can be stored in the computer or derived by linguistic rules, only
+modules beginning with PHONO2 in Figure 7-1 are needed. Speech represented
+in this way requires storage of only about 100 bits per second.
+
+Another way to use the synthesis routines to produce even more natural
+sounding speech (at a cost in bits and human intervention) is to begin by specify-
+ing the input to the phonetic component PHONET in Figure 7-1. If durations and
+fundamental frequency values are taken from a natural recording rather than being
+computed by rule, a remarkably human voice quality is achieved. Storage of about
+250 bits per second of speech is required, and of course, considerable effort is re-
+quired to prepare the input representation.
+
+71
diff --git a/pages-txt/084.txt b/pages-txt/084.txt
new file mode 100644
index 0000000..9660cbe
--- /dev/null
+++ b/pages-txt/084.txt
@@ -0,0 +1,44 @@
+From text to speech: The MITalk system
+
+[Block diagram rendered as an image in the original; the recoverable block
+labels are: TEXT, analysis components, PHONO1 (phonological component),
+prosodic component, FOTARG, phonemic synthesis by rule, phonetic component
+PHONET, stored prosodics, formant synthesizer, COEWAV, speech waveform.]
+
+Figure 7-1: Synthesis blocks of the MITalk system
+
+7.2 Background
+
+Automatic voice response machines, based on the principle of concatenating
+prerecorded speech waveforms, have been used to provide such information as
+time of day and weather reports by telephone since the early 1930s. More
+recently, voice response systems have been used to provide rapid telephone access
+to information stored in computers in such diverse areas as inventory control,
+credit inquiries, bank balance information, and shipping status inquiries. In most
+cases, the request can be keyed in by touch-tone telephone.
+
+The earliest voice response units were analog systems in which the
+vocabulary elements (words and short phrases) were stored as analog recordings of
+speech waveforms. Many currently available audio response units still operate on
+this principle (Homsby, 1972). Systems of this type have served very well in a
+
+72
diff --git a/pages-txt/085.txt b/pages-txt/085.txt
new file mode 100644
index 0000000..b21ca25
--- /dev/null
+++ b/pages-txt/085.txt
@@ -0,0 +1,48 @@
+Survey of speech synthesis technology
+
+variety of applications where the vocabulary consists of a small number of words
+and where the messages are simple and follow a rather rigid format. However,
+there are a number of limitations of such systems which make them unsatisfactory
+for more general applications, such as automatic conversion of English text to
+speech.
+
+Figure 7-2 illustrates some of the differences between words
+spoken in isolation and the same words put together in a fluently spoken sentence.
+Not only are most words considerably shorter, but there are acoustic changes at the
+boundaries between words due to coarticulation, and due to phonological rules that
+change the pronunciation of words in certain sentence contexts. Furthermore, the
+intonation, rhythm, and stress pattern appropriate to the sentence cannot be syn-
+thesized if one simply concatenates prerecorded words. These prosodic qualities
+turn out to be extremely important.
Words that are perfectly intelligible in isola- +tion seem to come too fast and in a disconnected manner when the words are con- +catenated in such a way that the prosody is wrong. + +Thus simple word concatenation schemes have severe limitations as audio +response units. In contrast, there are several newer techniques under development +that do not have these limitations. These techniques range from complex systems +for speech synthesis-by-rule (where a synthetic waveform is computed from a +knowledge of linguistic and acoustic rules), to relatively simple systems for creat- +ing speech utterances by concatenating prerecorded speech waveform chunks +smaller than a word (using vocoder analysis-synthesis technology to gain +flexibility in reassembly). + +Speech synthesis techniques have been reviewed in Flanagan and Rabiner +(1973), Klatt (1974), and Rabiner and Schafer (1976). We describe here some of +the current techniques that have been employed. Of particular interest are criteria +by which one selects an inventory of basic speech units to be used in utterance as- +sembly, how one selects a method of unit concatenation, and how to specify +sentence-level prosodic variables. + +7.3 Synthesis techniques + +The techniques to be covered in this section include systems for forming messages +out of words as the basic units, out of syllables and diphones as the basic units, and +out of phonemes as the basic units. + +7.3.1 Word assembly + +7.3.1.1 Prerecorded words and phrases Early methods of spoken message as- +sembly used prerecorded words (or whole phrases) that were concatenated into +sentences (Homsby, 1972; Chapman, 1971; Buron, 1968). Brief pauses were in- + +73 diff --git a/pages-txt/086.txt b/pages-txt/086.txt new file mode 100644 index 0000000..f70a51c --- /dev/null +++ b/pages-txt/086.txt @@ -0,0 +1,32 @@ +From text to speech: The MITalk system + +25 + +2 1 +~N A I"' +% 6 i iy +) 4 R I +c f A +N ALt +@ 2 - 1I 1 Ix i +g 21 CHl e | + +0 + +a) ‘after’ + +: T +e $ | T +7 T T +s + +1 ] +iy I +|| }1 L + +g) ‘Put a sphere after one block.’ h) ‘Blue after red is the order.’ + +Figure 7-2: An example of the differences between words spoken in isolation +and words spoken as a continuous utterance + +74 diff --git a/pages-txt/087.txt b/pages-txt/087.txt new file mode 100644 index 0000000..171dc0e --- /dev/null +++ b/pages-txt/087.txt @@ -0,0 +1,46 @@ +Survey of speech synthesis technology + +serted between words, and a reasonable sentence intonation contour was realized +by restricting a given prerecorded element to only certain utterance positions. A +great deal of care was taken in speaking, recording, and editing the basic +vocabulary items. + +Word storage has involved various analog and digital techniques that range +from recording each word into a half-second slot on a rotating drum, to sophis- +ticated digital techniques for reducing the number of bits that must be stored. +Digital methods for representing speech waveforms are reviewed by Rabiner and +Schafer (1976) and by Jayant (1974). One remarkable technique developed at +Texas Instruments (Wiggins, 1979) involves storing a 1000 bit-per-second +linear-prediction representation for each word on integrated circuit chips having a +capacity of 200 seconds of speech, and using an IC linear-prediction synthesizer to +play selected words (all of this circuitry being offered at $50 in the Speak-’N-Spell +children’s toy). + +7.3.1.2 Formant vocoding of words Rabiner et al. 
(1971a) suggested that one
could get rid of the choppiness of waveform concatenation by extracting formant
trajectories for each prerecorded word and smoothing formant parameter tracks
across word boundaries before formant vocoder resynthesis. A second advantage
of formant analysis-synthesis of the words that make up a synthetic utterance is
that the duration pattern and fundamental frequency contour can be adjusted to
match the accent pattern, rhythm, and intonation requirements of the sentence to be
produced. The technique has been used successfully in telephone number syn-
thesis where a known prosodic contour could be superimposed (for example, a
pause and a “continuation rise” intonation can be placed just before the fourth digit
of a seven digit telephone number). However, the authors did not offer general
prosodic rules for sentence synthesis.

7.3.1.3 Linear-prediction coded words Olive (1974) later showed that a similar
system could be based on linear prediction encoding. Furthermore, it was deter-
mined that a correct fundamental frequency contour for a sentence was percep-
tually more important than the exact duplication of the durational pattern or careful
smoothing of the formant transitions between words.

The advantage of the prerecorded word as a unit is ease of bringing up a
limited audio response unit. The disadvantages are that: 1) large vocabularies are
impractical, and 2) general timing and fundamental frequency rules that adjust the
prosodic characteristics of a word as a function of sentence structure are more
easily defined at a segmental level. For example, only the final vowel and
postvocalic consonants of a word are lengthened at phrase and clause boundaries
(Klatt, 1976b).

75 diff --git a/pages-txt/088.txt b/pages-txt/088.txt new file mode 100644 index 0000000..a49236f --- /dev/null +++ b/pages-txt/088.txt @@ -0,0 +1,46 @@ +From text to speech: The MITalk system

7.3.2 Syllables and diphones

Instead of using words as the basic building blocks for sentence production, a
smaller inventory of basic units is required if arbitrary English sentences are to be
synthesized. The inventory of basic speech units must satisfy several require-
ments, including: 1) the ability to construct any English word by concatenating the
units one after another, and 2) the ability to change duration, intensity and fun-
damental frequency according to the demands of the sentence syntax and stress
pattern in such a way as to produce speech that is both intelligible and natural.

7.3.2.1 Syllables The intuitive notion of the syllable as the basic unit has con-
siderable theoretical appeal. Any English word can be broken into syllables con-
sisting of a vowel nucleus and adjacent consonants. Linguists have been unable to
agree on objective criteria for assigning consonants to a particular vowel nucleus
in certain ambiguous cases such as “butter”, but an arbitrary decision can be made
for synthesis purposes.

The greatest theoretical advantage of the syllable concerns the way that
acoustic characteristics of most consonant-vowel transitions are preserved.
Context-conditioned acoustic changes to consonants are automatically present to a
great extent when the syllable is chosen as the basic unit, but not when smaller
units such as the phoneme are concatenated.
+
The disadvantages of the syllable are: 1) coarticulation across syllable boun-
daries is not treated, and this coarticulation can be just as important as within-
syllable coarticulation, 2) if prerecorded syllables are stored in the form of
waveforms, there is no way to mimic the prosodic contour of the intended mes-
sage, and 3) the syllable inventory for general English is very large. There are cur-
rently no syllable-based systems for speech generation.

7.3.2.2 Demisyllables The last two disadvantages of a syllable-based scheme
might be overcome by replacing syllables by demisyllables. The demisyllable is
defined as half of a syllable, either the set of initial consonants plus half of the
vowel, or the second half of the vowel plus any postvocalic consonants (Fujimura
and Lovins, 1978; Lovins and Fujimura, 1976). For example, the word “construct”
would be divided into co-,
+PHONO1: Function word: DH AH
+PHONO1: Content word: 'OW LL DD
+PHONO1: Content word: MM 'AE NN [End NOUN phrase]
+PHONO1: Content word: SS 'AE TT
+PHONO1: Function word: IH NN
+PHONO1: Function word: AX
+PHONO1: Content word: RR 'AA KK * - ER
+PHONO1: Punctuation: .
+PHONO1:
+PHONO2: Function word: DH IY
+PHONO2: Content word: 'OW LX DD
+PHONO2: Content word: MM 'AE NN [End NOUN phrase]
+PHONO2: Content word: SS 'AE DX
+PHONO2: Function word: IH NN
+PHONO2: Function word: AX
+PHONO2: Content word: RR 'AA KK * - ER
+PHONO2: Punctuation: .
+PHONO2:

Figure 8-1: Example of PHONO1 and PHONO2 processing

8.2.1 Phonemic inventory
A traditional phonemic analysis of English is assumed, except for the special cases
listed below:

1. The diphthongs AY, AW, OY, YU are considered to be single
phonemes rather than, e.g., AY = AA+YY or AA+IY or AA+IH be-
cause none of the two-phoneme alternatives result in particularly
simple rules to describe durational behavior and formant trajectories.

2. The affricates CH and JJ are considered to be single phonemes
rather than, e.g., CH = TT+SH for the same reasons.

3. Vowel+RR syllabic nuclei are treated internally as the special vowel
nuclei IXR (“beer”), EXR (“bear”), AXR (“bar”), OXR (“boar”), and
UXR (“pure”).

82 diff --git a/pages-txt/095.txt b/pages-txt/095.txt new file mode 100644 index 0000000..efcbd1c --- /dev/null +++ b/pages-txt/095.txt @@ -0,0 +1,52 @@ +The phonological component

4. Words like “player” and “buyer” should be transcribed with two syll-
ables, i.e., EY+ER and AY+ER.

5. Syllabic consonants appear in words like “butter” BB 'AH TT ER
(phonetically BB AH DX ER), “button” BB 'AH TT EN, “bottle” BB
'AA TT EL, and “pop’em” PP 'AA PP EM.

6. The dental flap (DX), glottalized TT (TQ), and velarized LL (LX) are
not really phonemes, but are allophones inserted in lexical forms by
rules to be described.

7. The pseudo-vowel AXP is inserted between a plosive and a following
pause in order to cause the plosive to be released.

8.2.2 Lexical stress

Each stressed vowel in the input to PHONO1 is preceded by a stress symbol (' or
"), where ' is primary lexical stress (reserved for vowels in open-class content
words, only one 1-stress per word). The secondary lexical stress, ", is used in
some content words (e.g. the first syllable of “demonstration”), in compounds (e.g.
the second syllable of “baseball”), in the strongest syllable of polysyllabic function
words (e.g. “until”), and for pronouns (excluding personal pronouns like “his”).
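The input notation used here and in Figure 8-1 can be made concrete with a small sketch. The code below is hypothetical illustration code, not part of MITalk; it assumes the word-boundary markers C: and F:, the stress marks ' and ", and the syntactic symbols (such as )N) described in this chapter.

SYNTACTIC = {".", ",", ")?", ")N", ")P", ")C"}

def tokenize(line):
    """Group a PHONO1-style symbol string into words and syntactic marks."""
    words, current = [], None
    for tok in line.split():
        if tok in ("C:", "F:"):
            # a word boundary symbol starts a new content or function word
            current = {"kind": "content" if tok == "C:" else "function",
                       "phonemes": []}
            words.append(current)
        elif tok in SYNTACTIC:
            words.append({"kind": "syntactic", "symbol": tok})
        elif current is not None:
            current["phonemes"].append(tok)   # e.g. DH, 'OW, *, -
    return words

# "the old man" with an end-of-noun-phrase symbol, as in Figure 8-1:
for word in tokenize("F: DH AH C: 'OW LL DD C: MM 'AE NN )N"):
    print(word)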

8.2.3 Stress reduction in function words

Content words such as nouns, adjectives, adverbs, and main verbs are expected to
have one primary lexical stress in the input to PHONO1. Many (but not all)
closed-class function words are reduced in stress in PHONO1 so that they do not
receive a pitch gesture associated with primary stress. For example, determiners,
conjunctions, auxiliary verbs, and personal pronouns are reduced in stress.

Each word of an utterance to be synthesized must be immediately preceded
by a word boundary symbol. The distinction between content and function words
is indicated by using C: and F:. Open-class words (nouns, verbs, adjectives, and
adverbs) are content words; all others are function words. Later modules use this
information to select plausible pause locations (between a content word and a
function word) in long phrases.

8.2.4 Syntactic structure
Syntactic structure symbols are important determiners of sentence stress, rhythm,
and intonation. Syntactic structure symbols appear just before the word boundary
symbol. Only one syntactic marker can appear at a given sentence position. The
strongest syntactic boundary symbol is always used.
An utterance must end with either a period “.” signaling a final fall in intona-
tion, or a question mark “)?” signaling the intonation pattern appropriate for yes-no

83 diff --git a/pages-txt/096.txt b/pages-txt/096.txt new file mode 100644 index 0000000..c9e7927 --- /dev/null +++ b/pages-txt/096.txt @@ -0,0 +1,87 @@ +From text to speech: The MITalk system

Table 8-1: Klatt symbols used in the synthesis modules

Vowels
AA Bob       AE bat       AH but        AO bought     AW bout
AX about     AXR bar      AY bite       EH bet        ER bird
EXR bear     EY bait      IH bit        IX impunity   IXR beer
IY beet      OW boat      OY boy        UH book       UW boot
UXR poor     YU beauty

Sonorant Consonants
EL bottle    HH hat       HX the hurrah LL let        LX bill
RR rent      RX fire      WW wet        WH which      YY yet

Nasals
EM keep’em   EN button    MM met        NN net        NG sing

Fricatives
DH that      FF fin       SS sat        SH shin       TH thin
VV vat       ZZ zoo       ZH azure

Plosives
BB bet       DD debt      DX butter     GG gore       GP give
KK core      KP keen      PP pet        TT ten        TQ at Alan

Affricates
CH chin      JJ gin

Pseudo-vowel
AXP plosive release

Stress Symbols
' or 1   primary lexical stress
" or 2   secondary lexical stress

Word and Morpheme Boundaries
-    syllable boundary (ignored)
*    morpheme boundary
C:   begin content word
F:   begin function word

Syntactic Structure
.    end of declarative utterance
)?   end of yes/no question
,    orthographic comma
)N   end of noun phrase
)P   potential breath pause
)C   end of clause

84

diff --git a/pages-txt/097.txt b/pages-txt/097.txt new file mode 100644 index 0000000..da1aff2 --- /dev/null +++ b/pages-txt/097.txt @@ -0,0 +1,48 @@ +The phonological component

questions. If clauses are conjoined, a syntactic symbol is placed just before the
conjunction. If a comma could be placed in the orthographic rendition of the
desired utterance, then the syntactic comma symbol “,” should be inserted. Syn-
tactic commas are treated as full clause boundaries in the rules; they are used to list
a series of items and to otherwise break up larger units into chunks in order to
facilitate perceptual processing.

The end of a noun phrase is indicated by )N. Segments in the syllable prior
to a syntactic boundary are lengthened. Based on the results of Carlson et al.
(1979), an exception is suggested in that any )N following a noun phrase that con-
tains only one primary-stressed content word should be erased. The NP + VP is
then spoken as a single phonological phrase with no internal phrase-final lengthen-
ing and no fall-rise FO contour to set off the noun phrase from the verb phrase.

8.3 Comparison between ideal synthesis input and system performance
An example of the output of the analysis routines of MITalk is presented in Sec-
tion 8.7 at the end of this chapter. Examples where the analysis routines made an
“error” are underlined in Section 8.7, and the seriousness of the error is indicated
by a footnote for those errors deemed detrimental to perception. The word “error”
is put in quotation marks to emphasize that an error made by an analysis routine
need not be an error in some abstract linguistic sense, but only an error in the sense
that the symbol is not the one that is desired by the synthesis routines.

There are over 200 words in the sample text of Section 8.7 and over 1000
phonetic segments.

8.3.1 Phonetic transcription “errors”

There are 25 phonetic transcription errors, all minor, most of which concern the
difference between “I” and schwa. There do not seem to be serious problems with
the letter-to-sound rules, in part because they are rarely activated, i.e., about five
percent of the time. The rate at which phonetic errors are produced during MITalk
analysis, about one percent (i.e. about one word in twenty is in error in running
text), is quite good in comparison with text-to-speech systems that rely more
heavily on letter-to-sound rules. Sentence intelligibility and comprehension scores
are very high given the current analysis abilities.

8.3.2 Stress “errors”

There are 12 errors involving lexical stress assignment. Certain common words
such as “might” and “each” should be marked with primary lexical stress in the
lexicon because they almost always attract a certain amount of semantic focus, but
they are not currently assigned stress. Other words, such as “prerecorded”, are not
handled correctly by the morphological stress reassignment rules.

85 diff --git a/pages-txt/098.txt b/pages-txt/098.txt new file mode 100644 index 0000000..db71281 --- /dev/null +++ b/pages-txt/098.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system

8.3.3 Morpheme boundary problems

The morpheme boundary symbol * is used in the synthesis rules to prevent words
like back*ache from having a strongly aspirated medial KK. However, in a word
such as applic*ation, a restructuring of syllable boundaries is desirable so that the
medial KK is strongly aspirated. In the present rule system it is not, since the * is
in the way. Perhaps the morpheme boundary symbol should be deleted between a
root and bound suffix (but not between two root morphemes). In other related
cases, the boundary prevents desired resyllabification processes so that
automatic*al*ly comes out as a six-syllable word, rather than the more normal AO
DX - AX - MM 'AE DX - IH - KK LL IY.

8.3.4 Syntactic “errors”

There are a rather large number of syntactic “errors” involving the incorrect assign-
ment of phrase and clause boundary locations. There are seven examples of a
missing end-of-phrase )N symbol, one missing end-of-clause )C symbol, and 17
cases where an end-of-clause symbol was incorrectly inserted between words with
the intent to break up longer phrasal units. This had undesirable perceptual im-
plications.
The current algorithm intentionally adds extra clause boundary sym-
bols in order to break up the synthesis into smaller groups of words set off by
pauses and intonation breaks. These extra pauses were added because the com-
puter seemed to be able to go for long stretches without “pausing for breath”. The
trade-off between adding breath pauses to break the speech up into fewer process-
ing chunks versus insertion of a break at a syntactically unacceptable place has yet
to be optimized.

8.3.5 Summary

Of the analysis errors that were encountered in this admittedly difficult passage,
most of the phonetic, stress, and phonological rule errors are easily correctable.
However, only a few of the syntax errors can be fixed by straightforward debug-
ging techniques. The most serious limitation of text-to-speech analysis routines
seems to be in the area of automatic syntactic analysis. Still, the intelligibility and
comprehension results to be presented in Chapter 13 indicate very encouraging
overall system performance.

8.4 Stress rules

The phonological component assigns a feature Stress (value = 0 or 1) to each
phonetic segment in the output string. The default value is 0 (unstressed). Vowels
preceded by a stress symbol (', ", or !) in the input are assigned a value of 1. Con-
sonants preceding a stressed vowel are also assigned a value of 1 if they are in the
same morpheme and if they form an acceptable word-initial consonant cluster.

86 diff --git a/pages-txt/099.txt b/pages-txt/099.txt new file mode 100644 index 0000000..fcbc613 --- /dev/null +++ b/pages-txt/099.txt @@ -0,0 +1,46 @@ +The phonological component

The stress feature is one way of defining a syllable structure for each word.
Stressed consonants are defined to be affiliated with the following vowel, while
unstressed consonants are affiliated with a preceding vowel (or their affiliation
does not matter to subsequent rules). Segmental stress is used in rules that deter-
mine whether TT and DD are flapped, whether consonants and vowels are
lengthened, whether voiceless plosives are strongly aspirated, and the degree of
formant target undershoot.

For example, consider the consonants preceding the stressed vowel in the
words “Atlantic” and “atrocious”. In the first word, the TT is realized as a glottal
stop (or glottalized alveolar stop). In the second word, the TT is a strongly
aspirated full-alveolar stop. The distinction is maintained in the program by as-
signing segmental stress to both the TT and the RR in “atrocious” (because “tr” is
a legal word-initial cluster), while assigning the segmental stress feature only to the
LL in “Atlantic” (because “tl” is not a legal word-initial cluster). Given a proper for-
mulation of the flapping rule and glottalized-t rule described below, this stress as-
signment ensures the selection of the appropriate allophone of TT.

8.5 Rules of segmental phonology

There are currently very few phonological rules of a segmental nature in the
program. A number of rules that are sometimes attributed by linguists to the
phonological component (e.g. palatalization) are realized in the phonetic com-
ponent described in Chapter 11 because they involve graded phenomena (e.g. the
SS of “fish soup” is partially palatalized, but not identical to SH). The segmental
phonological rules that are described below are extremely important.
They are not
“sloppy speech” rules, but rather rules that aid the listener in hypothesizing the
locations of word and phrase boundaries. For example, the second rule listed
below ensures that a word-final TT is not perceived as a part of the next word by
inserting simultaneous glottalization to inhibit oral pressure buildup during
closure, and thus attenuate any release burst.

1. Substitute a postvocalic velarized allophone LX for LL if the LL is
preceded by a vowel and followed by anything except a stressed
vowel in the same word.

2. Replace TT or DD by the alveolar flap DX within words and across
word boundaries (but not across phrase and clause boundaries) if the
plosive is followed by a non-primary-stressed vowel and preceded by
a nonnasal sonorant. Examples: “butter”, “ladder”, “sat about”.

3. A word-final TT preceded by a sonorant is replaced by the glottal-

87 diff --git a/pages-txt/100.txt b/pages-txt/100.txt new file mode 100644 index 0000000..45ac407 --- /dev/null +++ b/pages-txt/100.txt @@ -0,0 +1,52 @@ +From text to speech: The MITalk system

ized dental stop TQ (i.e. has a glottal release rather than a t-burst) if
the next word starts with a stressed sonorant (unless there is a clause
boundary between the words, in which case the TT is released into a
pause). Examples: “that one”, “Mat ran”.

4. A voiceless plosive is not released if the next phonetic segment is
another voiceless plosive within the same clause.

5. A glottal stop is inserted before a word-initial stressed vowel if the
preceding segment is syllabic (and not a determiner), or if the
preceding segment is a voiced nonplosive and there is an intervening
phrase boundary. Example: “Liz eats”.

6. The word “the” is pronounced DH IY if the next word starts with a
vowel.

8.5.1 An example

If the six rules of segmental phonology are applied to the sentence shown in Figure
8-1, three allophonic changes are made. The sixth rule replaces the schwa by IY
in the word “the”. The first rule replaces the phoneme LL by a postvocalic al-
lophone in the word “old”. Finally, the second rule replaces the TT in “sat” by an
alveolar flap DX. The string of symbols in the lower portion of Figure 8-1 is thus a
broad phonetic transcription of the utterance to be synthesized. As the output of
the phonological component, it serves as the input to the prosodic component
PROSOD that is described in Chapter 9.

8.6 Pauses

Pauses are often used in speech production to mark major syntactic boundaries.
Both pauses and prepausal lengthening are important to guide the listener’s percep-
tion of the underlying syntactic structure of a sentence (Klatt, 1976b). A system of
rules has been worked out for determining the locations of pauses in the synthesis,
and the duration of each kind of pause.

Pauses of 800 msec, sufficient for a real speaker to take a breath, are intro-
duced after any sentence of more than five words. A longer pause of 1200 msec
appears at the end of paragraphs. Brief sentence-internal pauses (400 msec) are
triggered by punctuation marks contained in the text, or are inserted by PHONO1
at detectable clause boundaries.
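As a summary of the pause durations just given, a minimal sketch (illustrative only, with invented names; cases the text leaves unspecified return zero):

def pause_msec(boundary, words_in_sentence=0):
    """Pause durations from the rules above, in msec."""
    if boundary == "paragraph":
        return 1200                                  # end of paragraph
    if boundary == "sentence":
        return 800 if words_in_sentence > 5 else 0   # breath pause
    if boundary in ("comma", "clause"):
        return 400                                   # brief sentence-internal pause
    return 0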
It is desirable to insert another kind of pause in certain sentence-internal posi-
tions of very long sentences because of the talker’s limited lung volume. An algo-
rithm has been developed for locating such pauses that is based on the number of
syllables on either side of the potential sentence-internal breath pause, and the

88 diff --git a/pages-txt/101.txt b/pages-txt/101.txt new file mode 100644 index 0000000..78f3b04 --- /dev/null +++ b/pages-txt/101.txt @@ -0,0 +1,56 @@ +The phonological component

strength of various boundaries in the vicinity of a desired break. If necessary, such
pauses may be inserted between a content and function word, even if no phrase
boundary has been detected by syntactic analysis routines.

8.7 Evaluation of the analysis modules
The following text was input to the analysis modules of MITalk:

“This recording is a demonstration of speech synthesis by rule
and automatic text-to-speech conversion.

Applications for synthetic speech output fall into four broad
categories: those applications that require (1) a single word response
(e.g. Speak and Spell), (2) a limited set of messages with a rigid syn-
tactic framework (e.g. telephone number information), (3) a large
vocabulary with general English syntax (e.g. teaching machine
lessons), and (4) fully general English text to speech (e.g. for a reading
machine for the blind).

Prerecorded messages work well for single word response ap-
plications, but an increasing knowledge of the acoustic-phonetic
characteristics of speech, of phonology, and of syntax is required for
satisfactory synthesis of general English. In order to generate a par-
ticular utterance, one must specify a phonemic representation for each
word, a stress pattern for each word, certain aspects of the syntactic
structure of the sentence, such as the locations of phrase and clause
boundaries, and the locations of any words that are to receive semantic
focus.

This information could be typed into a computer terminal, as was
done in this case, or the information might be generated automatically
from a deep-structure representation of the concept to be expressed.
The speech that you have just heard was produced in June 1979 by the
synthesis-by-rule portions of the MITalk text-to-speech system that is
being developed at MIT”.

The output from PHONO2 is given below. Erroneous segments are under-
lined and the corrections are given as subscripts. A null subscript () means that
the segment should be deleted.

F: DH "IH SS C: RR IH,y KK 'OXR DD * IH NG )C,,! F: IH 22 F:
AX C: DD "EH MM AX NN * SS TT RR 'EY SH AX NN F: AX VV C:
SS PP ‘IY CH SH C: SS 'IH NN TH AX SS AX SS F: BB AY C: RR 'UW IX
)C F: AE NN C: "AO DX AX MM ‘AE DX IH KK C: TT 'EH KK SS TIT F:
TT AX C: SS PP YIY CH SH C: KK AX NN VV 'ER 2H AX NN . C:
"AE PP LL IH,, KK *z? 'EY SH AX NN * zZ F: FF OXR C:
SS IH NN TH 'EH DX IH KK AXPyz )Cy C: SS PP ‘IY CH SH C:

1Too many extra “)C” pauses added.

2Morph boundary between root and bound morph has detrimental effect.
+ +89 diff --git a/pages-txt/102.txt b/pages-txt/102.txt new file mode 100644 index 0000000..a50e061 --- /dev/null +++ b/pages-txt/102.txt @@ -0,0 +1,62 @@ +From text to speech: The MITalk system + +‘AW TT PP "UH TT ,y C: FF ‘A0 LX F: "IH NN TT UW C: FF 'OXR C: + +BB RR AO DD C: KP 'AE DX IHpy GG "RO RRugyg IY * 2Z , F: + +DH "OW 22 C: "AE PP LL IH,, KK *p ‘EY SH AX NN * ZZ )N F: + +DH "AE TT C: RR IH,, KK W# 'AY ER , C: WW 'AH NN , F: AX C: + +SS IH NG GG AX LX C: WW 'ER DD C: RR IH SS PP 'AA NN SS , C: 'IY +C: JJ ZH "IY .,y C: SS PP 'IY KK AXPyx )Cy F: AE NN C: + +SSPP 'EHLX, , , C: TT 'UW , F: AX C: LL ‘IH MM AX TTpy * IH DD +C: SS 'EH DX F: AX VV C: MM ‘EH SS IH,y JJ 2H * IH;, 2Z )C)y F: +WWw IH TH F: AX C: RR ‘IH JJ 2H IH DD C: + +SS IH NN TT ‘AE KK TT IH KK C: FF RR EY MM * WW "ER KK AXP , C: +*IY C: JJ ZH *IY ,y C: TT 'EH LX AX * FF !OW.oy NN! C: + +NN ‘AH MM BB ER C: "IH NN FF ER MM 'EY SH AX NN , , , C: + +TH RR ‘IY , F: AX C: LL ‘AXR JJ ZH C: + +VV OW KP ‘AE BB YY AX LL "EH RRugyp IY ,y F: WW IH TH C: + +JJ 2H 'EH NN RR AX LX C: ‘IH NG GG LL IH;, SH C: + +SS ‘IH NN TT "AE KK SS , C: 'IY C: JJ 2H 'IY ,y C: + +TT IY CH SH * IH NG C: MM AX SH 'IY NN C: LL ‘EH SS ENpy gy * 22 +, » F: AE NN DD AXP , C: FF ‘OXR , C: FF 'UH LX IY C: + +JJ ZH 'EH NN RR AX LX C: IH NG GG LL IH SH C: TT 'EH KK SS TT F: +TT AXyy C: SS PP “IY CH SH , C: 'IY C: JJ 2H 'IY ,y F: FF OXR F: +C: RR "IY DD * IH NG C: MM AX SH 'IY NN F: FF OXR F: DH AX C: +LL 'AY NN DD AXP ,5% . C: + +RR IY,;y * RR IHpy KK 'OXRugyg DD * IH;y DD C: + +'EH SS IHpy JJ ZH * IH;y 22 ,y C: WW ‘ER KK C: WW 'EH LX )Cy F: +OXR C: SS ’IH NG GG AX LXp; C: WW 'ER DD C: + +IH SS PP 'AA NN SS C: "AE PP LL IH,y KK *, 'EY SH AX NN * ZZ , +BB AH DX F: AE NN C: IH NN KK RR ‘IY SS * IH NG C: + +*AA LX IH JJ ZH )Cyx F: AX VV F: DH IY C: AX KK 'UW SS TT IH KK +FF AX NN 'EH DX IH KK C: + +"AE RRupyg IHax KK TT AX RR ‘IH SS TT IH KK * SS F: AX VV C: +SS PP 'IY CH SH , F: AX VV C: FF "OW,, NN * ’AA ILX AX JJ 2H IY , +F: AE NN F: AX VV C: SS 'IH NN TT "AE KK SS )C F: IH 2Z C: + +RR IH KK WW 'AY ER * DD F: FF OXR C: + +ROETEIAERR B K + +1Two primary stresses in one word. + +2The extra comma at sentence end results in no terminal fall. + +90 diff --git a/pages-txt/103.txt b/pages-txt/103.txt new file mode 100644 index 0000000..1d86eac --- /dev/null +++ b/pages-txt/103.txt @@ -0,0 +1,55 @@ +The phonological component + +SS "AE DX AX SS FF 'AE KK TT *yz ER * IY C: + +SS 'IH NN TH AX SS AX SS )C F: AX VV C: JJ ZH 'EH NN RR AX ILX C: +*IH NG GG LL IH;, SH . F: IH NN C: 'OXR DX ER )Cyx F: TT AXy, C: +JJ ZH 'EH NN ER * "EY DX F: AX C: PP AXR TT 'IH KK YY AX LX ER C: +‘AH DX ER * AX NN SS , C: WW 'AH NN F: MM AH SS TT C: + +SS PP ‘EH SS AX FF "AY F: AX C: FF OWnpy NN 'IY MM IH KK C: + +RR "EH PP RR IH,, 2Z "EH NN TT *x 'EY SH AX NN )Cyx F: FF OXR F: +"IY,;y CH SH! 
C: WW 'ER DD AXP , F: AX C: SS TT RR 'EH SS C:

PP ’AE DX ER NN F: FF OXR F: "IY,;y CH SH C: WW 'ER DD AXP , C:
SS ‘ER DXpq EN C: ‘AE SS PP "EH KK TT * SS F: AX VV F: DH AX C:
SS IH NN TT ‘AE KK TT IH KK C: SS TT RR 'AH KK CH SH ER )Cgy F:
AX VV F: DH AX C: SS 'EH NN TT ENpy gy SS + F: SS "AH CH SH F:
AE 2Z F: DH AX C: LL "OW KK *p 'EY SH AX NN * ZZ F: AX VV C:

FF RR 'EY 2Z )Cyx F: AE NN C: KK LL ‘AO 2Z C:

BB ‘AW NN DD RR IY * 22 , F: AE NN F: DH AX C:

LL "OW KK *z 'EY SH AX NN * 22 F: AX VV F: "EH NN IY C:

WW ER DD * ZZ )C,y F: DH "AE DX F: AXR F: TT AXyy C:

RR AX SS ‘IY VV C: SS IH MM ‘AE NN TT IH KK C: FF 'OW KK AX SS
F: DH "IH SS C: "IH NN FF ER MM 'EY SH AX NN )C,y F: KK UH DD F:
BB IY C: TT ‘AY PP * TQ F: "IH NN TT UW F: AX C:

KK AX MM PP ‘YU TT *z ER C: TT ‘ER MM AX NN *5 AX LX , F: AE 22
F: Wi AH 22 C: DD AH NN F: IH NN F: DH "IH SS C: KP 'EY S§ , F:
OXR F: DH IY C: "IH NN FF ER MM 'EY SH AX NN )C,y F: MM AY,,y TT?
F: BB IY C: JJ 2H 'EH NN ER * "EY TT * IH TQ C:

"AO DX AX MM ‘AE DX IH,, KK *p AXy LXg *p LL IY3 )C F:

FF RR AX MM F: AX C: DD 'IY PP C: SS TT RR 'AH KK CH SH ER C:

RR "EH PP RR IH 2% "EH NN TT *x ‘EY SH AX NN )Cy F: AX VV F:
DH AX C: KK 'AA NN SS "EH PP TT )N F: TT AXyy F: BB IY C:
IH KK SS PP RR 'EH SS * TT AXP . F: DH AX C: SS PP ’IY CH SH )N

F: DH "AE DX F: YU F: HX AE VV C: JJ ZH 'AH SS TQ C:
HH 'ER DD AXP )C,y F: WW AH 2Z C: PP RR AX DD ‘UW SS * TT F: IH NN
C: JJ 2H 'UW NN ,¢ C: NN AY NN TT 'IY¥.;, NN

1“Each” should be intrinsically stressed.
2“Might” should be intrinsically stressed.

3DECOMP yields poor suffix expansion.

91 diff --git a/pages-txt/104.txt b/pages-txt/104.txt new file mode 100644 index 0000000..3e9cf15 --- /dev/null +++ b/pages-txt/104.txt @@ -0,0 +1,14 @@ +From text to speech: The MITalk system

C: SS 'EH;, VV AX NN TT IY! C: NN ‘AY NN )C F: BB AY F: DH AX C:
SS IH NN TH AX SS AX SS F: BB AY C: RR 'UW LX C:

PP 'OXR SH AX NN * 2Z )C,y F: AX VV F: DH AX C: MM ’'AY TT "AO KK
C: TT 'EH KK SS TT F: TT AXyy C: SS PP 'IY CH SH C:

SS ‘IH SS TT AX MM )C,y F: DH "AE DX F: IH 2Z F: BB "IY IH NG C:
DD IH,, VV ‘EH LX AX PP * TT F: AE TQ C: ‘EH MM C: 'AY C: TT 'IY

1It is hard to get stress right in number sequences.

92 diff --git a/pages-txt/105.txt b/pages-txt/105.txt new file mode 100644 index 0000000..4b6e6e4 --- /dev/null +++ b/pages-txt/105.txt @@ -0,0 +1,42 @@ +9

The prosodic component

9.1 Overview

The sentence representation produced by the phonological component PHONO2
serves as input to the prosodic component PROSOD that is to be described in this
chapter. An example of the input to the prosodic component and the output
generated by the prosodic rules is shown in Figure 9-1. The output consists of a
string of phonetic segments, with each segment assigned a stress feature and a
duration in msec. The fundamental frequency targets which appear in the
PROSOD output listing are generated by an obsolete algorithm and are discarded
by FOTARG which then generates the proper FO targets.

9.2 Segmental durations

In a review of the factors that influence segmental durations in spoken English sen-
tences (Klatt, 1976b and references cited therein), it was concluded that only a few
of the many rule-governed durational changes are large enough to be perceptually
discriminable. The goal of the rule system described below and in Klatt (1979b) is
to characterize these perceptually important first-order effects.
+
The durational definitions that have been adopted include the closure for a
stop (any burst and aspiration at release are assumed to be a part of the following
segment). For fricatives, the duration corresponds to the interval of visible frica-
tion noise (or to changes in the voicing source if no frication is visible). For
sonorant sequences, the segmental boundary is defined to be the half-way point in
the formant transition for that formant having the greatest extent of transition.
These definitions lead to a convenient and largely reproducible measurement pro-
cedure, but the physiological and perceptual validity of these boundaries has not
been established.

Each segment is assigned a duration by a set of rules presented in detail
below. The rules are intended to match observed durations for a single speaker
(DHK) reading paragraph-length materials. The rules operate within the
framework of a model of durational behavior which states that: 1) each rule tries to
effect a percentage increase or decrease in the duration of the segment, but 2) seg-
ments cannot be compressed shorter than a certain minimum duration (Klatt,
1973). The model is summarized by the formula:

93 diff --git a/pages-txt/106.txt b/pages-txt/106.txt new file mode 100644 index 0000000..b60cbaa --- /dev/null +++ b/pages-txt/106.txt @@ -0,0 +1,87 @@ +From text to speech: The MITalk system

The old man sat in a rocker.
PHONO2: Function word: DH IY
PHONO2: Content word: 'OW LX DD
PHONO2: Content word: MM 'AE NN [End NOUN phrase]
PHONO2: Content word: SS 'AE DX
PHONO2: Function word: IH NN
PHONO2: Function word: AX
PHONO2: Content word: RR 'AA KK * - ER
PHONO2: Punctuation: .
PHONO2:

PROSOD: [Silence] 30ms. 133.4Hz.
PROSOD: Function word:
PROSOD: DH 50ms. 123.4Hz.
PROSOD: IY 105ms. 131.4Hz.
PROSOD: Content word:
PROSOD: 'OW 170ms. 174.5Hz. Stressed
PROSOD: LX 75ms. 151.0Hz.
PROSOD: DD 50ms. 146.0Hz.
PROSOD: Content word:
PROSOD: MM 70ms. 151.0Hz. Stressed
PROSOD: 'AE 210ms. 157.0Hz. Stressed
PROSOD: NN 55ms. 117.9Hz.
PROSOD: [End NOUN phrase]
PROSOD: Content word:
PROSOD: SS 100ms. 122.9Hz. Stressed
PROSOD: 'AE 175ms. 153.9Hz. Stressed
PROSOD: DX 20ms. 140.1Hz.
PROSOD: Function word:
PROSOD: IH 55ms. 148.1Hz.
PROSOD: NN 50ms. 142.5Hz.
PROSOD: Function word:
PROSOD: AX 60ms. 142.5Hz.
PROSOD: Content word:
PROSOD: RR 80ms. 140.2Hz. Stressed
PROSOD: 'AA 160ms. 146.2Hz. Stressed
PROSOD: KK 65ms. 113.1Hz.
PROSOD: *
PROSOD: -
PROSOD: ER 170ms. 108.1Hz.
PROSOD: Punctuation: .
PROSOD: [Silence] 400ms. 111.2Hz.
PROSOD: [End sentence]
PROSOD:

Figure 9-1: Example of the processing performed by PROSOD

DUR = ((INHDUR - MINDUR) x PRCNT)/100 + MINDUR (1)

where INHDUR is the inherent duration of a segment in msec, MINDUR is the
minimum duration of a segment in msec, and PRCNT is the percentage shortening
determined by applying rules 1 to 10 below. The program begins by obtaining
values for INHDUR and MINDUR for the current segment from Table 9-1, and by
setting PRCNT to 100. The inherent duration has no special status other than a
starting point for rule application; it is roughly the duration to be expected in non-
sense CVCs spoken in the carrier phrase “Say bVb again” or “Say Cab again”.
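Before listing the rules, it may help to see the model as code. The sketch below is illustrative, not the MITalk implementation: it applies Equation 1 after composing the per-rule percentages multiplicatively (Equation 2, given next) and rounds up to the 5-msec step used in the worked example later in this section.

import math

def duration_msec(inhdur, mindur, rule_percents):
    """Equation 1, with each applicable rule's PRCNT1 composed per Equation 2."""
    prcnt = 100.0
    for prcnt1 in rule_percents:
        prcnt = prcnt * prcnt1 / 100.0                 # Equation 2
    dur = (inhdur - mindur) * prcnt / 100.0 + mindur   # Equation 1
    return 5 * math.ceil(dur / 5)   # round up to the nearest 5 msec

# The RR of "rocker" (worked example below): the applicable rules give
# 140, 80, 70, and 120 percent in turn, and rule 7 has halved MINDUR to
# 30 msec; INHDUR of 80 msec for RR is an assumption here.
print(duration_msec(80, 30, [140, 80, 70, 120]))   # -> 80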
The following ten rules are then applied, where each rule modifies the PRCNT

94 diff --git a/pages-txt/107.txt b/pages-txt/107.txt new file mode 100644 index 0000000..da5ea97 --- /dev/null +++ b/pages-txt/107.txt @@ -0,0 +1,51 @@ +The prosodic component

value obtained from the previous applicable rules by an amount PRCNT1, accord-
ing to the equation:

PRCNT = (PRCNT x PRCNT1)/100 (2)

The duration of the segment is then computed by inserting the final value for
PRCNT into Equation 1; and, finally, Rule 11 is applied. Justification for the
presence of each rule is given in the references cited below, but the detailed for-
mulation of a rule involved considerable trial-and-error effort to match the rule
output against a large body of hand-segmented and labeled spectrograms of
paragraphs read by speaker DHK.

1. Pause insertion rule:
Insert a 200-msec pause before each sentence-internal main clause and at
boundaries delimited by a syntactic comma, but not before relative
clauses (Goldman-Eisler, 1968; Cooper et al., 1978). The “)R” symbol
functions like a “)N” in the duration rules.

2. Clause-final lengthening:
The vowel or syllabic consonant in the syllable just before a pause is
lengthened by PRCNT1=140 (Gaitenby, 1965; Lindblom and Rapp,
1973). Any consonants between this vowel and the pause are also
lengthened by PRCNT1=140 (Oller, 1973; Klatt, 1975).

3. Non-phrase-final shortening:
Syllabic segments (vowels and syllabic consonants) are shortened by
PRCNT1=60 if not in a phrase-final syllable (Lindblom and Rapp, 1973;
Klatt, 1975). A phrase-final postvocalic liquid or nasal is lengthened by
PRCNT1=140.

4. Non-word-final shortening:
Syllabic segments are shortened by PRCNT1=85 if not in a word-final
syllable (Lindblom and Rapp, 1973; Oller, 1973).

5. Polysyllabic shortening:
Syllabic segments in a polysyllabic word are shortened by PRCNT1=80
(Lindblom and Rapp, 1973; Lehiste, 1975a).

6. Non-initial-consonant shortening:
Consonants in non-word-initial position are shortened by PRCNT1=85
(Klatt, 1974; Umeda, 1977).

95 diff --git a/pages-txt/108.txt b/pages-txt/108.txt new file mode 100644 index 0000000..79ea9fa --- /dev/null +++ b/pages-txt/108.txt @@ -0,0 +1,143 @@ +From text to speech: The MITalk system

Table 9-1: Minimum and inherent durations in msec for each segment type

[Table of minimum and inherent durations for every segment, grouped as
Vowels, Sonorant Consonants, Nasals, Fricatives, Plosives, Affricates,
and the pseudo-vowel AXP (70/70 msec).]

96

diff --git a/pages-txt/109.txt b/pages-txt/109.txt new file mode 100644 index 0000000..41d46bd --- /dev/null +++ b/pages-txt/109.txt @@ -0,0 +1,50 @@ +The prosodic component

7. Unstressed shortening:
Unstressed segments are half-again more compressible than stressed seg-
ments (i.e. set MINDUR=MINDUR/2).
Then both unstressed and 2-
stressed segments are shortened by a factor PRCNT1 that is tabulated
below for each type of segment. The result is that segments assigned
secondary stress are shortened relative to 1-stress, but not as much as un-
stressed segments (Umeda, 1975, 1977; Lehiste, 1975a).

Context                            PRCNT1 for 0-stress and 2-stress
syllabic (word-medial syllable)    50
syllabic (others)                  70
prevocalic liquid or glide         10
all others                         70

8. Lengthening for emphasis:
An emphasized vowel is lengthened by PRCNT1=140 percent (Bolinger,
1972; Carlson and Granstrom, 1973; Umeda, 1975).

9. Postvocalic context of vowels:
a) The influence of a postvocalic consonant (in the same
word) on the duration of a vowel is given below (House
and Fairbanks, 1953; Peterson and Lehiste, 1960). In a
postvocalic sonorant-obstruent cluster, the obstruent deter-
mines the effect on the vowel (and on the sonorant
consonant).

Context                       PRCNT1
open syllable, word-final     120
before a voiced fricative     160
before a voiced plosive       120
before a nasal                85
before a voiceless plosive    70
before all others             100

b) The effects are greatest at phrase and clause boundaries: if
the vowel is non-phrase-final, change PRCNT1 to be closer
to 100, according to the formula PRCNT1 = 70 +
0.3*PRCNT1 (Klatt, 1975).

10. Shortening in clusters:
Segments are shortened in consonant-consonant sequences (disregarding

97 diff --git a/pages-txt/110.txt b/pages-txt/110.txt new file mode 100644 index 0000000..d6c0889 --- /dev/null +++ b/pages-txt/110.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system

word boundaries, but not across phrase boundaries) (Klatt, 1973), and in
vowel-vowel sequences.

Context                               PRCNT1
vowel followed by a vowel             120
vowel preceded by a vowel             70
consonant surrounded by consonants    50
consonant preceded by a consonant     70
consonant followed by a consonant     70

11. Lengthening due to plosive aspiration:
A 1-stressed or 2-stressed vowel or sonorant preceded by a voiceless
plosive is lengthened by 25 msec (Peterson and Lehiste, 1960).

When the rules are applied to the RR of “rocker” in Figure 9-1, the second
rule sets PRCNT to 140, the fifth rule reduces PRCNT to 112, the seventh rule
reduces MINDUR to 30 msec and PRCNT to 78.4, and the ninth rule increases
PRCNT to 94. Then INHDUR, MINDUR, and PRCNT are inserted in Equation 1,
and the resulting duration is rounded up to the nearest 5 msec to obtain the value of
80 msec shown in the lower part of Figure 9-1.

The resulting durations are determined in part by a variable that controls the
nominal speaking rate SPRATE which can be set to any number between 60 and
300 words per minute. The default value is 180 words per minute. At rates slower
than 150 wpm, a short pause is inserted between a content word and a following
function word. (At a normal speaking rate, brief pauses are inserted only at the
ends of clauses.) Individual segments are lengthened or shortened slightly depend-
ing on speaking rate, but most of the rate change is realized by manipulating pause
durations (Goldman-Eisler, 1968).

The present rules are only a crude approximation to many of the durational
phenomena seen in sentences (e.g. consonant interactions in clusters) and the rules
completely ignore other factors. Nevertheless, to a first approximation, the rules
capture a great deal of the systematic variation in segmental durations for speaker
DHK.
When compared with spectrograms of new paragraphs read by this speaker,
the rule system produces segmental durations that differ from measured durations
by a standard deviation of 17 msec (excluding the prediction of pause durations).
The rules account for 84 percent of the observed total variance in segmental dura-
tions. Seventeen msec is generally less than the just noticeable difference for a
single change to segmental duration in sentence materials (Klatt, 1976a).

A perceptual evaluation of the performance of the rule system is discussed by
Carlson et al. (1979). The perceptual results are encouraging in that both natural-

98 diff --git a/pages-txt/111.txt b/pages-txt/111.txt new file mode 100644 index 0000000..cb9532f --- /dev/null +++ b/pages-txt/111.txt @@ -0,0 +1,32 @@ +The prosodic component

ness and intelligibility ratings of sentences synthesized by these rules are very
similar to ratings of the same sentences synthesized using durations obtained from
a natural recording.

Complete durational rule systems exist for English (Coker et al., 1973) and
Swedish (Carlson and Granstrom, 1976). (We have borrowed heavily from the
elegant rule system of Lindblom and Rapp that was augmented and implemented
by Carlson and Granstrom.) Partial rule systems have also been proposed for
vowels (Umeda, 1975; Liberman, 1977) and for consonants (Umeda, 1977). The
rules contained in these systems are similar (not surprisingly), but there are many
ways to generalize from the available data. For example, Coker et al. (1973) rely
heavily on multiple stress levels conditioned by syntactic category (verbs have less
stress than nouns) and conditioned by word frequency (common words and words
that are repeated in a discourse are reduced in stress). Liberman (1977) includes
rules related to rhythm and isochronous principles. Neither of these kinds of rules
is incorporated explicitly in our system, but we do achieve partial isochrony
through rules that shorten unstressed syllables and consonant clusters (see Carlson
et al., 1979). For quantification, we capture durational differences between nouns
and verbs by phrase-final lengthening, and we permit the use of the emphasis sym-
bol “!” in the input to capture word frequency and discourse expectancy effects in
a binary fashion.

Therefore, it may never be possible to make absolute judgements concerning
which rule system is theoretically correct. Effort should rather be directed at sys-
tematic optimization of a particular rule system, e.g., one that starts with a linguis-
tically motivated framework for how to represent an input sentence and draws on
both speech production data and perceptual constraints to formulate a simple set of
rules as a starting point.

99 diff --git a/pages-txt/112.txt b/pages-txt/112.txt new file mode 100644 index 0000000..0b83c97 --- /dev/null +++ b/pages-txt/112.txt @@ -0,0 +1,39 @@ +10

The fundamental frequency generator

10.1 Overview

An important component in the generation of natural-sounding speech is the fun-
damental frequency of the voicing source. Such attributes as syntactic structure,
emphasis, and sentence type can be partially signaled by the fundamental fre-
quency (FO) contour as well as by duration and amplitude information. In the FO
algorithm used with the text-to-speech system, information from both syntactic and
phonologic components is used.
It utilizes the phrase structure of each sentence as
analyzed by the parser to determine declination lines, to calculate the amount of
excursion from the declination line through each phrase, and to insert continuation
rises. Lexical stress marks and syllable division are used to determine the location
of FO peaks, and parts of speech provide information needed to determine the rela-
tive height of the peaks. Phonemic data provide the information needed to deter-
mine segmental influences on fundamental frequency. These influences produce
an active variation in peaks and valleys, thus yielding a lively contour
(O’Shaughnessy, 1976).

The algorithm currently in use produces two FO “target values” for each
phonetic segment, one to be used at onset and one as a mid-value. This is an adap-
tation of the original O’Shaughnessy algorithm which produces a value every 5
msec. The production of target values allows a more uniform treatment of
parameters, since interpolation for FO hereafter may be handled in the same way as
for most of the other parameters. It is also possible to take advantage of a lower
data rate since one or two values per segment replace the previous necessity for
one value every 5 msec. The rises and falls which are calculated for each segment
are used to specify the target values, the peak point at either the left or right bound-
ary of stressed vowels in content words, and the midpoint target value for other
segments. Other midpoint values are determined by interpolation.

The fundamental frequency generation program accepts syntactic information
from PARSER (discussed in Chapter 4) and phonemic information from PROSOD
(discussed in Chapter 9) in the form of a PROSOD output file. Its output is an
augmented PROSOD file containing the two target values for each segment.

100 diff --git a/pages-txt/113.txt b/pages-txt/113.txt new file mode 100644 index 0000000..7a9bb6f --- /dev/null +++ b/pages-txt/113.txt @@ -0,0 +1,49 @@ +The fundamental frequency generator

10.2 Input

The output file from the PARSER provides phrase group information and the part
of speech of individual words to the FO algorithm. The phrase groups which are
recognized are noun phrases, prepositional phrases, verb phrases, and verbal
groups. The parts of speech are grouped so as to be more useful in determining
how they affect the FO contour. The word classes listed in Table 10-1 below are
given in order of their potential to affect the contour. Those parts of speech in
parentheses are provided by the FO algorithm, but are not used directly in the lex-
icon. A reflexive pronoun, for example, is listed in the lexicon as having the part
of speech PRONOUN and the feature REFLEXIVE. It is passed to the FO algo-
rithm simply as a PRONOUN.

Table 10-1: Relative peak levels of words according to their parts of speech

Level  Part of speech
0      article
1      conjunction, relative pronoun
2      preposition, auxiliary verb, (unstressable modal, vocative)
3      personal pronoun
6      verb, demonstrative pronoun
7      noun, adjective, adverb, contraction
8      (reflexive pronoun)
9      stressable modal
10     quantifier
11     interrogative adjectives
12     (negative element)
14     (sentential adverb)

There are nine levels which are actually distinguished from one another.
Those listed beginning with VERB, i.e., Level 6, are considered important enough
to produce a peak in the contour. Words with these “important” parts of speech
are referred to as “content” words.
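Read as a lookup table, Table 10-1 might be coded as follows. This is a sketch with invented names, not MITalk code; the parenthesized classes are supplied by the FO algorithm itself rather than by the lexicon.

PEAK_LEVEL = {
    "article": 0,
    "conjunction": 1, "relative pronoun": 1,
    "preposition": 2, "auxiliary verb": 2,
    "unstressable modal": 2, "vocative": 2,
    "personal pronoun": 3,
    "verb": 6, "demonstrative pronoun": 6,
    "noun": 7, "adjective": 7, "adverb": 7, "contraction": 7,
    "reflexive pronoun": 8,
    "stressable modal": 9,
    "quantifier": 10,
    "interrogative adjective": 11,
    "negative element": 12,
    "sentential adverb": 14,
}

def is_content_word(part_of_speech):
    """Level 6 and above produce an FO peak ("content" words)."""
    return PEAK_LEVEL.get(part_of_speech, 0) >= 6

assert is_content_word("noun") and not is_content_word("preposition")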
The relative height of the peak depends upon
the order relation. The features “content” and “function” are also used in another
module, PHONO1, to label types of words. All “content” words in PHONO1 are
also “content” words in this algorithm. However, certain parts of speech which

101 diff --git a/pages-txt/114.txt b/pages-txt/114.txt new file mode 100644 index 0000000..9e4fce6 --- /dev/null +++ b/pages-txt/114.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system

were given the label “function” are elevated to “content” importance in the FO al-
gorithm. These are:

• Demonstrative pronouns (this, those)
• Contractions (we’ll, boys’ll)
• Modals (should, might, will, can)
• Quantifiers (several, many)
• Interrogative adjectives (which, whose)

The FO algorithm requires a specification of the number of syllables in each
word, the location of the stressed syllable within the word, and information con-
cerning syllable boundaries. This information is found in the PROSOD output file.
The phonemic information in this file is also used to specify a structure for each
syllable. This structure is an allowable ordering of voiced or unvoiced obstruents,
sonorants, and a single vowel.

10.3 Output

There are two possible output files. One file is a stream of fundamental frequency
values, one value for each 5 msec of the utterance. This file can be merged with
the output of PHONET (discussed in Chapter 11) which gives values of the 20
variable parameters each 5 msec. These values are calculated by determining the
changes in FO during a syllable and using the duration of the segments within the
syllable to describe a contour with constant slope (absolute value).

A second method, the one currently in use, is to calculate rises and falls on
each segment (an intermediate stage in the former method) and to use this infor-
mation to specify FO target values for the midpoint of each segment and for the
peak point at either the left or right boundary of stressed vowels in content words.
Unspecified onset values for segments are determined by linear interpolation be-
tween their midpoint target value and the midpoint target value of the preceding
segment. This method allows FO values to be calculated every 5 msec using the
same linear smoothing procedure which is used for some of the other parameters,
modified slightly by the addition of the possible extra target value as input.

Most peaks are assigned to the right boundary of the stressed vowel in a con-
tent word. A fall (and possible continuation rise) following the rise which forms
the peak is then assigned to the midpoint or right boundary of the following seg-
ment, absorbing any fall or rise that might previously have been assigned to that
segment. A peak is assigned to the left boundary of a “nuclear-stressed” syllable,
i.e., the stressed syllable in the final content word of a phrase preceding a silence.
Preceding unassigned rises or falls are absorbed in the assignment of the peak.

102 diff --git a/pages-txt/115.txt b/pages-txt/115.txt new file mode 100644 index 0000000..9a9e34c --- /dev/null +++ b/pages-txt/115.txt @@ -0,0 +1,51 @@ +The fundamental frequency generator

10.4 The O’Shaughnessy fundamental frequency algorithm

The algorithm may be considered as a cascade of two separate systems. The first,
or High Level System, uses syntactic information to sketch the contour.
The Low +Level System uses information generated by the High Level System and additional +phonemic data to detail the contour. + +10.4.1 High Level System + +The High Level System predicts a superposed FO contour by taking into considera- +tion the sentence type, clause contour, phrase contour, and individual word con- +tour. This contour is further augmented in the Low Level System by considering +the effect of individual segments. + +10.4.2 Sentence type + +Two global-level tunes are assigned depending upon sentence type. Tune A is used +primarily for declaratives. It causes a linear falling FO trend in the clause it is as- +signed to, and a sharp fall on the last content word in the clause and on those +words following it. The other tune, Tune B, is used for yes/no questions, that is, +questions to which an answer of “yes” or “no” is expected. This tune causes a rise +followed by a relatively flat FO trend and a sharp terminal rise. + +10.4.3 Clause contour + +The next factor affecting the contour is set by the syntactic boundaries. A sharp +rise is stipulated at the beginning of a syntactic unit, and a sharp fall at the end. In +practice, there is only one such contour for each sentence because clauses are not +identified by the parser. This contour coincides with the tune contour. + +10.4.4 Phrase contour + +In phrases containing two or more content words, an initial FO rise is assigned +beginning at the first content word and a final FO fall begins on the last content +word. If the phrase is nonfinal, a continuation rise is placed on the last syllable of +the last word. + +10.4.5 Word contour + +The individual content words within a phrase are given the most FO movement. In +addition to the sharp rise and fall on the first and last content words in a phrase, a +rise-fall contour is described on the stressed syllable of each content word. These +excursions reflect the desire of a speaker to have listeners understand the less pre- +dictable words in a sentence which are also those words which carry the most in- +formation. Function words are very common and describe a syntactic structure +which is easily recognized. Content words, on the other hand, must be emphasized +somewhat for the utterance to be comprehended, since their occurrence is much + +103 diff --git a/pages-txt/116.txt b/pages-txt/116.txt new file mode 100644 index 0000000..7a13bbe --- /dev/null +++ b/pages-txt/116.txt @@ -0,0 +1,47 @@ +From text to speech: The MITalk system + +less predictable. The amount of FO movement on each word depends upon its rank +in the order of parts of speech of content words (see Table 10-1) and also upon the +number of syllables in the word. Words of higher rank contain larger FO excur- +sion. Function words and unstressed syllables of content words are given a slight +(5 Hz) excursion to produce a more natural-sounding contour. + +10.4.6 Prosodic indicators + +A set of “prosodic indicators” is passed from the High Level System to the Low +Level System. An accent number gives the relative importance of a word. This +number ranges from “0” for one-syllable articles to “11+n” for a sentential adverb +containing n syllables. An integer representing the position of a word in a phrase +and the importance of that phrase is also assigned. Higher absolute values are +given to words at boundaries marked by punctuation and to words at the boun- +daries of large or major phrases. Another value assigned to each word is a number +indicating the amount of continuation rise. 
Most words are assigned the value “0”,
but those words ending a nonfinal phrase are usually given a value which reflects
the importance of the syntactic boundary which the word immediately precedes. A
level number applies to words in noun phrases not containing conjunctions. This
number either signifies that the FO level is to rise, or that the FO level should drop
on that word. Other words are given level “0”. This indicates a mid-phrase word.
Additionally, the tune value is defined on each word, and is nonzero on the word
ending a clause. The number of phrases is also a necessary input value to the next
level.

10.4.7 The Low Level System
This level reflects the effects of phonemics, lexical stress, and the number of syll-
ables of the words in the utterance. The number of syllables is used in determining
the height of the peak on lexically stressed syllables. Although the first and
highest peak in a sentence is constrained to a maximum of about 190 Hz, longer
sentences, i.e., sentences with more syllables, begin with higher peaks. This initial
height allows more freedom of excursion for following peaks. Higher peaks are
also placed on two lexically stressed syllables if they are separated by unstressed
syllables, the height of the peaks being dependent upon the number of intervening
unstressed syllables.

The FO pattern is also affected by the phonemics. For example, unvoiced
consonants at the beginning of a stressed syllable also cause the contour to fall,
rather than rise, into the contour of the stressed vowel. (The rise is added to the
peak of the vowel.) See Figure 10-1 for an example of this contour.
The algorithm first sets the peaks on the lexically stressed syllables. Falls and

104 diff --git a/pages-txt/117.txt b/pages-txt/117.txt new file mode 100644 index 0000000..bcbea0d --- /dev/null +++ b/pages-txt/117.txt @@ -0,0 +1,58 @@ +The fundamental frequency generator

[Schematic FO contours: a stressed/unvoiced/stressed sequence with the
rise folded into the stressed nucleus, a continuation rise, the falling
declination lines, and a dip at a glottal stop, plotted against time.]

Figure 10-1: Example of FO contours

rises are then assigned around these peaks. Continuation rises are added to the last
syllable of most non-sentence-final phrases (Figure 10-1), and sentence-final
words are given rises or falls depending upon their tune. Finally, the FO contour is
completed by specifying the amount of fall on other nonstressed syllables.

The peak on a stressed syllable is proportional to the accent number, but is
also decreased through the sentence. The peaks are arrayed along a falling
declination line so that peaks of equal height have lower values moving from peaks
which are sentence-initial to those which are sentence-final. The rate of declina-
tion is steeper for sentences with Tune A, and less steep for sentences with Tune B.

Each content word is given a rise and fall around the peak of its primary-
stressed syllable. The basic rise is 40 percent of the distance from the initial value
of the lower declination line (110 Hz for Tune A, 125 Hz for Tune B) to the peak
value. This basic value is altered for peaks in boundary position. More rise, and
thus a lower valley, is assigned to a phrase-initial peak and less rise (i.e. a higher
valley) to phrase-final peaks. In addition, intervening unaccented syllables require
more rise on the peaks surrounding them.
The basic fall value is 20 percent of the distance from the lower declination line to the peak value. This value is increased for a phrase-final fall. Rises and falls within a phrase are further reduced (by 30 percent).

105 diff --git a/pages-txt/118.txt b/pages-txt/118.txt new file mode 100644 index 0000000..7b5d443 --- /dev/null +++ b/pages-txt/118.txt @@ -0,0 +1,47 @@ +From text to speech: The MITalk system

The normal pattern is considered to be one of alternating accented and unaccented syllables. An accented syllable is a stressed syllable of a content word; unaccented syllables are all others. If two accented syllables are adjacent, their rise values are reduced by 40 percent. Two accented syllables separated by two, three, or four unaccented syllables have their rise values increased by 15 percent, 20 percent, and 30 percent, respectively. Additional unaccented syllables cause no further effect. The peak height on an accented syllable preceded by two or three unaccented syllables is decreased by 15 percent and 25 percent, respectively. However, an accented syllable followed by two or three unaccented syllables is increased by 10 percent and 15 percent, respectively. (These spacing adjustments are collected in a sketch at the end of this discussion.) If three accented syllables appear in succession, the fundamental frequency of the second is allowed to fall from the peak of the first, and rise into the peak of the third, i.e., its fall and rise are interchanged in time. A word not covered by a node, and preceded by three or more unaccented syllables, is assigned a rise value equal to the difference between its peak value and 95 Hz.

Words in terminal positions are given special rise and fall values. In a statement (Tune A), the last syllable is given a fall value such that F0 reaches 75 Hz. In a yes/no question (Tune B), a rise is assigned after the last accented syllable's fall (none if it is the last syllable), which gives a final F0 value 20 percent higher than any previous peak.

The highest continuation rise (16 Hz) is assigned to the last syllable of a word, if it is followed by a nonterminal punctuation mark or a conjunction, and if there has been no punctuation or conjunction since the last content word. A continuation rise of 8 Hz is assigned to the last syllable of the last word in a nonfinal phrase, if there have been more than five words since the last word to which a continuation rise was assigned.

If two accented syllables are separated by unaccented syllables, the F0 contour connecting them is either straight or falling. If the difference between the endpoints of the two accented syllables is positive, the previous fall and next rise are adjusted by the same amount (half the difference), so that the F0 contour does not change on intermediate unaccented syllables.

In the case in which the difference in endpoints is negative, that fall is spread over the intermediate unaccented syllables in two ways. If the unaccented syllables occur within a phrase, the falling rate is linear. Each successive unaccented syllable gets an equal share of the fall. For unaccented syllables which are not in the same phrase, a more exponential falling pattern is assigned, with the earlier unaccented syllables receiving more of the fall. Unaccented syllables terminating either a Tune A or Tune B clause fall or rise in equal amounts to the final value.
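The accent-spacing adjustments above involve several interacting percentages, so a minimal Python sketch may help. The syllable representation and function name are hypothetical; the adjustment factors are exactly those given in the text.

    def adjust_for_accent_spacing(syllables):
        """syllables: list of dicts with 'accented' (bool), 'rise', 'peak' (Hz)."""
        accents = [i for i, s in enumerate(syllables) if s['accented']]
        for left, right in zip(accents, accents[1:]):
            gap = right - left - 1                 # unaccented syllables between
            if gap == 0:                           # adjacent accents:
                syllables[left]['rise'] *= 0.60    # both rises reduced 40%
                syllables[right]['rise'] *= 0.60
            elif gap in (2, 3, 4):                 # rises increased 15/20/30%
                boost = {2: 1.15, 3: 1.20, 4: 1.30}[gap]
                syllables[left]['rise'] *= boost
                syllables[right]['rise'] *= boost
            # beyond four intervening unaccented syllables: no further effect
            if gap in (2, 3):
                # peak preceded by 2/3 unaccented: decreased 15%/25%
                syllables[right]['peak'] *= {2: 0.85, 3: 0.75}[gap]
                # peak followed by 2/3 unaccented: increased 10%/15%
                syllables[left]['peak'] *= {2: 1.10, 3: 1.15}[gap]
        return syllables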
106 diff --git a/pages-txt/119.txt b/pages-txt/119.txt new file mode 100644 index 0000000..bf26e95 --- /dev/null +++ b/pages-txt/119.txt @@ -0,0 +1,33 @@ +The fundamental frequency generator

10.5 Adjustments to the O'Shaughnessy algorithm

Several additions and adjustments have been made to the original algorithm. A third tune has been stipulated for wh-questions, that is, questions which include a question word such as "how" or "who" and to which an answer other than "yes" or "no" is expected. It will produce a high peak on the question word, a steeper falling F0 than in declaratives, and a higher peak on the last accented syllable than is produced for declaratives.

The additional 20 percent rise assigned to an initial unvoiced consonant is added to the left boundary of the following vowel rather than to its peak, so that the contour falls from the initial portion of the vowel if the peak is at the left boundary. Another adjustment is the insertion of a local 5 Hz perturbation on both flat and falling unstressed syllables about their midpoint. A third adjustment is the addition of a dip in the contour at points of glottalization (see Figure 10-1). In addition, the final F0 value in statements has been lowered by 10 Hz, and the range has been narrowed so that an excursion above 190 Hz is rare.

10.6 Potential improvements from additional syntactic information

A number of additional provisions in the O'Shaughnessy algorithm could be used if a more complete parser were available. Identification of dependent and independent clauses, and of matrix and embedded clauses, would provide more information for resetting declination lines and calculating continuation rises. Boundaries created by a number of syntactic transformations are also considered in the algorithm. These are clefting, there-insertion, preposing, topicalization, left and right dislocation, extraposition, and ellipsis. Special contours are provided for both appositives and vocatives, as well as tag questions and yes/no questions offering alternatives.

107 diff --git a/pages-txt/120.txt b/pages-txt/120.txt new file mode 100644 index 0000000..61bfc41 --- /dev/null +++ b/pages-txt/120.txt @@ -0,0 +1,42 @@ +11

The phonetic component

11.1 Overview

The phonetic component, PHONET, accepts input from the fundamental frequency component F0TARG (in the form of an array of phonetic segment names, and a segmental stress feature, segmental duration, and two fundamental frequency targets for each phone), and produces output values for 20 synthesizer control parameters every 5 msec. This chapter concerns the strategy for phonetic-to-parametric rule development and a summary of the form and content of individual rules for control parameter specification.

11.1.1 "Stored prosodics" synthesis

The phonetic component PHONET and synthesizer components can be operated in stand-alone mode in which the phonetic segment string, durations, and fundamental frequency contour specification that form the input to PHONET are hand-tuned to be as accurate as possible. For example, one might record a natural version of a sentence, extract fundamental frequency, measure segmental durations, select phonetic segments according to the pronunciation used by the real speaker, and format this information in a way that is compatible with PHONET input.
The advantage of this approach is the naturalness of the speech that can be produced with an input representation consisting of about 250 bits per second of speech.

This method of generating speech might be compared with the Texas Instruments' Speak-'N-Spell vocoder synthesizer. We suspect that the overall intelligibility and naturalness of the MITalk "stored-prosodics" synthesis is slightly better at 250 bits/second than Speak-'N-Spell at 1200 bits/second. However, the significant disadvantage of MITalk is that there is no automatic procedure for determination of input parameter data for PHONET, whereas Speak-'N-Spell synthesis can be prepared automatically from a linear-prediction vocoder analyzer with only minimal selection and hand tuning.

11.1.2 Structure of PHONET

The phonetic component includes a large array of target values for various control parameters for each of about 60 phonetic segment types. Smoothing between target values depends on time constants computed by rule, as well as depending on

108 diff --git a/pages-txt/121.txt b/pages-txt/121.txt new file mode 100644 index 0000000..d949737 --- /dev/null +++ b/pages-txt/121.txt @@ -0,0 +1,48 @@ +The phonetic component

the parameter value assigned to the time of the segment boundary. These constants are determined by rules that involve features of the current phonetic segment PHOCUR, the previous phonetic segment PHOLAS, and the next phonetic segment PHONEX. In some cases, the rules have to examine features of segments further from the current segment, but this is rare. For example, in pin, the time of voicing onset in the vowel preceded by the voiceless plosive PP is delayed by about 50 msec, unless the segment preceding the voiceless plosive is an SS, as in spin. The variable control parameters are listed later in Table 11-3.

11.1.3 History of formant synthesis-by-rule

As originally demonstrated by John Holmes, successful imitation of a natural utterance depends primarily on matching observed short-term spectra. This technique succeeds, in part, because it reproduces all of the potential cues present in the spectrum, even though we may not know which cues are most important. The speech perception apparatus appears to be aware of any and all (perceptually discriminable) regularities present in the acoustic signal generated by the speech production apparatus, and these regularities should be included in synthetic stimuli if possible.

There have been a number of previous efforts to specify general strategies for formant synthesis-by-rule (see, e.g., Holmes et al., 1964; Mattingly, 1968a; Rabiner, 1968a; Coker et al., 1973; Klatt, 1972, 1976a). However, examination of these publications suggests that consonant-vowel intelligibility is nowhere near as high as in listening to natural speech. For example, Rabiner (1968a) estimated that consonants in his synthetic consonant-vowel nonsense stimuli were 85 percent intelligible to phonetically trained listeners, but that natural tokens of the same syllables were about 99 percent intelligible. Other rule programs, apparently, perform no better, although relevant evaluative data are generally not available.

Why isn't intelligibility higher? Each rule system attempts to make appropriate generalizations and simplifications concerning the form and content of rules for consonant-vowel synthesis. Have the wrong generalizations been made?
+The results described below in Section 11.2 suggest that this conjecture is true. + +11.2 “Synthesis-by-analysis” of consonant-vowel syllables + +11.2.1 Analysis of CV syllables + +The data base that was recorded and analyzed in order to develop new consonant- +vowel synthesis rules consists of speech samples obtained from six talkers who +were native to a single midwestern dialect region -- three males and three females +(Klatt, 1979b). The intent was to use the data from all six talkers to establish the +form of the synthesis rules, but the actual parameter values inserted in the rules + +109 diff --git a/pages-txt/122.txt b/pages-txt/122.txt new file mode 100644 index 0000000..60db778 --- /dev/null +++ b/pages-txt/122.txt @@ -0,0 +1,46 @@ +From text to speech: The MITalk system + +came from a more extensive analysis of the speech of one of the male subjects +(since data averaged across several male and female talkers would probably not +make for a very good synthetic talker). Subjects read a list of 336 different CVC +nonsense syllables once, except for the designated talker (DHK) who read the list +twice on three separate occasions. + +The kind of analysis that was performed on the data base is illustrated in +Figure 11-1. The speech was low-pass filtered at 4.9 kHz and digitized at 10k +samples per second. Linear prediction spectra were computed at a number of +(hand-selected) locations in a syllable. The waveform segment, such as the one +shown at the top in Figure 11-1, was first differenced (to attenuate very low fre- +quency background noise) and multiplied by a Kaiser window (Beta=7.0) prior to +11-pole linear prediction analysis. The linear prediction spectrum is shown at the +bottom of the figure along with the discrete Fourier transform. The 25.6 msec +time-weighting window has an effective averaging duration of about 10 msec. The +same window was used at all analysis points, except during the sustained frication +noise of fricatives, where the window duration was increased so as to better es- +timate the spectral characteristics of the noise. + +Spectral samples were obtained: 1) during the consonantal steady state (or at +burst onset for a plosive), 2) at voicing onset (or early in the consonant-vowel tran- +sition for voiced consonants), and 3) shortly after the end of the consonant-vowel +transition. Formant frequencies were also estimated by locating the peaks in a +linear prediction spectrum. Formant motions were plotted every 10 msec during +voiced portions of syllables. Intensity and fundamental frequency were also es- +timated and plotted as a function of time. + +In this chapter, it is only possible to present some of the highlights of the +analyses. For example, Figure 11-2 presents first and second formant frequency +trajectories of sixteen vowel nuclei, as averaged across all consonantal environ- +ments for the designated talker. Most of the vowels appear to be diphthongized to +some extent. (The true diphthongs are shown with dashed vectors.) In particular, +it is a characteristic of this common midwestern dialect to terminate the short +vowels IH, EH, AE, and UH in a schwa-like offglide. These average data for +vowels are used as a starting point for consonant-vowel synthesis. + +Analysis of consonants revealed two major conclusions concerning the form +of rules appropriate for synthesis of a consonant before any vowel: + +1. 
Some consonants, particularly obstruents, take on significantly different characteristics depending on whether the following vowel is a front vowel, a back unrounded vowel, or a back rounded vowel.

110 diff --git a/pages-txt/123.txt b/pages-txt/123.txt new file mode 100644 index 0000000..36ac61c --- /dev/null +++ b/pages-txt/123.txt @@ -0,0 +1,35 @@ +The phonetic component

Figure 11-1: Spectrum analysis of a speech waveform (amplitude in dB versus frequency, 0-5 kHz; analysis points at t = 180, 240, and 280 msec)

2. Within each set of vowels, spectra associated with each consonant are surprisingly invariant, and formant transitions into the vowel obey a modified "locus" equation.

These conclusions are illustrated in the next two figures. Figure 11-3 displays average spectra of plosive bursts before each vowel. Each curve is the average of six tokens of a syllable obtained from a single talker. Burst spectra are similar before vowels in a particular category, even though these (linear prediction) spectra have not been normalized for speaking level. Burst spectra for GG differ strikingly across vowel sets, and burst spectra for DD are modified before rounded vowels. Similar changes are seen for voiceless plosives and for fricatives.

111 diff --git a/pages-txt/124.txt b/pages-txt/124.txt new file mode 100644 index 0000000..f338ef8 --- /dev/null +++ b/pages-txt/124.txt @@ -0,0 +1,43 @@ +From text to speech: The MITalk system

Figure 11-2: First and second formant motions in English vowels (first formant frequency, 300-800 Hz, versus second formant frequency, 800-2200 Hz; back vowels to the left, front vowels to the right)

Figure 11-4 contains a plot of average frequencies for the lowest three formants, as measured at voicing onset following plosive release in syllables containing BB, DD, and GG. Values before 16 vowel nuclei are plotted on the vertical axis as a function of the formant frequency seen in the early part of the vowel. Notice that, for the first formant, data can be well approximated by a straight line. However, the value of F1 at voicing onset, i.e., when F1 is first perceptible, is as high as 500 Hz before low vowels because most of the rapid rise in F1 at release takes place prior to voicing onset.

If formant values at voicing onset fall on a straight line in this kind of plot, a "locus theory" description of the data is possible; that is, there exists a locus theory equation with two free parameters that can predict F1 at voicing onset from a knowledge of the vowel target. An example of this synthesis procedure will be presented shortly.

Before describing aspects of the synthesis strategy, it should be noted that there exist articulatory motivations for dividing vowels into these three sets. Anticipatory coarticulation of the vowel features "front-back" and "rounded-unrounded" can explain all of the acoustic observations noted in the figures.
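The two-parameter locus prediction can be made concrete in a few lines of Python. This is a minimal sketch, not MITalk code: the formant value at voicing onset lies a fixed proportion of the way from the consonant's locus frequency toward the vowel's target frequency. The example numbers are those printed for the second formant of GG before OW in Figure 11-5; they are illustrative, not a general rule table.

    def onset_frequency(locus_hz, proportion, vowel_target_hz):
        # Locus equation: onset = locus + k * (vowel target - locus)
        return locus_hz + proportion * (vowel_target_hz - locus_hz)

    # F2 at the GG-OW boundary, with a 1700 Hz locus and an 1100 Hz vowel target:
    print(onset_frequency(1700.0, 0.42, 1100.0))   # 1448.0 Hz, as in Figure 11-5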
112 diff --git a/pages-txt/125.txt b/pages-txt/125.txt new file mode 100644 index 0000000..4928dfc --- /dev/null +++ b/pages-txt/125.txt @@ -0,0 +1,48 @@ +The phonetic component

Figure 11-3: Linear prediction of plosive bursts before vowels (amplitude in dB versus frequency in kHz; separate panels for bursts before front vowels, back unrounded vowels, and back rounded vowels)

11.2.2 Synthesis strategy

A possible synthesis strategy for predicting formant frequency motions in CV syllables is illustrated in Figure 11-5 for the syllable go. The vowel is first defined in terms of straight-line segments. Then the formant values associated with the consonant and the consonant-vowel transition are imposed, again using straight-line segments and a locus theory equation to determine formant values at the CV boundary. Formant motions associated with the vowel are defined in the upper panel. Perturbations imposed by the initial consonant are indicated in the lower panel.

Many of the remaining synthesis parameters, such as formant bandwidths and formant amplitudes in frication spectra, must be determined by trial-and-error comparisons of synthetic and natural linear prediction spectra. These adjustments took several iterations. Experience showed that small errors in frication spectra, or frication level, or the time course of intensity buildup, would result in an increased identification error rate, so that the tedious trial-and-error optimization of these parameters is important. The resulting control parameter values are tied to a par-

113 diff --git a/pages-txt/126.txt b/pages-txt/126.txt new file mode 100644 index 0000000..e68af70 --- /dev/null +++ b/pages-txt/126.txt @@ -0,0 +1,60 @@ +From text to speech: The MITalk system

Figure 11-4: Frequency of the lowest three formants measured at voicing onset for syllables involving BB, DD, and GG (F1, F2, and F3 at voicing onset, each plotted against the corresponding formant target of the vowel)

ticular software synthesizer (Klatt, 1980; see Chapter 12), but perhaps future publication of the numbers would be of some value to those who wish to implement the synthesizer program.

11.2.3 Intelligibility evaluation

The intelligibility of CV syllables produced by the rules was evaluated by synthesizing 336 different CVt syllables in a random order. The tape was played to five phonetically trained listeners who transcribed both the consonants and the vowels. The vowel identification rate was 99 percent and the consonant identification rate was 95 percent.
While these results are encouraging, we continue to seek

114 diff --git a/pages-txt/127.txt b/pages-txt/127.txt new file mode 100644 index 0000000..691b329 --- /dev/null +++ b/pages-txt/127.txt @@ -0,0 +1,45 @@ +The phonetic component

Figure 11-5: Synthesis strategy for a CV syllable (second formant transitions for OW alone and for GG OW, frequency versus time; the locus equation 1448 = 1700 + 0.42(F2 - 1700) gives the F2 value at the CV boundary)

ways to improve the consonants and plan to extend this work to clusters and to postvocalic allophones.

How can the intelligibility of the consonant-vowel syllables be further improved? Of particular interest are two questions: 1) will the generalizations concerning vowel categories and the form of the locus equations have to be modified, or is it only necessary to modify the numbers that go into the equations, and 2) is a spectral-matching procedure of the type outlined above sufficient for the purpose of intelligibility optimization? The generalizations that have been made will probably not have to be modified or abandoned. Each time that the perceptual data have indicated a problem with a particular CV syllable or class of syllables, reinspection of the spectral match between the synthesis and the average spectra for talker DHK has revealed fairly substantial differences, and correction of the

115 diff --git a/pages-txt/128.txt b/pages-txt/128.txt new file mode 100644 index 0000000..dfdf8da --- /dev/null +++ b/pages-txt/128.txt @@ -0,0 +1,51 @@ +From text to speech: The MITalk system

differences has led to clear improvement in intelligibility. At least one more iteration of this procedure is needed. Furthermore, within the constraints imposed by the synthesizer itself, matching of linear-prediction spectra is adequate to the task.

11.3 General rules for the synthesis of phonetic sequences

The rule program used in MITalk differs from the limited CV synthesis algorithm described above. The MITalk phonetic component PHONET is patterned after a Fortran-based synthesis-by-rule program described by Klatt (1976a). Since that time, both the program structure and the constants contained in target tables for each phone have been modified. These modifications were made in order to incorporate some of the new consonant-vowel synthesis rules described in the previous section, and to simplify the rule structure.

The general procedure for drawing control parameter values is:

1. Draw the target value for the first segment.

2. Draw the target value for the next segment.

3. Smooth the boundary between the segments using one of the templates shown in Figure 11-6 (note that DISCON does no smoothing).

4. Go to step 2 unless there are no more segments.

The transition between target values for each control parameter may either be discontinuous or smooth. The boundary value and transition duration in each direction from the logical phoneme boundary are computed by rules that take into account manner features of the segments involved.

11.3.1 Vowels

The control parameters that are usually varied to generate an isolated vowel are the amplitude of voicing AV; the fundamental frequency of vocal fold vibrations F0; the lowest three formant frequencies F1, F2, and F3; and bandwidths B1, B2, and B3.
The fourth and fifth formant frequencies might be varied to simulate spectral details, but this is not essential for high intelligibility. To create a natural breathy vowel termination, the amplitude of aspiration AH and the amplitude of quasi-sinusoidal voicing AVS are activated.

Table 11-1 includes suggested target values for variable control parameters that are used to differentiate among English vowels. Formant frequency and bandwidth targets were obtained by trial-and-error spectral matching to a large set of CV syllables spoken by talker DHK. Bandwidth values are often larger than closed-glottis values obtained by Fujimura and Lindqvist (1971), because the bandwidths of Table 11-1 have been adjusted to take into account changes to ob-

116 diff --git a/pages-txt/129.txt b/pages-txt/129.txt new file mode 100644 index 0000000..3f1c9c2 --- /dev/null +++ b/pages-txt/129.txt @@ -0,0 +1,19 @@ +The phonetic component

Figure 11-6: Templates for smoothing adjacent phonetic segment targets (panels plot parameter value against time for the transition types, including DISCON and SETSMO)

117 diff --git a/pages-txt/130.txt b/pages-txt/130.txt new file mode 100644 index 0000000..d463529 --- /dev/null +++ b/pages-txt/130.txt @@ -0,0 +1,48 @@ +From text to speech: The MITalk system

served formant amplitudes caused by factors such as glottal losses and irregularities in the voicing source spectrum. Where two values are given, the vowel is diphthongized or has a schwa-like offglide in the speech of talker DHK. Durations of steady states and transition portions of diphthongized vowels depend on total vowel duration, and are different for each vowel.

The mechanism for synthesizing a diphthongized vowel is shown in Figure 11-7. Each of the constants shown in the figure is stored in tables for all diphthongized vowels, including those having schwa offglides.

11.3.2 Consonants

Additional control parameters must be varied for the synthesis of various classes of consonants. Table 11-2 includes target values for variable control parameters that are used to synthesize portions of English consonants (frication spectra of fricatives, burst spectra of plosives, nasal murmurs for nasals, and steady portions of sonorants).

The sonorant consonants WW, YY, RR, and LL are similar to vowels and require the same set of control parameters to be varied in order to differentiate among them. Formant values given in Table 11-2 for the prevocalic sonorants RR and LL depend somewhat on the following vowel. The source amplitude, AV, for a prevocalic sonorant should be about 10 dB less than in the vowel. The sonorant HH can be synthesized by taking formant frequency and bandwidth parameters from the following vowel, increasing the first formant bandwidth to about 300 Hz, and replacing voicing by aspiration.

The fricatives characterized in Table 11-2 include both voiceless fricatives (AF=60, AV=0, AVS=0) and voiced fricatives (AF=50, AV=47, AVS=47). Formants to be excited by the frication noise source are determined by the amplitude controls A2, A3, A4, A5, A6, and AB. The amplitude of the parallel second formant, A2, is zero for all of these consonants before front vowels, but the second formant is a front cavity resonance for velars before nonfront vowels and A2 is excited.
The values given for F2 and F3 are not only valid during the fricative, but also can serve as "loci" for the characterization of the consonant-vowel formant transitions before front vowels. These are virtual loci in that formant frequency values observed at the onset of glottal excitation are somewhere between the locus and the vowel target frequency -- the amount of virtual transition being dependent on formant-cavity affiliations.

The specification of frication spectra in the table is accurate only before front vowels in the speech of talker DHK. Before back and rounded vowels, systematic changes are observed to the fricative spectra because of anticipatory coarticulation.

118 diff --git a/pages-txt/131.txt b/pages-txt/131.txt new file mode 100644 index 0000000..4eccfbd --- /dev/null +++ b/pages-txt/131.txt @@ -0,0 +1,35 @@ +The phonetic component

Table 11-1: Parameter values for the synthesis of selected vowels (values in Hz; a second row for a vowel gives the offglide target of a diphthongized vowel)

Vowel  F1   F2    F3    B1   B2   B3
IY     310  2020  2960  45   200  400
       290  2070  2960  60   200  400
IH     400  1800  2570  50   100  140
       470  1600  2600  50   100  140
EY     480  1720  2520  70   100  200
       330  2020  2600  55   100  200
EH     530  1680  2500  60   90   200
       620  1530  2530  60   90   200
AE     620  1660  2430  70   150  320
       650  1490  2470  70   100  320
AA     700  1220  2600  130  70   160
AO     600  990   2570  90   100  80
       630  1040  2600  90   100  80
AH     620  1220  2550  80   50   140
OW     540  1100  2300  80   70   70
       450  900   2300  80   70   70
UH     450  1100  2350  80   100  80
       500  1180  2390  80   100  80
UW     350  1250  2200  65   110  140
       320  900   2200  65   110  140
ER     470  1270  1540  100  60   110
       420  1310  1540  100  60   110
AY     660  1200  2550  100  70   200
       400  1880  2500  70   100  200
AW     640  1230  2550  80   70   140
       420  940   2350  80   70   80
OY     550  960   2400  80   50   130
       360  1820  2450  60   50   160

119 diff --git a/pages-txt/132.txt b/pages-txt/132.txt new file mode 100644 index 0000000..21b9680 --- /dev/null +++ b/pages-txt/132.txt @@ -0,0 +1,42 @@ +From text to speech: The MITalk system

Figure 11-7: Constants used to specify the inherent formant and durational characteristics of a sonorant (frequency versus time; the constants shown include VTAR3, DTAR3, DTAR1, TCDIPH, TDMID, and INHDUR)

In addition to differences in source amplitudes, voiced and voiceless fricatives differ in that F1 is higher and B1 is larger when the glottis is open.

The affricate parameters in Table 11-2 refer to the fricative portion of the affricate. Similarly, the plosive parameters in Table 11-2 refer to the brief burst of frication noise generated at plosive release. Formant frequency values again serve as loci for predicting formant positions at voicing onset.

The parameters that are used to generate a nasal murmur include the nasal pole and zero frequencies FNP and FNZ. The nasal pole and zero are used primarily to approximate vowel nasalization at nasal release by splitting F1 into a pole-zero-pole complex. The details of nasal murmurs that have been described by Fujimura (1962) are approximated by formant bandwidth adjustments rather than by the theoretically correct method of pole-zero insertion. The reason is that it is not possible to simulate both the higher frequency pole-zero details of nasal murmurs and vowel nasalization simultaneously without moving the frequency of the nasal pole and zero very fast at release, which would generate an objectionable click in the output, and vowel nasalization has been found to be perceptually more important.
A nasalized vowel is generated by increasing F1 by about 100 Hz, and by setting the frequency of the nasal zero to be the average of this new F1 value and 270 Hz (the frequency of the fixed nasal pole).

Not included in Tables 11-1 and 11-2 are steady-state target values for unstressed allophones, postvocalic allophones, flaps, glottal stops, voicebars, and

120 diff --git a/pages-txt/133.txt b/pages-txt/133.txt new file mode 100644 index 0000000..f82398e --- /dev/null +++ b/pages-txt/133.txt @@ -0,0 +1,39 @@ +The phonetic component

Table 11-2: Parameter values for the synthesis of selected components of English consonants before front vowels (frequencies and bandwidths in Hz, amplitudes in dB)

Sonorant
      F1   F2    F3    B1   B2   B3
WW    290  610   2150  50   80   60
YY    260  2070  3020  40   250  500
RR    310  1060  1380  70   100  120
LL    310  1050  2880  50   100  280

Fricative
      F1   F2    F3    B1   B2   B3   A2  A3  A4  A5  A6  AB
FF    340  1100  2080  200  120  150  0   0   0   0   0   57
VV    220  1100  2080  60   90   120  0   0   0   0   0   57
TH    320  1290  2540  200  90   200  0   0   0   0   28  48
DH    270  1290  2540  60   8    170  0   0   0   0   28  48
SS    320  1390  2530  200  80   200  0   0   0   0   52  0
ZZ    240  1390  2530  70   60   180  0   0   0   0   52  0
SH    300  1840  2750  200  100  300  57  48  48  46  0   0

Affricate
CH    350  1800  2820  200  90   300  0   4   60  53  53  0
JJ    260  1800  2820  60   8    270  0   4   60  53  53  0

Plosive
PP    400  1100  2150  300  150  220  0   0   0   0   0   63
BB    200  1100  2150  60   110  130  0   0   0   0   0   63
TT    400  1600  2600  300  120  250  0   30  45  57  63  0
DD    200  1600  2600  60   100  170  0   47  60  62  60  0
KK    300  1990  2850  250  160  330  0   53  43  45  45  0
GG    200  1990  2850  60   150  280  0   53  43  45  45  0

Nasal
      FNP  FNZ  F1   F2    F3    B1   B2   B3
MM    270  450  480  1270  2130  40   200  200
NN    270  450  480  1340  2470  40   300  300

121 diff --git a/pages-txt/134.txt b/pages-txt/134.txt new file mode 100644 index 0000000..d569602 --- /dev/null +++ b/pages-txt/134.txt @@ -0,0 +1,42 @@ +From text to speech: The MITalk system

consonant clusters. Characterization of even the static properties of these phonetic segments is beyond the scope of the present chapter.

11.3.3 Structure of the output parameter file

The output file consists of one complete set of control parameter values per 5 msec of speech. The control parameters that are varied are identified in Table 11-3 (a minimal sketch of one such frame appears at the end of this chapter).

Table 11-3: Variable control parameters specified in PHONET

N   Symbol  Name
1   AV      amplitude of voicing in dB
2   AF      amplitude of frication in dB
3   AH      amplitude of aspiration in dB
4   AVS     amplitude of sinusoidal voicing in dB
5   F0      voicing fundamental frequency in Hz
6   F1      first formant frequency in Hz
7   F2      second formant frequency in Hz
8   F3      third formant frequency in Hz
9   F4      fourth formant frequency in Hz
10  FNZ     nasal zero frequency in Hz
11  B1      first formant bandwidth in Hz
12  B2      second formant bandwidth in Hz
13  B3      third formant bandwidth in Hz
14  A2      second parallel formant amplitude in dB
15  A3      third parallel formant amplitude in dB
16  A4      fourth parallel formant amplitude in dB
17  A5      fifth parallel formant amplitude in dB
18  A6      sixth parallel formant amplitude in dB
19  AB      bypass path amplitude in dB
20  --      not currently used

11.4 Summary

PHONET differs from a number of other formant-based synthesis-by-rule programs (e.g. Votrax, Kurzweil, Holmes, Mattingly, Rabiner, or Hertz) primarily in terms of the total number of context-dependent rules that have been formulated in order to model details of the spectra of phonetic transitions. A complete description of these rules is given in Appendix C.
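As a concrete picture of the output parameter file of Section 11.3.3, the following Python sketch represents one frame. The file layout itself is not specified in this chapter, so the ordered-mapping representation and the function name are assumptions; the parameter names and the 5 msec frame interval come from Table 11-3.

    FRAME_INTERVAL_MS = 5

    PHONET_PARAMETERS = (
        "AV", "AF", "AH", "AVS", "F0",   # source amplitudes (dB) and F0 (Hz)
        "F1", "F2", "F3", "F4", "FNZ",   # formant and nasal zero frequencies (Hz)
        "B1", "B2", "B3",                # formant bandwidths (Hz)
        "A2", "A3", "A4", "A5", "A6",    # parallel formant amplitudes (dB)
        "AB",                            # bypass path amplitude (dB)
    )                                    # parameter 20 is not currently used

    def make_frame(values):
        """Pair the 19 used parameter names with one frame of values."""
        assert len(values) == len(PHONET_PARAMETERS)
        return dict(zip(PHONET_PARAMETERS, values))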
122 diff --git a/pages-txt/135.txt b/pages-txt/135.txt new file mode 100644 index 0000000..6d01638 --- /dev/null +++ b/pages-txt/135.txt @@ -0,0 +1,56 @@ +12

The Klatt formant synthesizer

12.1 Overview

The final two modules of the MITalk system (CWTRAN and COEWAV) simulate a formant synthesizer (Klatt, 1980). Figure 12-1 shows the interface between these modules and the hardware which produces the actual speech. Synthesizer control parameter data are specified every 5 msec by rules contained in the phonetic component described in Chapter 11. There are 39 control parameters that specify the actions of the software synthesizer, of which only 20 are varied as a function of time.

Figure 12-1: Interface between synthesizer software and hardware (control parameters, one set per 5 msec, enter CWTRAN/COEWAV; waveform samples, 10,000 per second at 12 bits per sample, pass through a D/A converter and a 4.9 kHz low-pass filter to a loudspeaker)

The Klatt formant synthesizer consists of two logically distinct modules, CWTRAN and COEWAV. The first module, CWTRAN, accepts control parameter data such as formant frequencies, formant bandwidths, and fundamental frequency (all specified in Hz), as well as source amplitudes and amplitudes of each parallel formant (specified in dB), and derives a set of difference-equation coefficients for each digital formant resonator and a set of linear amplitude coefficients. The second synthesizer subroutine, COEWAV, accepts as input this coefficient and amplitude array for a 5 msec frame and computes the next 5 msec chunk of waveform.

Considerable system speed improvement can be obtained by implementing the final module, COEWAV, as a hardware digital filter. A TTL implementation has been constructed (Miranker, 1978). Whether the software subroutine or the VTM is used, output waveform samples are played through a digital-to-analog converter, analog low-pass filter, and loudspeaker.

123 diff --git a/pages-txt/136.txt b/pages-txt/136.txt new file mode 100644 index 0000000..9a07f17 --- /dev/null +++ b/pages-txt/136.txt @@ -0,0 +1,48 @@ +From text to speech: The MITalk system

12.1.1 Software simulation vs. hardware construction

The advantages of a software implementation over the construction of special-purpose analog hardware are substantial. The synthesizer does not need repeated calibration, it is stable, and the signal-to-noise ratio (quantization noise in the case of a digital simulation) can be made as large as desired. The configuration can easily be changed as new ideas are proposed. For example, the voices of women and children can be synthesized with appropriate modifications to the voicing source and cascade vocal tract configuration. Graphic terminals are usually available in a computer facility and can be programmed to view control parameter data or selected portions of the output speech waveform. Short-time spectra can also be computed and displayed in order to make detailed spectral comparisons between natural and synthetic waveforms.

12.1.2 Formant synthesis vs.
articulatory synthesis

Speech synthesizers fall into two broad categories: 1) articulatory synthesizers that attempt to model faithfully the mechanical motions of the articulators, and the resulting distributions of volume velocity and sound pressure in the lungs, larynx, and vocal and nasal tracts (Flanagan et al., 1975), and 2) formant synthesizers which derive an approximation to a speech waveform by a simpler set of rules formulated in the acoustic domain. The present chapter is concerned only with formant models of speech generation since current articulatory models require several orders of magnitude more computation, and the resultant speech output cannot be specified with sufficient precision for direct optimization of the rules by trial-and-error comparisons with natural speech.

The synthesizer design is based on an acoustic theory of speech production presented in Fant (1960), and is summarized in Figure 12-2. According to this view, one or more sources of sound energy are activated by the build-up of lung pressure. Treating each sound source separately, we may characterize it in the frequency domain by a source spectrum S(f), where f is frequency in Hz. Each sound source excites the vocal tract which acts as a resonating system analogous to an organ pipe.

Since the vocal tract is a linear system, it can be characterized in the frequency domain by a linear transfer function T(f), which is the ratio of lip-plus-nose volume velocity U(f) to source input S(f). Finally, the spectrum of the sound pressure that would be recorded some distance from the lips of the talker P(f) is related to lip-plus-nose volume velocity U(f) by a radiation characteristic R(f) that describes the effects of directional sound propagation from the head.

Each of the above relationships can also be recast in the time (waveform)

124 diff --git a/pages-txt/137.txt b/pages-txt/137.txt new file mode 100644 index 0000000..2448428 --- /dev/null +++ b/pages-txt/137.txt @@ -0,0 +1,57 @@ +The Klatt formant synthesizer

P(f) = S(f) T(f) R(f)

Figure 12-2: Components of the output spectrum of a speech sound (a sound source S(f) -- voicing, aspiration, or frication -- excites the vocal tract transfer function T(f); lip volume velocity U(f) passes through the radiation characteristic R(f) to yield the radiated sound pressure P(f))

domain. This is actually how a waveform is generated in the computer. The synthesizer includes components to simulate the generation of several different kinds of sound sources (described in Section 12.1.10), components to simulate the vocal tract transfer function (Figure 12-3), and a component to simulate sound radiation from the head (Figure 12-14).

12.1.3 Cascade vs. parallel

A number of hardware and software speech synthesizers have been described (Dudley et al., 1939; Cooper et al., 1951; Lawrence, 1953; Stevens et al., 1955; Fant, 1959; Fant and Martony, 1962; Flanagan et al., 1962; Holmes et al., 1964; Epstein, 1965; Tomlinson, 1966; Scott et al., 1966; Liljencrants, 1968; Rabiner et al., 1971a; Klatt, 1972; Holmes, 1973). They employ different configurations to achieve what is hopefully the same result: high-quality approximation to human speech. A few of the synthesizers have stability and calibration problems, and a few have design deficiencies that make it impossible to synthesize a good voiced fricative, but many others have an excellent design.
Of the best synthesizers that have been proposed, two general configurations are common.

In one type of configuration, called a parallel formant synthesizer (see e.g. Lawrence, 1953; Holmes, 1973), the formant resonators that simulate the transfer function of the vocal tract are connected in parallel, as shown in the lower portion of Figure 12-3. Each formant resonator is preceded by an amplitude control that determines the relative amplitude of a spectral peak (formant) in the output spectrum for both voiced and voiceless speech sounds. In the second type of configuration, called a cascade formant synthesizer (see e.g. Fant, 1959; Klatt, 1972), sonorants are synthesized using a set of formant resonators connected in cascade, as shown in the upper part of Figure 12-3.

The advantage of the cascade connection is that the relative amplitudes of formant peaks for vowels come out just right (Fant, 1956) without the need for in-

125 diff --git a/pages-txt/138.txt b/pages-txt/138.txt new file mode 100644 index 0000000..4a0de4c --- /dev/null +++ b/pages-txt/138.txt @@ -0,0 +1,30 @@ +From text to speech: The MITalk system

Figure 12-3: Parallel and cascade simulation of the vocal tract transfer function (cascade: resonators R1 through R4 in series between input and output; parallel: the same resonators summed between input and output)

dividual amplitude controls for each formant. The disadvantage is that one still needs a parallel formant configuration for the generation of fricatives and plosive bursts -- the vocal tract transfer function cannot be modeled adequately when the sound source is above the larynx, so that cascade synthesizers are generally more complex in overall structure.

A second advantage of the cascade configuration is that it is a more accurate model of the vocal tract transfer function during the production of nonnasal sonorants (Flanagan, 1957). It will be shown that the transfer functions of certain vowels cannot be modeled very well by a parallel formant synthesizer. Although not optimal, a parallel synthesizer is particularly useful for generating stimuli that violate the normal amplitude relationships between formants, or if one wishes to generate, e.g., single-formant patterns.

The software simulation to be described has been programmed for normal use as a hybrid cascade/parallel synthesizer (Figure 12-4a), or alternatively for special-

126 diff --git a/pages-txt/139.txt b/pages-txt/139.txt new file mode 100644 index 0000000..1344feb --- /dev/null +++ b/pages-txt/139.txt @@ -0,0 +1,53 @@ +The Klatt formant synthesizer

purpose use as a strictly parallel synthesizer (Figure 12-4b). The experimenter must decide beforehand which configuration is to be employed. The change in configuration depends on the state of a single switch, and the program is smart enough to avoid performing unnecessary computations for resonators that are not used. To the extent possible, the synthesizer has been adjusted so as to generate about the same output waveform whether the cascade/parallel configuration or the all-parallel configuration is selected.
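The contrast between the two configurations can be sketched in a few lines of Python. This is not MITalk code: apply_resonator stands for a single digital formant resonator applied to a whole sequence of samples (a concrete definition follows from Equations 1 and 2 in the next section), and the amplitude controls are taken to be linear factors.

    def cascade_tract(samples, formants, apply_resonator):
        # Resonators in series: relative formant amplitudes come out
        # automatically, so no per-formant amplitude controls are needed.
        for f, bw in formants:
            samples = apply_resonator(samples, f, bw)
        return samples

    def parallel_tract(samples, formants, amps, apply_resonator):
        # Resonators side by side: each branch is scaled by its own
        # amplitude control and the branch outputs are summed.
        branches = [apply_resonator(samples, f, bw) for f, bw in formants]
        return [sum(a * b[i] for a, b in zip(amps, branches))
                for i in range(len(samples))]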
Figure 12-4: Cascade/parallel configurations supported by MITalk (a: the voicing and aspiration sources feed the cascade transfer function and the frication source feeds the parallel transfer function, with both paths passing through the radiation characteristic to the output speech; b: all three sources feed a parallel transfer function followed by the radiation characteristic)

12.1.4 Waveform sampling rate

Most of the sound energy of speech is contained in frequencies between about 80 and 8000 Hz (Dunn and White, 1940). However, intelligibility tests of band-pass filtered speech indicate that intelligibility is not measurably changed if the energy in frequencies above about 5000 Hz is removed (French and Steinberg, 1947). Speech low-pass filtered in this way sounds perfectly natural. Thus we have selected 10,000 samples per second as the digital sampling rate of the synthesizer.

127 diff --git a/pages-txt/140.txt b/pages-txt/140.txt new file mode 100644 index 0000000..1b80a40 --- /dev/null +++ b/pages-txt/140.txt @@ -0,0 +1,51 @@ +From text to speech: The MITalk system

12.1.5 Parameter update rate

Control parameter values are updated every 5 msec. This is frequent enough to mimic even the most rapid formant transitions and brief plosive bursts. If desired, the program can be modified to update parameter values only every 10 msec with relatively little decrease in output quality.

12.1.6 Digital resonators

The basic building block of the synthesizer is a digital resonator having the properties illustrated in Figure 12-5. Two parameters are used to specify the input-output characteristics of a resonator, the resonant (formant) frequency F and the resonance bandwidth BW. In Figure 12-5, these values are 1000 Hz and 50 Hz, respectively. Samples of the output of a digital resonator, y(nT), are computed from the input sequence, x(nT), by the equation:

y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T)    (1)

where y(nT-T) and y(nT-2T) are the previous two sample values of the output sequence y(nT). The constants A, B, and C are related to the resonant frequency F and the bandwidth BW of a resonator by the impulse-invariant transformation (Gold and Rabiner, 1968):

C = -exp(-2*pi*BW*T)
B = 2*exp(-pi*BW*T)*cos(2*pi*F*T)    (2)
A = 1 - B - C

The constant T is the reciprocal of the sampling rate and equals 0.0001 seconds in the present 5-kHz simulation.

The values of the resonator control parameters F and BW are updated every 5 msec, causing the difference equation constants to change discretely in small steps every 5 msec as an utterance is synthesized. Large, sudden changes to these constants may introduce clicks and burps in the synthesizer output. Fortunately, acoustic theory indicates that formant frequencies must always change slowly and continuously, relative to the 5-msec update interval for control parameters.

A digital resonator is a second-order difference equation. The transfer function of a digital resonator has a sampled frequency response given by:

T(f) = A / (1 - B*z^(-1) - C*z^(-2))    (3)

where z = exp(j*2*pi*f*T), j is the square root of -1, and f is frequency in Hz which ranges from 0 to 5000 Hz. The transfer function has a (sampled) impulse response identical to a corresponding analog resonator circuit at sample times nT (Gold and Rabiner, 1968). But the frequency responses of an analog and digital resonator are not exactly the same, as seen in Figure 12-5.
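Equations 1 and 2 translate directly into code. The following Python sketch is a literal transcription rather than the MITalk implementation; T defaults to 0.0001 s, matching the 10,000 samples-per-second rate.

    import math

    def resonator_coefficients(F, BW, T=0.0001):
        # Impulse-invariant transformation (Equation 2)
        C = -math.exp(-2.0 * math.pi * BW * T)
        B = 2.0 * math.exp(-math.pi * BW * T) * math.cos(2.0 * math.pi * F * T)
        A = 1.0 - B - C          # normalizes the gain at 0 Hz to one
        return A, B, C

    def apply_resonator(samples, F, BW, T=0.0001):
        """Equation 1: y(nT) = A*x(nT) + B*y(nT-T) + C*y(nT-2T)."""
        A, B, C = resonator_coefficients(F, BW, T)
        y1 = y2 = 0.0            # y(nT-T) and y(nT-2T), initially silence
        out = []
        for x in samples:
            y = A * x + B * y1 + C * y2
            out.append(y)
            y1, y2 = y, y1
        return out

For example, apply_resonator([1.0] + [0.0] * 499, 1000.0, 50.0) yields the impulse response of the F = 1000 Hz, BW = 50 Hz resonator whose frequency response is plotted in Figure 12-5.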
128 diff --git a/pages-txt/141.txt b/pages-txt/141.txt new file mode 100644 index 0000000..1fb8dec --- /dev/null +++ b/pages-txt/141.txt @@ -0,0 +1,46 @@ +The Klatt formant synthesizer

Figure 12-5: Block diagram and frequency response of a digital resonator (an input sequence passes through the resonator to an output sequence; the response, plotted in dB from 0 to 5 kHz, peaks at F with bandwidth BW, and the digital response deviates from the analog response at high frequencies)

12.1.7 Digital antiresonator

An antiresonance (also called an antiformant or transfer-function zero pair) can be realized by slight modifications to these equations. The frequency response of an antiresonator is the mirror image of the response plotted in Figure 12-5 (i.e. replace dB by -dB). An antiresonator is used in the synthesizer to shape the spectrum of the voicing source and another is used to simulate the effects of nasalization in the cascade model of the vocal tract transfer function.

The output of an antiformant resonator, y(nT), is related to the input x(nT) by the equation:

129 diff --git a/pages-txt/142.txt b/pages-txt/142.txt new file mode 100644 index 0000000..6a7f882 --- /dev/null +++ b/pages-txt/142.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system

y(nT) = A' x(nT) + B' x(nT-T) + C' x(nT-2T)    (4)

where x(nT-T) and x(nT-2T) are the previous two samples of the input x(nT), and the constants A', B', and C' are defined by the equations:

A' = 1/A
B' = -B/A    (5)
C' = -C/A

where A, B, and C are obtained by inserting the antiresonance center frequency F and bandwidth BW into Equation 2.

12.1.8 Low-pass resonator

As a special case, the frequency F of a digital resonator can be set to zero, producing, in effect, a low-pass filter which has a nominal attenuation skirt of -12 dB per octave of frequency increase and a 3-dB down break frequency equal to BW/2. The voicing source contains a digital resonator RGP used as a low-pass filter that transforms a glottal impulse into a pulse having a waveform and spectrum similar to normal voicing. A second digital resonator, RGS, is used to low-pass filter the normal voicing waveform to produce the quasi-sinusoidal glottal waveform seen during the closure interval for an intervocalic voiced plosive.

12.1.9 Synthesizer block diagram

A block diagram of the synthesizer is shown in Figure 12-6. There are 39 control parameters that determine the characteristics of the output. The name and range of values for each parameter are given in Table 12-1. As seen from the table, as many as 22 of the 39 parameters are varied to achieve optimum matches to an arbitrary English utterance. The constant parameters in Table 12-1 have been given values appropriate for a particular male voice, and would have to be adjusted slightly to approximate the speech of other male or female talkers. The list of variable control parameters is long, compared with some synthesizers, but the emphasis here is on defining strategies for the synthesis of high-quality speech. We are not concerned with searching for compromises that would minimize the information content in the control parameter specification.

12.1.10 Sources of sound

There are two kinds of sound sources that may be activated during speech production (Stevens and Klatt, 1974). One involves quasi-periodic vibrations of some structure, usually the vocal folds. Vibration of the vocal folds is called voicing.
(Other structures such as the lips, tongue tip, or uvula may be caused to vibrate in sound types of some languages, but not in English.)

The second kind of sound source involves the generation of turbulence noise

130 diff --git a/pages-txt/143.txt b/pages-txt/143.txt new file mode 100644 index 0000000..db40382 --- /dev/null +++ b/pages-txt/143.txt @@ -0,0 +1,30 @@ +The Klatt formant synthesizer

Figure 12-6: Block diagram of the cascade/parallel formant synthesizer (an F0-controlled impulse train and a random number generator, with a modulator and low-pass filter LPF, feed the cascade and parallel transfer functions)

by the rapid flow of air past a narrow constriction. The resulting noise is called aspiration if the constriction is located at the level of the vocal folds as, for example, during the production of the sound HH. If the constriction is located above the larynx as, for example, during the production of sounds such as SS, the resulting noise is called frication noise. The explosion of a plosive release also consists primarily of frication noise.

When voicing and turbulence noise generation co-exist, as in a voiced fricative such as ZZ or a voiced HH, the noise is amplitude modulated periodically by

131 diff --git a/pages-txt/144.txt b/pages-txt/144.txt new file mode 100644 index 0000000..415fa7c --- /dev/null +++ b/pages-txt/144.txt @@ -0,0 +1,68 @@ +From text to speech: The MITalk system

Table 12-1: List of control parameters for the software formant synthesizer (V = variable, C = constant)

N   Symbol  Name                                   Min    Max    Typ
1  V  AV    Amplitude of voicing (dB)              0      80     0
2  V  AF    Amplitude of frication (dB)            0      80     0
3  V  AH    Amplitude of aspiration (dB)           0      80     0
4  V  AVS   Amplitude of sinusoidal voicing (dB)   0      80     0
5  V  F0    Fundamental freq. of voicing (Hz)      0      500    0
6  V  F1    First formant frequency (Hz)           150    900    500
7  V  F2    Second formant frequency (Hz)          500    2500   1500
8  V  F3    Third formant frequency (Hz)           1300   3500   2500
9  V  F4    Fourth formant frequency (Hz)          2500   4500   3300
10 V  FNZ   Nasal zero frequency (Hz)              200    700    250
11 V  AN    Nasal formant amplitude (dB)           0      80     0
12 V  A1    First formant amplitude (dB)           0      80     0
13 V  A2    Second formant amplitude (dB)          0      80     0
14 V  A3    Third formant amplitude (dB)           0      80     0
15 V  A4    Fourth formant amplitude (dB)          0      80     0
16 V  A5    Fifth formant amplitude (dB)           0      80     0
17 V  A6    Sixth formant amplitude (dB)           0      80     0
18 V  AB    Bypass path amplitude (dB)             0      80     0
19 V  B1    First formant bandwidth (Hz)           40     500    50
20 V  B2    Second formant bandwidth (Hz)          40     500    70
21 V  B3    Third formant bandwidth (Hz)           40     500    110
22 C  SW    Cascade/parallel switch                0(c)   1(p)   0
23 C  FGP   Glottal resonator 1 frequency (Hz)     0      600    0
24 C  BGP   Glottal resonator 1 bandwidth (Hz)     100    2000   100
25 C  FGZ   Glottal zero frequency (Hz)            0      5000   1500
26 C  BGZ   Glottal zero bandwidth (Hz)            100    9000   6000
27 C  B4    Fourth formant bandwidth (Hz)          100    500    250
28 V  F5    Fifth formant frequency (Hz)           3500   4900   3850
29 C  B5    Fifth formant bandwidth (Hz)           150    700    200
30 C  F6    Sixth formant frequency (Hz)           4000   4999   4900
31 C  B6    Sixth formant bandwidth (Hz)           200    2000   1000
32 C  FNP   Nasal pole frequency (Hz)              200    500    250
33 C  BNP   Nasal pole bandwidth (Hz)              50     500    100
34 C  BNZ   Nasal zero bandwidth (Hz)              50     500    100
35 C  BGS   Glottal resonator 2 bandwidth (Hz)     100    1000   200
36 C  SR    Sampling rate (Hz)                     5000   20000  10000
37 C  NWS   Number of waveform samples per chunk   1      200    50
38 C  GO    Overall gain control (dB)              0      80     48
39 C  NFC   Number of cascaded formants            4      6      5

132 diff --git a/pages-txt/145.txt b/pages-txt/145.txt new file mode 100644 index 0000000..8a5378d --- /dev/null +++ b/pages-txt/145.txt @@ -0,0 +1,54 @@ +The Klatt formant synthesizer

the vibrations of the vocal folds. In addition, the vocal folds may vibrate without meeting in the midline. In this type of voicing, the amplitude of higher frequency harmonics of the voicing source spectrum is significantly reduced and the waveform looks nearly sinusoidal. Therefore, the synthesizer should be capable of generating at least two types of voicing waveforms (normal voicing and quasi-sinusoidal voicing), two types of frication waveforms (normal frication and amplitude-modulated frication), and two types of aspiration (normal aspiration and amplitude-modulated aspiration). These are the only kinds of sound sources required for English, although trills and clicks of other languages may call for the addition of other source controls to the synthesizer in the future.

12.1.11 Voicing source

The structure of the voicing source is shown at the top left in Figure 12-6. Variable control parameters are used to specify the fundamental frequency of voicing (F0), the amplitude of normal voicing (AV), and the amplitude of quasi-sinusoidal voicing (AVS).

An impulse train corresponding to normal voicing is generated whenever F0 is greater than zero. The amplitude of each impulse is determined by AV, the amplitude of normal voicing in dB. AV ranges from about 60 dB in a strong vowel to 0 dB when the voicing source is turned off. Fundamental frequency is specified in Hz; a value of F0=100 would produce a 100-Hz impulse train. The number of samples between impulses, T0, is determined by SR/F0, e.g., for a sampling rate of 10,000 and a fundamental frequency of 200 Hz, an impulse is generated every 50th sample. Under some circumstances, the quantization of the fundamental period to be an integral number of samples might be perceived in a slow, prolonged fundamental frequency transition as a sort of staircase of mechanical sounds (similar to the rather unnatural speech one gets by setting F0 to a constant value in a synthetic utterance). But the problem is not sufficiently serious to merit running the source model of the synthesizer at a higher sampling rate. If desired, some aspiration noise can be added to the normal voicing waveform to partially alleviate the problem and create a somewhat breathy voice quality.

12.1.12 Normal voicing

Ignoring for the moment the effects of RGZ, we see that the train of impulses is sent through a low-pass filter, RGP, to produce a smooth waveform that resembles a typical glottal volume velocity waveform (Flanagan, 1958). The resonator frequency FGP is set to 0 Hz and BGP to 100 Hz. The filtered impulses thus have a spectrum that falls off smoothly at approximately -12 dB per octave above 50 Hz.
The waveform generated does not have the same phase spectrum as a typical glot-

133 diff --git a/pages-txt/146.txt b/pages-txt/146.txt new file mode 100644 index 0000000..d18fdc0 --- /dev/null +++ b/pages-txt/146.txt @@ -0,0 +1,49 @@ +From text to speech: The MITalk system

tal pulse, nor does it contain spectral zeros of the kind that often appear in natural voicing, but neither of these differences is judged to be very important perceptually.

The antiresonator RGZ is used to modify the detailed shape of the spectrum of the voicing source for particular individuals with greater precision than would be possible using only a single low-pass filter. The values chosen for FGZ and BGZ in Table 12-1 are such as to tilt the general voicing spectrum up somewhat to match the vocal characteristics of speaker DHK. The waveform and spectral envelope of normal voicing that are produced by sending an impulse train through RGP and RGZ are shown in Figure 12-7.

12.1.13 Quasi-sinusoidal voicing

The amplitude control parameter AVS determines the amount of smoothed voicing generated during voiced fricatives, voiced aspirates, and the voicebars present in intervocalic voiced plosives. An appropriate wave shape for quasi-sinusoidal voicing is obtained by low-pass filtering an impulse by low-pass digital resonators RGP and RGS. The frequency control of RGS is set to zero to produce a low-pass filter, and BGS=200 determines the cutoff frequency beyond which harmonics are strongly attenuated.

The waveform and spectral envelope of quasi-sinusoidal voicing are shown in Figure 12-7. After the effects of the vocal tract transfer function and radiation characteristic are imposed on the source spectrum, the output waveform of quasi-sinusoidal voicing contains significant energy only at the first and second harmonics of the fundamental frequency. AVS ranges from about 60 dB in a voicebar or strongly voiced fricative to 0 dB if no quasi-sinusoidal voicing is present. Some degree of quasi-sinusoidal voicing can be added to the normal voicing source (in combination with aspiration noise) to produce a breathy voice quality (e.g. AH=AV-3, AVS=AV-6).

12.1.14 Frication source

A turbulent noise source is simulated in the synthesizer by a pseudo-random number generator, a modulator, an amplitude control AF, and a -6 dB/octave low-pass digital filter LPF, as shown in Figure 12-6. Theoretically, the spectrum of the frication source should be approximately flat (Stevens, 1971), and the amplitude distribution should be Gaussian. Signals produced by the random number generator have a flat spectrum, but they have a uniform amplitude distribution between limits determined by the value of the amplitude control parameter AF. A pseudo-Gaussian amplitude distribution is obtained in the synthesizer by summing 16 of the numbers produced by the random number generator.
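A minimal Python sketch of this noise generator follows. The summing of 16 uniform random numbers is from the text; the dB-to-linear conversion and the seeding are assumptions (the text says only that AF is given in dB and, later, that the generator is reset at plosive bursts).

    import random

    def frication_source(n_samples, af_db, rng=None):
        rng = rng or random.Random(0)     # resettable, as for plosive bursts
        if af_db == 0:
            return [0.0] * n_samples      # AF = 0 effectively turns the source off
        gain = 10.0 ** (af_db / 20.0)     # dB amplitude control (assumed scaling)
        # Sum 16 uniform random numbers per sample: flat spectrum,
        # pseudo-Gaussian amplitude distribution.
        return [gain * sum(rng.uniform(-1.0, 1.0) for _ in range(16)) / 16.0
                for _ in range(n_samples)]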
+
+134
diff --git a/pages-txt/147.txt b/pages-txt/147.txt
new file mode 100644
index 0000000..0fa7754
--- /dev/null
+++ b/pages-txt/147.txt
@@ -0,0 +1,25 @@
+The Klatt formant synthesizer
+
+[Figure: a) normal voicing waveform; b) smoothed voicing waveform; c) voicing magnitude spectra, plotting the spectral envelope (dB) against frequency (kHz, 0-5), with the normal voicing spectrum falling at -12 dB per octave and the smoothed voicing spectrum at -24 dB per octave.]
+
+Figure 12-7: Four periods from voicing waveforms
+
+135
diff --git a/pages-txt/148.txt b/pages-txt/148.txt
new file mode 100644
index 0000000..135402e
--- /dev/null
+++ b/pages-txt/148.txt
@@ -0,0 +1,47 @@
+From text to speech: The MITalk system
+
+In theory, the noise source is an ideal pressure source. The volume velocity of the frication noise depends on the impedance seen by the noise source. Since the vocal tract transfer function T(f) relates source volume velocity to lip volume velocity, one must estimate noise volume velocity to determine lip output. In the general case, this is a complex calculation, but we will assume that source volume velocity is proportional to the integral of source pressure (an excellent approximation for a frication source at the lips because the radiation impedance is largely inductive, but only an approximation for other source locations). The integral is approximated by a first-order low-pass digital filter LPF that is shown in Figure 12-6. Output samples from this filter y(nT) are related to the input sequence x(nT) by the equation:
+
+      y(nT) = x(nT) + y(nT-T)
+
+It will be seen later that the radiation characteristic is a digital high-pass filter that exactly cancels out the effects of LPF. (For computational efficiency, the radiation characteristic can be moved into the voicing source circuit and the low-pass filter LPF can be removed from the noise source.)
+
+An example of synthetic frication noise volume velocity that was generated in this way is shown in Figure 12-8. The spectrum of this sample of noise fluctuates randomly about the expected long-term average noise spectrum (dashed curve, shifted up by 10 dB for clarity). Short samples of noise vary in their spectral properties due to the nature of random processes.
+
+The output of the random number generator is amplitude modulated by the component labeled “MOD” in Figure 12-6 whenever the fundamental frequency F0 and the amplitude of voicing AV are both greater than zero. Voiceless sounds (AV=0) are not amplitude modulated because the vocal folds are spread and stiffened, and do not vibrate to modulate the airflow. The degree of amplitude modulation is fixed at 50 percent in the synthesizer. The modulation envelope is a square wave with a period equal to the fundamental period. Experience has shown that it is not necessary to vary the degree of amplitude modulation over the course of a sentence, but only to ensure that it is present in voiced fricatives and voiced aspirated sounds.
+
+The amplitude of the frication noise is determined by AF, which is given in dB. A value of 60 will generate a strong frication noise, while a value of zero effectively turns off the frication source.
+
+12.1.15 Aspiration source
+
+Aspiration noise is essentially the same as frication noise, except that it is generated in the larynx.
+In a strictly parallel vocal tract model, AF can be used to
+
+136
diff --git a/pages-txt/149.txt b/pages-txt/149.txt
new file mode 100644
index 0000000..fcd58b0
--- /dev/null
+++ b/pages-txt/149.txt
@@ -0,0 +1,48 @@
+The Klatt formant synthesizer
+
+[Figure: magnitude spectrum (dB) of a sample of frication noise against frequency (kHz, 0-5); a dashed curve shows the expected long-term average noise spectrum.]
+
+Figure 12-8: Waveform segment and magnitude spectrum of frication noise
+
+generate both frication and aspiration noise. However, in the cascade synthesizer configuration, aspiration noise is sent through the cascade vocal tract model (since the cascade configuration is specially designed to model vocal tract characteristics for laryngeal sound sources), while fricatives require a parallel vocal tract configuration. Therefore separate amplitude controls are needed for frication and aspiration in a cascade/parallel configuration. The amplitude of aspiration noise sent to the cascade vocal tract model is determined by AH, which is given in dB. A value of 60 will generate strong aspiration, while a value of zero effectively turns off the aspiration source. Since frication and aspiration are generated by an identical process in the synthesizer, Figure 12-8 describes the characteristics of the aspiration source as well.
+
+137
diff --git a/pages-txt/150.txt b/pages-txt/150.txt
new file mode 100644
index 0000000..3ce59b5
--- /dev/null
+++ b/pages-txt/150.txt
@@ -0,0 +1,47 @@
+From text to speech: The MITalk system
+
+12.1.16 Pitch-synchronous updating of voicing source amplitudes
+
+The voicing source amplitude controls AV and AVS only have an effect on the synthetic waveform when a glottal impulse is issued. The reason for adjusting voicing amplitudes discontinuously at the onset of each glottal period is to prevent the creation of pops and clicks due to waveform discontinuities introduced by the sudden change in an amplitude control in the middle of a voicing period.
+
+12.1.17 Generation of plosive bursts with a predictable spectrum
+
+The noise amplitudes AF and AH are used to interpolate the intensity of the noise sources linearly over the 5 msec (50 sample) interval. (Thus there is a 5 msec delay in the attainment of a new amplitude value for a noise source.) Interpolation permits a more gradual onset for a fricative or HH than would otherwise be possible. There is, however, one exception to this internal control strategy. A plosive burst involves a more rapid source onset than can be achieved by 5 msec linear interpolation. Therefore, if AF increases by more than 50 dB from its value specified in the previous 5 msec segment, AF is (automatically) changed instantaneously to its new target value. The pseudo-random number generator is also reset at the time of plosive burst onset so as to produce exactly the same source waveform for each burst. The value to which it is set was chosen so as to produce a burst spectrum that is as flat as possible.
+
+12.1.18 Control of fundamental frequency
+
+At times, it is desired to specify precisely the timing of the first glottal pulse (voicing onset) relative to a plosive burst. For example, in the syllable pa, it might be desired to produce a 5 msec burst of frication noise, 40 msec of aspiration noise, and voicing onset exactly 45 msec from the onset of the burst.
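The per-frame control behavior of sections 12.1.16 through 12.1.18 can be sketched as follows; the 50 dB burst threshold, the 5 msec (NWS=50) frame, and the AV/F0 gating discussed just below come from the text, while the names, structure, and seed value are invented for illustration:

```python
import numpy as np

NWS = 50  # samples per parameter frame: 5 msec at 10,000 samples/s

def af_track_for_frame(af_prev_db, af_new_db, rng):
    # Noise amplitudes are normally interpolated linearly across the frame;
    # a jump of more than 50 dB is treated as a plosive burst: the new value
    # takes effect instantly and the noise generator is reset so that every
    # burst has the same source waveform. The seed value is illustrative.
    if af_new_db - af_prev_db > 50.0:
        return np.full(NWS, af_new_db), np.random.default_rng(1)
    return np.linspace(af_prev_db, af_new_db, NWS), rng

def glottal_pulse_allowed(av_db, f0):
    # Voicing-source gating: no glottal pulse is issued while either AV or F0
    # is zero, so voice onset can be placed on an exact frame boundary.
    return av_db > 0 and f0 > 0
```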
+Usually, a glottal pulse is issued in the synthesizer at a time specified by the reciprocal of the value of the fundamental frequency control parameter in effect when the last glottal pulse was issued. However, if either AV or F0 is set to zero, no glottal pulse is issued during this 5 msec time interval; in fact, no glottal pulses are issued until precisely the moment that both the AV and F0 control parameters become nonzero. In the case of the pa example above, both AV and F0 would normally be set to zero during the closure interval, burst, and aspiration phase; and AV would be set to about 60 dB and F0 to about 130 Hz at exactly 45 msec after the synthetic burst onset.
+
+Since the update interval in the synthesizer is set to 5 msec, voice onset time can be specified exactly in 5 msec steps. If greater precision is needed, it is necessary to change the parameter update interval from 5 msec (NWS=50) to, for example, 2 msec (NWS=20).
+
+138
diff --git a/pages-txt/151.txt b/pages-txt/151.txt
new file mode 100644
index 0000000..c58e91c
--- /dev/null
+++ b/pages-txt/151.txt
@@ -0,0 +1,48 @@
+The Klatt formant synthesizer
+
+12.2 Vocal tract transfer functions
+
+The acoustic characteristics of the vocal tract are determined by its cross-sectional area as a function of distance from the larynx to the lips. The vocal tract forms a nonuniform transmission line whose behavior can be determined for frequencies below about 5 kHz by solving a one-dimensional wave equation (Fant, 1960). (Above 5 kHz, three-dimensional resonance modes would have to be considered.) Solutions to the wave equation result in a transfer function that relates samples of the glottal source volume velocity to output volume velocity at the lips.
+
+The synthesizer configuration in Figure 12-6 includes components to realize two different types of vocal tract transfer function. The first, a cascade configuration of digital resonators, models the resonant properties of the vocal tract whenever the source of sound is within the larynx. The second, a parallel configuration of digital resonators and amplitude controls, models the resonant properties of the vocal tract during the production of frication noise. The parallel configuration can also be used to model vocal tract characteristics for laryngeal sound sources, although the approximation is not quite as good as in the cascade model.
+
+12.2.1 Cascade vocal tract model
+
+Assuming that the one-dimensional wave equation is a valid approximation below 5 kHz, the vocal tract transfer function can be represented in the frequency domain by a product of poles and zeros. Furthermore, the transfer function contains only about five complex pole pairs and no zeros in the frequency range of interest, as long as the articulation is nonnasalized and the sound source is at the larynx (Fant, 1960). The transfer function conforms to an all-pole model because there are no side-branch resonators or multiple sound paths. (The glottis is partially open during the production of aspiration so that the poles and zeros of the subglottal system are often seen in aspiration spectra; the only way to approximate their effects in the synthesizer is to increase the first formant bandwidth to about 300 Hz. The perceptual importance of the remaining spectral distortions caused by the poles and zeros of the subglottal system is probably minimal.)
+Five resonators are appropriate for simulating a vocal tract with a length of about 17 cm, the length of a typical male vocal tract, because the average spacing between formants is equal to the velocity of sound divided by twice the vocal tract length, which works out to be 1000 Hz. A typical female vocal tract is 15 to 20 percent shorter, suggesting that only four formant resonators be used to represent a female voice in a 5 kHz simulation (or that the simulation should be extended to about 6 kHz). It is suggested that the voices of women and children be approximated by setting the control parameter NFC to 4, thus removing the fifth formant from the
+
+139
diff --git a/pages-txt/152.txt b/pages-txt/152.txt
new file mode 100644
index 0000000..c3983fc
--- /dev/null
+++ b/pages-txt/152.txt
@@ -0,0 +1,49 @@
+From text to speech: The MITalk system
+
+cascade branch of the block diagram shown in Figure 12-6. For a male talker with a very long vocal tract, it may be necessary to add a sixth resonator to the cascade branch. As currently programmed, NFC can be set to 4, 5, or 6 formants in the cascade branch. (Any change to NFC implies a change in the length of the vocal tract, so such changes must be made with care.)
+
+Ignoring for the moment the nasal pole resonator RNP and the nasal zero antiresonator RNZ, the cascade model of Figure 12-6, consisting of five formant resonators, has a volume velocity transfer function that can be represented in the frequency domain as a product (Gold and Rabiner, 1968):
+
+      T(f) = PROD[n=1..5] A(n) / (1 - B(n)z^-1 - C(n)z^-2)          (6)
+
+where the constants A(n), B(n), and C(n) are determined by the values of the nth formant frequency F(n) and nth formant bandwidth BW(n) by the relationships given earlier in Equation 2. The constants A(n) in the numerator of Equation 6 ensure that the transfer function has a value of unity at zero frequency, i.e., the dc airflow is unimpeded. The magnitude of T(f) is plotted in Figure 12-9 for several values of formant frequencies and formant bandwidths.
+
+12.2.2 Relationship to analog models of the vocal tract
+
+The transfer function of the vocal tract can also be expressed in the continuous world of differential equations. Equation 6 is then rewritten as an infinite product of poles in the Laplace transform s-plane:
+
+      T(s) = PROD[n=1..inf] s(n)s*(n) / ([s + s(n)][s + s*(n)])          (7)
+
+where s = 2πjf, and the constants s(n) and s*(n) are determined by the values of the nth formant frequency F(n) and the nth formant bandwidth BW(n) by the relationships:
+
+      s(n)  = πBW(n) + 2πjF(n)
+      s*(n) = πBW(n) - 2πjF(n)
+
+The two formulations 6 and 7 are exactly equivalent representations of the transfer function for an ideal vocal tract configuration corresponding to a uniform tube closed at the glottis and having all formant bandwidths equal to, e.g., 100 Hz. The two formulations are indistinguishable in representing vocal tract transfer functions below 5 kHz. However, in a practical synthesizer, the infinite product of poles can only be approximated (e.g. by building five electronic resonators and a higher-pole correction network (Fant, 1959)).
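As an illustration of Equation 6 (a sketch under the usual coefficient relationships of Equation 2, not the actual MITalk routines), the cascade model amounts to passing the source through one two-pole digital resonator per formant, in series:

```python
import numpy as np

def resonator_coeffs(f, bw, sr=10000):
    # A(n), B(n), C(n) from formant frequency and bandwidth in Hz;
    # A(n) = 1 - B(n) - C(n) makes the gain unity at zero frequency.
    c = -np.exp(-2.0 * np.pi * bw / sr)
    b = 2.0 * np.exp(-np.pi * bw / sr) * np.cos(2.0 * np.pi * f / sr)
    return 1.0 - b - c, b, c

def cascade_tract(source, formants, bandwidths, sr=10000):
    # Equation 6 in the time domain: the source passes through one two-pole
    # resonator per formant, in series (NFC = len(formants) = 4, 5, or 6).
    y = np.asarray(source, dtype=float)
    for f, bw in zip(formants, bandwidths):
        a, b, c = resonator_coeffs(f, bw, sr)
        out = np.zeros_like(y)
        for n in range(len(y)):
            out[n] = a * y[n] + (b * out[n - 1] if n >= 1 else 0.0) \
                              + (c * out[n - 2] if n >= 2 else 0.0)
        y = out
    return y

# The uniform-tube case discussed later in this chapter: five formants at
# 500, 1500, 2500, 3500, and 4500 Hz, all bandwidths 100 Hz.
# y = cascade_tract(source, [500, 1500, 2500, 3500, 4500], [100] * 5)
```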
+
+140
diff --git a/pages-txt/153.txt b/pages-txt/153.txt
new file mode 100644
index 0000000..e1463b9
--- /dev/null
+++ b/pages-txt/153.txt
@@ -0,0 +1,53 @@
+The Klatt formant synthesizer
+
+[Figure: four panels plotting |T(f)| in dB against frequency (kHz, 0-5) for a uniform tube and for the vowels IY, AA, and UW.]
+
+Figure 12-9: Magnitude of the vocal tract transfer function
+
+12.2.3 Formant frequencies
+
+Each formant resonator introduces a peak in the magnitude spectra shown in Figure 12-9. The frequency of formant peak “n” is determined by the formant frequency control parameter Fn. (The amplitude of a formant peak depends not only on Fn and the formant bandwidth control parameter BWn, but also on the frequencies of the other formants, which will be discussed below.)
+
+Formant frequency values are determined by the detailed shape of the vocal tract. Formant frequency values associated with different phonetic segments in the speech of speaker DHK are presented in Chapter 11. The frequencies of the lowest three formants vary substantially with changes to articulation (e.g. the observed range of F1 is from about 180 to 750 Hz, of F2 is 600 to 2300 Hz, and of F3 is 1300 to 3100 Hz for a typical male talker). The frequencies and bandwidths of the fourth and fifth formant resonators do not vary as much, and could be held con-
+
+141
diff --git a/pages-txt/154.txt b/pages-txt/154.txt
new file mode 100644
index 0000000..7701308
--- /dev/null
+++ b/pages-txt/154.txt
@@ -0,0 +1,48 @@
+From text to speech: The MITalk system
+
+stant with little decrease in output sound quality. These higher frequency resonators help to shape the overall spectrum, but otherwise contribute little to intelligibility. The particular values chosen for the fourth and fifth formant frequencies (Table 12-1) produce an energy concentration around 3 to 3.5 kHz and a rapid falloff in spectral energy above about 4 kHz, which is a pattern typical of many talkers.
+
+12.2.4 Formant bandwidths
+
+Formant bandwidths are a function of energy losses due to heat conduction, viscosity, cavity-wall motions, radiation of sound from the lips, and the real part of the glottal source impedance. Bandwidths are difficult to deduce from analyses of natural speech because of irregularities in the glottal source spectrum. Bandwidths have been estimated by other techniques, such as using a sinusoidal swept-tone sound source (Fujimura and Lindqvist, 1971). Results indicate that bandwidths vary by a factor of two or more as a function of the particular phonetic segment being spoken. Typical values for formant bandwidths are also given in Chapter 11. Bandwidth variation is small enough so that all formant bandwidths might be held constant in some applications, in which case only F1, F2, and F3 would be varied to simulate the vocal tract transfer functions for nonnasalized vowels and sonorant consonants.
+
+12.2.5 Nasals and nasalization of vowels
+
+It is not possible to approximate nasal murmurs and the nasalization of vowels that are adjacent to nasals with a cascade system of five resonators alone. More than five formants are often present in these sounds and formant amplitudes do not conform to the relationships inherent in a cascade configuration because of the presence of transfer function zeros (Fujimura, 1961, 1962).
+Typical transfer functions for a nasal murmur and for a nasalized IH are shown in Figure 12-10. These spectra were obtained from the recorded syllable “dim”.
+
+Nasalization introduces additional poles and zeros into the transfer function of the vocal-nasal tract due to the presence of a side-branch resonator. In Figure 12-10, the nasal murmur and the nasalized IH have an extra pole pair and zero pair near F1. The oral cavity forms the side-branch resonator in the case of a nasal murmur, while the nose should be considered a side-branch resonator in a nasalized vowel (because the amount of sound radiated through the nostrils is insignificant compared to the effect of the lowered velum on the formant structure of the sound output from the lips).
+
+Nasalization of adjacent vowels is an important element in the synthesis of nasal consonants. Perceptually, the most important change associated with
+
+142
diff --git a/pages-txt/155.txt b/pages-txt/155.txt
new file mode 100644
index 0000000..ebff984
--- /dev/null
+++ b/pages-txt/155.txt
@@ -0,0 +1,31 @@
+The Klatt formant synthesizer
+
+[Figure: two panels plotting |T(f)| in dB against frequency (kHz) for the vowel IH, unnasalized (top) and nasalized (bottom).]
+
+Figure 12-10: Nasalization of the vowel IH in the syllable “dim”
+
+143
diff --git a/pages-txt/156.txt b/pages-txt/156.txt
new file mode 100644
index 0000000..bd4b265
--- /dev/null
+++ b/pages-txt/156.txt
@@ -0,0 +1,47 @@
+From text to speech: The MITalk system
+
+nasalization of a vowel is the reduction in amplitude of the first formant, brought on by the presence of a nearby low-frequency pole pair and zero pair. The first formant frequency also tends to shift slightly toward about 500 Hz.
+
+Nasal murmurs and vowel nasalization are approximated by the insertion of an additional resonator RNP and antiresonator RNZ into the cascade vocal tract model. The nasal pole frequency FNP and zero frequency FNZ are normally set to a fixed value of about 250 Hz, but the frequency of the nasal zero must be increased during the production of nasals and nasalized vowels. Strategies for controlling FNZ are given in Chapter 11. The RNP-RNZ pair is effectively removed from the cascade circuit during the synthesis of nonnasalized speech sounds if FNP=FNZ.
+
+12.2.6 Parallel vocal tract model for frication sources
+
+During frication excitation, the vocal tract transfer function contains both poles and zeros. The pole frequencies are temporally continuous with formant locations of adjacent phonetic segments because, by definition, the poles are the natural resonant frequencies of the entire vocal tract configuration, no matter where the source is located. Thus, the use of vocalic formant frequency parameters to control the locations of frication maxima is theoretically well-motivated (and helpful in preventing the fricative noises from “dissociating” from the rest of the speech signal).
+
+The zeros in the transfer function for fricatives are the frequencies for which the impedance (looking back toward the larynx from the position of the frication source) is infinite, since the series-connected pressure source of turbulence noise cannot produce any output volume velocity under these conditions. The effect of transfer-function zeros is two-fold; they introduce notches in the spectrum and they modify the amplitudes of the formants.
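Spectral zeros such as the RNZ antiresonator above (or the frication-transfer zeros just described) can be realized digitally by inverting a resonator's coefficients. The inversion below follows Klatt's usual formulation; it is an assumption here, since the text does not spell the formula out:

```python
import numpy as np

def antiresonator(x, f, bw, sr=10000):
    # Take the resonator coefficients for (f, bw) and invert them:
    # A' = 1/A, B' = -B/A, C' = -C/A, applied feed-forward to the input:
    # y(n) = A'*x(n) + B'*x(n-1) + C'*x(n-2), a spectral zero pair with
    # unity gain at zero frequency. E.g. RNZ at FNZ of about 250 Hz; with
    # FNP = FNZ the RNP pole pair and RNZ zero pair cancel exactly.
    c = -np.exp(-2.0 * np.pi * bw / sr)
    b = 2.0 * np.exp(-np.pi * bw / sr) * np.cos(2.0 * np.pi * f / sr)
    a = 1.0 - b - c
    ap, bp, cp = 1.0 / a, -b / a, -c / a
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = ap * x[n] + (bp * x[n - 1] if n >= 1 else 0.0) \
                         + (cp * x[n - 2] if n >= 2 else 0.0)
    return y
```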
+The perceptual importance of spectral notches is not great because masking effects of adjacent harmonics limit the detectability of a spectral notch (Gauffin and Sundberg, 1974). We have found that a satisfactory approximation to the vocal tract transfer function for frication excitation can be achieved with a parallel set of digital formant resonators having amplitude controls, and no antiresonators.
+
+Formant amplitudes are set to provide frication excitation for selected formants, usually those associated with the cavity in front of the constriction (Stevens, 1972). The presence of any transfer function zeros is accounted for by appropriate settings of the formant amplitude controls. Relatively simple rules for determination of the formant amplitude settings (and bypass path amplitude values) as a function of place of articulation can be derived from a quantal theory of speech production (Stevens, 1972). The theory states that only formants as-
+
+144
diff --git a/pages-txt/157.txt b/pages-txt/157.txt
new file mode 100644
index 0000000..acfd4f3
--- /dev/null
+++ b/pages-txt/157.txt
@@ -0,0 +1,48 @@
+The Klatt formant synthesizer
+
+sociated with the cavity in front of the oral constriction are strongly excited. The theory is supported by the formant amplitude specifications for fricatives and plosive bursts presented in Chapter 11. These amplitude control data were derived from attempts to match natural frication spectra.
+
+There are six formant resonators in the parallel configuration of Figure 12-6. A sixth formant has been added to the parallel branch specifically for the synthesis of very-high-frequency noise in SS and ZZ. The main energy concentration in these alveolar fricatives is centered on a frequency of about 6 kHz. This is above the highest frequency (5 kHz) that can be synthesized in a 10,000 sample/second simulation. However, in an SS, there is gradually increasing frication noise in the frequencies immediately below 5 kHz due to the low-frequency skirt of the 6 kHz formant resonance, and this noise spectrum can be approximated quite well by a resonator positioned at about 4900 Hz. We have found it better to include an extra resonator to simulate high-frequency noise than to move F5 up in frequency whenever a sibilant is to be synthesized, because clicks and moving energy concentrations are thereby avoided.
+
+Also included in the parallel vocal tract model is a bypass path. The bypass path with amplitude control AB is present because the transfer function contains no prominent resonant peaks during the production of FF, VV, TH, and DH, and the synthesizer should include a means of bypassing all of the resonators to produce a flat transfer function.
+
+During the production of a voiced fricative, there are two active sources of sound, one located at the glottis (voicing) and one at a constriction in the vocal tract (frication). The output of the quasi-sinusoidal voicing source is sent through the cascade vocal tract model, while the frication source excites the parallel branch to generate a voiced fricative.
+
+12.2.7 Simulation of the cascade configuration by the parallel configuration
+
+The transfer function of the laryngeally excited vocal tract can also be approximated by five digital formant resonators connected in parallel. The same resonators that form the parallel branch for frication excitation can be used if suitable values are chosen for the formant amplitude controls.
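A minimal sketch of this parallel approximation (invented names; the per-formant amplitude controls, bypass path, and first-difference detail are described in the rules and discussion that follow):

```python
import numpy as np

def resonate(x, f, bw, sr=10000):
    # Same two-pole resonator recurrence used in the cascade sketch above.
    c = -np.exp(-2.0 * np.pi * bw / sr)
    b = 2.0 * np.exp(-np.pi * bw / sr) * np.cos(2.0 * np.pi * f / sr)
    a = 1.0 - b - c
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = a * x[n] + (b * y[n - 1] if n >= 1 else 0.0) \
                        + (c * y[n - 2] if n >= 2 else 0.0)
    return y

def db(a_db):
    return 10.0 ** (a_db / 20.0) if a_db > 0 else 0.0

def parallel_tract(source, formants, bandwidths, amps_db, ab_db=0.0, sr=10000):
    # Each formant resonator has its own amplitude control (A1..A6, in dB),
    # and a flat bypass path (AB) serves sounds with no prominent peaks.
    # Resonators above F1 are fed the first difference of the source to keep
    # their low-frequency energy out of the F1 region (Holmes, 1973); feeding
    # the bypass the differenced source is an assumption of this sketch.
    x = np.asarray(source, dtype=float)
    dx = np.diff(x, prepend=0.0)
    y = db(ab_db) * dx
    for i, (f, bw, a_db) in enumerate(zip(formants, bandwidths, amps_db)):
        y = y + db(a_db) * resonate(x if i == 0 else dx, f, bw, sr)
    return y
```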
+The following rules summarize what happens to formant amplitudes in the transfer function T(f) of a cascade model as the lowest five formant frequencies and bandwidths are changed. These relationships follow directly from Equation 6, under the assumption that each formant frequency F(n) is at least five to ten times as large as the formant bandwidth BW(n):
+
+1. The formant peaks in the transfer function are equal for the case
+
+145
diff --git a/pages-txt/158.txt b/pages-txt/158.txt
new file mode 100644
index 0000000..8b330de
--- /dev/null
+++ b/pages-txt/158.txt
@@ -0,0 +1,62 @@
+From text to speech: The MITalk system
+
+[Figure: four panels plotting the transfer function |T(f)| in dB against frequency (kHz, 0-5): (a) uniform tube; (b) second formant bandwidth BW2 set to 50, 100, and 200 Hz; (c) first formant frequency F1 set to 250 and 500 Hz; (d) two adjacent formant frequencies brought into proximity.]
+
+Figure 12-11: Effect of parameter changes on the vocal tract transfer function
+
+where formant frequencies are set to 500, 1500, 2500, 3500, and 4500 Hz and formant bandwidths are set to be equal at 100 Hz. This corresponds to a vocal tract having a uniform cross-sectional area, a closed glottis, open lips (and a nonrealistic set of bandwidth values), as shown in part (a) of Figure 12-11.
+
+2. The amplitude of a formant peak is inversely proportional to its bandwidth. If a formant bandwidth is doubled, that formant peak is reduced in amplitude by 6 dB. If the bandwidth is halved, the peak is increased by 6 dB, as shown in part (b) of Figure 12-11.
+
+3. The amplitude of a formant peak is proportional to formant frequency. If a formant frequency is doubled, that formant peak is increased by 6 dB, as shown in part (c) of Figure 12-11. (This is true of T(f), but not of the resulting speech output spectrum, since the glottal source spectrum falls off at about -12 dB/octave of frequency increase, and the radiation characteristic imposes a +6 dB/octave spectral tilt, resulting in a net change in formant amplitude of +6 - 12 + 6 = 0 dB.)
+
+4. Changes to a formant frequency also affect the amplitudes of higher formant peaks by a factor proportional to frequency squared. For ex-
+
+146
diff --git a/pages-txt/159.txt b/pages-txt/159.txt
new file mode 100644
index 0000000..ec2f5e4
--- /dev/null
+++ b/pages-txt/159.txt
@@ -0,0 +1,51 @@
+The Klatt formant synthesizer
+
+ample, if a formant frequency is halved, amplitudes of all higher formants are decreased by 12 dB, i.e. (.5)², as shown in part (c) of Figure 12-11.
+
+5. The frequencies of two adjacent formants cannot come any closer than about 200 Hz because of coupling between the natural modes of the vocal tract. However, if two formants approach each other by about this amount, both formant peaks are increased by an additional 3 to 6 dB, as shown in part (d) of Figure 12-11.
+
+The amplitudes of the formant peaks generated by the parallel vocal tract model have been constrained so that, if A1 to A5 are all set to 60 dB, the transfer function will approximate that found in the cascade model.
+This is accomplished by: 1) adjusting the gain of the higher frequency formants to take into account frequency changes in lower formants (since a higher formant rides on the skirts of the transfer function of all lower formants in a cascade model (Fant, 1960)), 2) incorporating rules to cause formant amplitudes to increase whenever two formant frequencies come into proximity, and 3) using a first difference calculation to remove low-frequency energy from the higher formants; this energy would otherwise distort the spectrum in the region of F1 during the synthesis of some vowels (Holmes, 1973).
+
+The magnitudes of the vocal tract transfer functions of the cascade and parallel vocal tract models are compared in Figure 12-12 for several vowels. The match is quite good in the vicinity of formant peaks, but the parallel model introduces transfer function zeros (notches) in the spectrum between formant peaks. The notches are of relatively little perceptual importance because energy in the formant peak adjacent to the notch on the low-frequency side tends to mask the detectability of a spectral notch (Gauffin and Sundberg, 1974).
+
+Many early parallel synthesizers were programmed to add together formant outputs without filtering out the energy at low frequencies from resonators other than F1. In other cases, formant outputs were combined in alternating signs. The deleterious effects of these choices are illustrated in Figure 12-13. Some vowel spectra are poorly modeled in both of these parallel methods of synthesis. The perceptual degradation is less in the alternating sign case because spectral notches are less perceptible than energy-fill in a spectral valley between two formants. Comparison of Figure 12-12 and Figure 12-13 indicates that our parallel configuration is better than either of those shown in Figure 12-13.
+
+A nasal formant resonator RNP appears in the parallel branch to assist in the approximation of nasal murmurs and vowel nasalization when the cascade branch
+
+147
diff --git a/pages-txt/160.txt b/pages-txt/160.txt
new file mode 100644
index 0000000..20ce3d5
--- /dev/null
+++ b/pages-txt/160.txt
@@ -0,0 +1,41 @@
+From text to speech: The MITalk system
+
+[Figure: output spectra (dB) against frequency (kHz, 0-5) for the vowels IY, AA, and UW and for a uniform tract, computed with the cascade model (left column) and the parallel model (right column).]
+
+Figure 12-12: Preemphasized output spectra from cascade and parallel models
+
+148
diff --git a/pages-txt/161.txt b/pages-txt/161.txt
new file mode 100644
index 0000000..4425b4e
--- /dev/null
+++ b/pages-txt/161.txt
@@ -0,0 +1,85 @@
+The Klatt formant synthesizer
+
+[Figure: panels plotting |T(f)| in dB against frequency (kHz, 0-5) for several vowels (including IY and UW) and for a uniform tract, comparing two parallel synthesis configurations: formant outputs combined with alternating signs (+-+) and with identical signs (++).]
+
+Figure 12-13: Spectra from two different parallel synthesis configurations
+
+149
diff --git a/pages-txt/162.txt b/pages-txt/162.txt
new file mode 100644
index 0000000..5366a20
--- /dev/null
+++ b/pages-txt/162.txt
@@ -0,0 +1,30 @@
+From text to speech: The MITalk system
+
+is not used. Neither the parallel nasal formant nor the parallel first formant resonator is needed in the normal cascade/parallel synthesizer configuration (SW=0), but they are required for the simulation of nasalization in the special-purpose all-parallel configuration (SW=1).
+
+12.3 Radiation characteristic
+
+The box labeled “radiation characteristic” in Figure 12-6 models the effect of directivity patterns of sound radiating from the head as a function of frequency. The sound pressure measured directly in front of and about a meter from the lips is proportional to the temporal derivative of the lip-plus-nose volume velocity, and inversely proportional to r, the distance from the lips (Fant, 1960). The transformation is simulated in the synthesizer by taking the first difference of lip-nose volume velocity:
+
+      p(nT) = u(nT) - u(nT-T)          (8)
+
+The radiation characteristic adds a gradual rise in the overall spectrum, as shown in Figure 12-14.
+
+[Figure: |R(f)| in dB against frequency (kHz, 0-5), rising gradually with frequency.]
+
+Figure 12-14: Transfer function of the radiation characteristic
+
+150
diff --git a/pages-txt/163.txt b/pages-txt/163.txt
new file mode 100644
index 0000000..6441e60
--- /dev/null
+++ b/pages-txt/163.txt
@@ -0,0 +1,38 @@
+13
+
+Some measures of intelligibility and comprehension¹
+
+13.1 Overview
+
+As the ten-year effort to build an unrestricted text-to-speech system at MIT drew to a close, it seemed appropriate to conduct a preliminary evaluation of the quality of the speech output with a relatively large group of naive listeners. The results of such an evaluation would no doubt prove useful in first establishing a benchmark level of performance for comparative purposes, as well as uncovering any problems in the current version of the system that might not have been detected earlier. In addition to obtaining measures of intelligibility of the speech output produced by the text-to-speech system, we were also interested in finding out how well naive listeners could comprehend continuous text produced by the system.
+This was thought to be an important aspect of the evaluation of the text-to-speech system as a whole, since a version of the current system might eventually be implemented as a device used for computer-aided instruction or as a functional reading machine for the blind (Allen, 1973). Both of these applications are now well within the realm of the available technology (Allen et al., 1979).
+
+In carrying out the evaluation of the system, we patterned several aspects of the testing after earlier work already completed on the evaluation of the Haskins Laboratories reading machine project so that some initial comparisons could be drawn between the two systems (Nye and Gaitenby, 1973, 1974). However, we also added several other tests to the evaluation to gain additional information about word recognition in normal sentential contexts and listening comprehension for a relatively wide range of narrative passages of continuous text. Data were also collected on reading comprehension for the same set of materials to permit direct comparison between the two input modalities. Traditional measures of listening or reading comprehension have not typically been obtained in previous evaluations of the quality of synthetic speech output, and therefore, we felt that some preliminary data would be quite useful before the major components of the present system were implemented as a workable text-to-speech device in an applied context.
+
+¹This chapter was written by D. Pisoni in 1978-9.
+
+151
diff --git a/pages-txt/164.txt b/pages-txt/164.txt
new file mode 100644
index 0000000..d6d5b7b
--- /dev/null
+++ b/pages-txt/164.txt
@@ -0,0 +1,47 @@
+From text to speech: The MITalk system
+
+In planning the current evaluation project, we also wanted to obtain information about several different aspects of the total system and their contribution to intelligibility and comprehension of speech. To meet this goal, a number of different tests were selected to provide information about: 1) phoneme recognition, 2) word recognition in sentences, and 3) listening comprehension. It was assumed that the results of these three tests together would provide qualitative and quantitative information sufficient to identify any major problems in the operation of the total system at the time of testing in early May of 1979. The results of these three types of tests would also provide much more detailed information about the relative contribution of several of the individual components of the system and their potential interaction.
+
+In carrying out these evaluation tests, we collected a total of 27,128 responses from some 160 naive listeners. A total of 45 minutes of synthetic speech was generated in fully automatic text-to-speech mode. No system errors were corrected at this time and no total system crashes were encountered during the generation of the test materials used in the evaluation.
+
+13.2 Phoneme recognition
+
+After initial discussions, we decided to use the Modified Rhyme Test to measure the intelligibility of the speech produced by the system. This test was originally developed by Fairbanks (1958) and then later modified by House et al. (1965). This test was chosen primarily because it is reliable, shows little effect of learning, and is easy to administer to untrained and relatively naive listeners. It also uses standard orthographic responses, thereby eliminating problems associated with phonetic notation.
+Moreover, extensive data have already been collected with natural speech, as well as synthetic speech produced by the Haskins speech synthesizer (Nye and Gaitenby, 1973), therefore permitting us to make several direct comparisons of the acoustic-phonetic output of the two text-to-speech systems under somewhat comparable testing conditions.
+
+13.2.1 Method
+
+13.2.1.1 Subjects Seventy-two naive undergraduate students at Indiana University in Bloomington served as paid listeners in this study. They were all recruited by means of an advertisement in the student newspaper and reported no history of a hearing or speech disorder at the time of testing. The subjects were all right-handed native speakers of English.
+
+13.2.1.2 Stimuli Six lists of 50 monosyllabic words were prepared on the MIT text-to-speech system. The lists were recorded on audio tape via a Revox Model
+
+152
diff --git a/pages-txt/165.txt b/pages-txt/165.txt
new file mode 100644
index 0000000..78a391a
--- /dev/null
+++ b/pages-txt/165.txt
@@ -0,0 +1,49 @@
+Some measures of intelligibility and comprehension
+
+B77 tape recorder at 7.5 ips with a 3.0 second pause between successive items. Approximately half of the items in a given test list differed in the initial consonant while the remaining half differed in the final consonant.
+
+13.2.1.3 Procedure The seventy-two subjects were divided into twelve independent groups containing six subjects each for testing. Two groups of subjects were assigned to each of the six original test lists. Subjects were told that this was a test dealing with isolated word recognition and that they were to indicate which word out of six possible alternatives was the one they heard on each trial. Forced-choice response forms were provided to subjects to record their judgements. Subjects were encouraged to guess if they were not sure, but to respond on each trial. No feedback was provided to subjects during the course of testing. Subjects were, however, explicitly informed that the test items were generated on a computer and that the experiment was designed to evaluate the intelligibility of the synthetic speech. An example of the test format is provided in Appendix D.
+
+Testing was carried out in a small experimental room in the Speech Perception Laboratory in the Department of Psychology at Indiana University. This room is equipped with six individual cubicles. The audio tapes were reproduced on an Ampex AG-500 tape recorder and presented to subjects via TDH-39 matched and calibrated headphones at a comfortable listening level of about 80 dB SPL peak reading on a VTVM. A low-level (60 dB), broad-band (0-10 kHz) white noise source (Grason Stadler Model 1724) was also mixed with the speech to mask tape hiss, some nonstationary computer-generated background noise picked up during the recording at MIT, and any ambient noise in the local environment during testing.
+
+13.2.2 Results and discussion
+
+Although the Modified Rhyme Test employed real words, our interest was focused on the phoneme errors and resulting perceptual confusions. Overall performance on the test was very good with a total error rate, averaged across both initial- and final-syllable positions, of only 6.9 percent. Performance was somewhat better for consonants in initial position (4.6 percent errors) than final position (9.3 percent errors).
+The distribution of all errors across various manner classes is shown graphically in Figure 13-1 for initial- and final-syllable positions separately.
+
+Since the consonants comprising the various manner classes occurred with unequal frequencies in the Modified Rhyme Test, the observed error rates in the data may not be representative estimates of the intelligibility of the same phonemes in continuous speech. Nevertheless, performance is generally excellent
+
+153
diff --git a/pages-txt/166.txt b/pages-txt/166.txt
new file mode 100644
index 0000000..d0b848e
--- /dev/null
+++ b/pages-txt/166.txt
@@ -0,0 +1,41 @@
+From text to speech: The MITalk system
+
+[Figure: bar chart of percent errors for each manner class (stops, fricatives, nasals, affricates, approximants), with separate bars for initial and final position.]
+
+Figure 13-1: Average percent errors across various manner classes
+
+across almost all manner categories, except for the nasals in final position which showed an error rate of 27.6 percent. It should also be noted that while consonants in initial position were identified better than the same ones in final position, the relative distribution of the errors across syllable positions is not comparable, as shown in Figure 13-2 below.
+
+Figure 13-2 provides a detailed breakdown of the errors and the resulting confusions for consonants in initial and final positions. Each bar in the figure shows the total percent errors for a particular phoneme and the rank order of the most frequent confusions.
+
+In examining these data, it should be kept in mind that the error rates which make up the data shown in these two panels are quite low to begin with. The total percent errors were only 4.6 percent in initial position and 9.3 percent in final position. Inspection of this figure shows that, for the most part, the errors are predominantly confusions in place or manner of articulation. Errors in voicing, when they occurred, were substantially lower. The fricatives pE and TH show very high error rates when considered individually, although both of these phonemes occurred with a relatively low frequency in the test when compared with other consonants. The presence of the background masking noise may have contributed to the low performance levels observed with these weak fricatives. As
+
+154
diff --git a/pages-txt/167.txt b/pages-txt/167.txt
new file mode 100644
index 0000000..85242fa
--- /dev/null
+++ b/pages-txt/167.txt
@@ -0,0 +1,58 @@
+Some measures of intelligibility and comprehension
+
+[Figure: two horizontal bar charts, one for initial consonants and one for final consonants, showing percent error (0-50) for each phoneme presented, with the most frequent confusions labeled on each bar.]
+
+Figure 13-2: Distribution of errors and most frequent perceptual confusions
+
+noted above, the pattern of errors is quite different for consonants in initial and final positions. Such a finding is not unexpected given that different acoustic cues are used to synthesize the same phoneme in different environments.
+
+13.2.3 Conclusions
+
+For the most part, the intelligibility of the speech produced by the current version of the text-to-speech system is very high. The overall error rate of 6.9 percent is slightly lower than the error rate of 7.6 percent obtained in the earlier Haskins evaluation using the Modified Rhyme Test.
+The advantage of initial over final consonants observed in the present study is consistent with data obtained from natural speech by House et al. (1965), and Nye and Gaitenby (1973), although it
+
+155
diff --git a/pages-txt/168.txt b/pages-txt/168.txt
new file mode 100644
index 0000000..fa2785f
--- /dev/null
+++ b/pages-txt/168.txt
@@ -0,0 +1,42 @@
+From text to speech: The MITalk system
+
+differs slightly from the results found for the synthetic speech in the earlier Haskins evaluation. In the Haskins study, error rates for the synthetic speech in initial and final positions were about the same with a very slight advantage for consonants in final position. The comparable overall error rates obtained for natural speech in the Modified Rhyme Test by House et al. and Nye and Gaitenby (1973) were 4 percent and 2.7 percent, respectively.
+
+In the earlier evaluation study, Nye and Gaitenby (1974) checked to ensure that the phonemic input to the Haskins synthesizer was correct. However, no corrections of any kind were made by hand in generating the present materials, either from entries in the morph lexicon or from spelling-to-sound rules. As discussed in the final section of this chapter, several different kinds of errors were uncovered in different modules as a result of generating such a large amount of synthetic speech through the system.
+
+Except for the high error rates observed for the nasals and fricatives in final syllable position, the synthesis of segmental information in the text-to-speech system appears to be excellent, at least as measured in a forced-choice format among minimal pairs of test items. With phoneme recognition performance as high as it is--near ceiling levels--it is difficult to pick up subtle details of the error patterns that might be useful in improving the quality of the output of the phonetic component of the system at the present time. In addition, the errors that were observed in the present tests might well be reduced substantially if the listeners had more experience with the speech output produced by the system. It is well known among investigators working with synthetic speech that rather substantial improvements in intelligibility can be observed when listeners become familiar with the quality of the synthesizer. Nye and Gaitenby (1974) as well as Carlson et al. (1976) have reported very sizeable learning effects in listening to synthetic speech. In the latter study, performance increased from 55 percent to 90 percent correct after the presentation of only 200 synthetic sentences over a two-week period. (See also the discussion of the word recognition and comprehension results below.)
+
+In summary, the results of the Modified Rhyme Test revealed very high levels of intelligibility of the speech output from the system using naive listeners as subjects. While the overall level of performance is somewhat lower than in previous studies employing natural speech, the level of performance for recognition of segmental information appears to be quite satisfactory for a wide range of text-to-speech applications at the present time.
+ +156 diff --git a/pages-txt/169.txt b/pages-txt/169.txt new file mode 100644 index 0000000..c0aa494 --- /dev/null +++ b/pages-txt/169.txt @@ -0,0 +1,44 @@ +Some measures of intelligibility and comprehension + +13.3 Word recognition in sentences + +The results of the Modified Rhyme Test using isolated words indicated very high +levels of intelligibility for the segmental output of the text-to-speech system. +However, the Modified Rhyme Test employs a closed-response set involving a +forced-choice format in what may be considered a relatively low uncertainty test- +ing situation. In the recognition and comprehension of unrestricted text, a substan- +tially broader range of alternatives is available to the listener since the response set +is open and potentially infinite in size. Moreover, the sentential context itself +provides an important contribution to intelligibility of speech, a fact that has been +known for many years (Miller et al., 1951; Miller and Isard, 1963). + +To evaluate word recognition in sentence context, we decided to obtain two +quite different sets of data. One set was collected using a small number of the Har- +vard Psychoacoustic Sentences (Egan, 1948). These test sentences are all mean- +ingful and contain a wide range of different syntactic constructions. In addition, +the various segmental phonemes of English are represented in these sentences in +accordance with their frequency of occurrence in the language. Thus, the results +obtained with the Harvard sentences should provide a fairly good estimate of how +well we might expect word recognition to proceed in sentences when both seman- +tic and syntactic information is available to a listener. This situation could be con- +sidered comparable, in some sense, to normal listening conditions where “top- +down” knowledge interacts with sensory input in the recognition and comprehen- +sion of speech (see Pisoni, 1978; Marslen-Wilson and Welsh, 1978). + +We also collected word recognition data with a set of syntactically normal but +semantically anomalous sentences that were developed at Haskins Laboratories by +Nye and Gaitenby (1974) for use in evaluating the intelligibility of their text-to- +speech system (see also Ingeman, 1978). These test sentences permit a somewhat +finer assessment of the availability and quality of “bottom-up” acoustic-phonetic +information and its potential contribution to word recognition. Since the materials +are all meaningless sentences, the individual words cannot be identified or +predicted from knowledge of the sentential context or semantic interpretation. +Thus, the results of these tests using the Haskins anomalous sentences should +provide an estimate of the upper bound on the contribution of strictly phonetic in- +formation to word recognition in sentence contexts. Since the response set is also +open and essentially unrestricted, we would anticipate substantially lower levels of +word recognition performance on this test than on the Harvard test; in the latter +test, syntactic and semantic context is readily available and can be used freely by +the listener at all levels of processing the speech input. 
+In addition, the results of
+
+157
diff --git a/pages-txt/170.txt b/pages-txt/170.txt
new file mode 100644
index 0000000..af4a01f
--- /dev/null
+++ b/pages-txt/170.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+the anomalous sentence test can also be compared more or less directly to data collected with these same test sentences by Nye and Gaitenby (1974) and Ingeman (1978). Such comparisons should prove useful in identifying similarities and possible differences in the speech output produced by the two text-to-speech systems.
+
+13.3.1 Method
+
+13.3.1.1 Subjects Forty-four additional naive undergraduate students were recruited as paid subjects. They were drawn from the same population as the subjects used in the previous study and met the same requirements. None of these subjects had participated in the earlier study on phoneme recognition.
+
+13.3.1.2 Stimuli Two sets of test sentences were prepared. One set consisted of 100 Harvard Psychoacoustic Sentences. Each sentence contained five key words that were scored as a measure of word recognition. The other set consisted of 100 Haskins anomalous sentences drawn from the original list of materials developed by Nye and Gaitenby (1974). Each of these test sentences contained four key words. Two separate test lists were recorded on audio tape with a 3 second pause between successive sentences. The sentences were output at a speaking rate in excess of 180 words per minute. As before, we did not correct any pronunciation errors. Examples of both types of test sentences are given in Appendixes E and F.
+
+13.3.1.3 Procedure Twenty-one subjects received the Harvard sentences and twenty-three received the Haskins sentences. Testing was carried out in small groups of five or six subjects each, under the same listening conditions described in the previous study.
+
+Subjects in both groups were told that this study was concerned with word recognition in sentences and that their task was to write down each test sentence as they heard it in the appropriate location on their response sheets. They were told to respond on every trial and to guess if they were not sure of a word. For the Harvard sentences, the response forms were simply numbered sequentially with a continuous underlined blank space for each trial. However, since the syntactic structure of all of the Haskins sentences was identical, the response forms differed slightly: blank spaces were provided for the four key words. Determiners were printed in the appropriate locations in standard sentence frames.
+
+The experiment was run in a self-paced format to provide subjects with sufficient time to record their responses in the appropriate space in their booklets. However, subjects were encouraged to work rapidly in writing down their responses. The experimenter operated the tape recorder on playback from within the testing room by remote control. Thus, successive sentences in the test lists
+
+158
diff --git a/pages-txt/171.txt b/pages-txt/171.txt
new file mode 100644
index 0000000..081beb9
--- /dev/null
+++ b/pages-txt/171.txt
@@ -0,0 +1,47 @@
+Some measures of intelligibility and comprehension
+
+were presented only after all of the subjects in a group had finished responding to the previous test sentence, and had indicated this to the experimenter. A short break was taken halfway through a testing session, after completion of the first 50 sentences.
+
+13.3.2 Results and discussion
+
+The responses were scored only for correct word recognition at this time. Phonetic errors, when they occurred, were not considered in the present analyses, although we expect to examine these in some detail at a later time. Each subject receiving the Harvard sentences provided a total of 500 responses, while each subject receiving the Haskins anomalous sentences provided 400 responses for the final analysis.
+
+Performance on the Harvard sentences was quite good with an overall mean of 93.2 percent correct word recognition across all 21 subjects. The scores on this test ranged from a low of 80 percent to a high of 97 percent correct recognition. Of the 6.7 percent errors observed, 30.3 percent were omissions of complete words, while the remainder consisted of segmental errors involving substitutions, deletions, and transpositions. In no case, however, did subjects respond with permissible nonwords that could occur as potential lexical items in English.
+
+As expected, word recognition performance on the Haskins anomalous sentences was substantially worse than the Harvard sentences, with a mean of 78.7 percent correct recognition averaged over all 23 subjects. The scores on this test ranged from a low of 71 percent correct to a high of 85 percent correct. Of the 21.3 percent errors recorded, only 11 percent were omissions of complete words. The difference in error patterns, particularly in terms of the number of omissions, between the two types of sentence contexts suggests a substantial difference in the subjects’ perceptual strategies in the two tests. It seems quite likely that subjects used a much looser criterion for word recognition with the Haskins anomalous sentences simply because the number of permissible alternatives was substantially greater than those in the Harvard sentences. Moreover, the presence of one standard syntactic structure probably encouraged subjects to guess more often when the acoustic cues to word identification were minimal. In addition, there seemed to be evidence of semantically based intrusions in the recall data, suggesting that subjects were attempting to assign an interpretation to the input signal even though they knew beforehand that all of the sentences were meaningless.
+
+As noted earlier, substantial learning effects occur with synthetic speech. Even after an initial period of exposure, recognition performance continues to improve. Comparisons of word recognition performance in the first and second half of each of the tests indicated the presence of a reliable learning effect. For both the
+
+159
diff --git a/pages-txt/172.txt b/pages-txt/172.txt
new file mode 100644
index 0000000..1d9ed4e
--- /dev/null
+++ b/pages-txt/172.txt
@@ -0,0 +1,44 @@
+From text to speech: The MITalk system
+
+Harvard and Haskins sentences, performance improved on the second half of the test relative to the first half. Although the differences were small, amounting to only about 2 percent improvement in each case, the result was very reliable (p < .01) across subjects in both cases.
+
+The performance levels obtained with the Haskins semantically anomalous sentences are very similar to those reported earlier by Nye and Gaitenby (1974), and more recently by Ingeman (1978) using the same sentences with the Haskins synthesizer and text-to-speech system. Nye and Gaitenby (1974) reported an average error rate of 22 percent for synthetic speech and five percent for comparable natural speech.
However, Nye and Gaitenby used both naive and ex- +perienced listeners as subjects, and found rather large differences in performance +between the two groups, as we noted above. This result is presumably due to +familiarity and practice listening to the output of the synthesizer. We suspect that +if the experienced subjects were eliminated from the Nye and Gaitenby analyses, +performance would be lower than the original value reported and would therefore +differ somewhat more from the present findings. Nevertheless, the error rate for +these anomalous sentences produced with natural speech is still lower than the cor- +responding synthetic versions, although it is not clear at the present time how +much of the difference could be due to listener familiarity with the quality of the +synthetic speech. + +13.3.3 Conclusions + +The results of the two word-recognition tests indicate moderate to excellent levels +of performance with naive listeners depending on the particular test format used +and the type of information available to the subject. In one sense, the results of +these two tests can be thought of as approximations to upper and lower bounds on +the accuracy of word-recognition performance with the current text-to-speech sys- +tem. On the one hand, the Harvard test sentences provide some indication of how +word recognition might proceed when both semantic and syntactic information is +available to a listener under normal conditions. On the other hand, the Haskins +anomalous sentences direct the subjects’ attention specifically to the perceptual in- +put and therefore provide a rough estimate of the quality of the acoustic-phonetic +information and sentence analysis routines available for word recognition in the +absence of contextual constraints. Of course, in normal listening situations, and +presumably in cases where a text-to-speech system such as the present one might +be implemented, the complete neutralization of such contextual effects on intel- +ligibility would be extremely unlikely. Nevertheless, a more detailed analysis of +the word-recognition errors in the Haskins anomalous sentence test might provide + +160 diff --git a/pages-txt/173.txt b/pages-txt/173.txt new file mode 100644 index 0000000..14cae80 --- /dev/null +++ b/pages-txt/173.txt @@ -0,0 +1,44 @@ +Some measures of intelligibility and comprehension + +us with additional information that could be used to modify or improve several of +the modules of the system. Whether such additional improvements at these +various levels of the system will actually contribute to improved intelligibility and +comprehension is difficult to assess at this time, since performance with meaning- +ful sentences is already quite high to begin with, as shown by the present results +obtained with the Harvard sentences. + +In summary, the results of tests designed to measure word recognition in two +types of sentential context showed moderate to excellent levels of performance +with synthetic speech output from the current version of the text-to-speech system. +As in the previous section dealing with the evaluation of the intelligibility of iso- +lated words, the present results, particularly with rather diverse meaningful sen- +tences, suggest that the quality of the speech output at the present time is probably +quite satisfactory for a relatively wide range of applications requiring the process- +ing of unrestricted text. 
While there is room for improvement in the quality of the +output from various modules of the system, as suggested by the results of the Has- +kins anomalous sentences, it is not apparent whether the allocation of resources to +effect such changes in the system would produce any detectable differences. Dif- +ferences that might be detected, if any, might well require a very restricted listen- +ing environment in which all of the higher-level syntactic and semantic infor- +mation is eliminated, a situation that is unlikely to occur when the system is imple- +mented in an applied setting. Given these results on word recognition, however, it +still remains to be determined how well listeners can understand and comprehend +continuous speech produced by the system, a problem we turn to in the next sec- +tion of this chapter. + +13.4 Comprehension +Research on comprehension and understanding of spoken language has received a +great deal of attention by numerous investigators in recent years. It is generally +agreed that comprehension is a complex cognitive process, initially involving the +input and subsequent encoding of sensory information, the retrieval of previously +stored knowledge from long-term memory, and the subsequent interpretation, in- +tegration or assimilation of various sources of knowledge that might be available +to a listener at the time. Comprehension, therefore, depends on a relatively large +number of diverse factors, some of which are still only poorly understood at the +present time. Measuring comprehension is difficult because of the interaction of +many of these factors and the absence of any coherent model that is broad enough +to deal with the diverse nature of language understanding. + +One of the factors that obviously plays an important role in listening com- + +161 diff --git a/pages-txt/174.txt b/pages-txt/174.txt new file mode 100644 index 0000000..aaba869 --- /dev/null +++ b/pages-txt/174.txt @@ -0,0 +1,45 @@ +From text to speech: The MITalk system + +prehension is the quality of the input signal expressed in terms of its overall intel- +ligibility. But as we have seen even from the results summarized in the previous +sections, additional consideration must also be given to the contribution of higher- +level sources of knowledge to recognition and comprehension. In this last section, +we wanted to obtain some preliminary estimate of how well listeners could com- +prehend continuous text produced by the text-to-speech system. Previous evalua- +tions of synthetic speech output have been concerned primarily with measuring in- +telligibility or listener preferences with little if any concern for assessing com- +prehension or understanding of the content of the materials (Nye et al., 1975). +Indeed, as far as we have been able to determine, no previous formal tests of the +comprehension of continuous synthetic speech have ever been carried out with a +relatively wide range of textual materials specifically designed to assess under- +standing of the content rather than form of the speech. + +To accomplish this goal, we selected fifteen narrative passages and an ap- +propriate set of test questions from several standardized adult reading comprehen- +sion tests. The passages were quite diverse, covering a wide range of topics, writ- +ing styles and vocabulary. We thought that a large number of passages would be +interesting to listen to in the context of tests designed to assess comprehension and +understanding. 
Since these test passages were selected from several different types
+of reading tests, they also varied in difficulty and style, permitting us to evaluate
+the contribution of all of the individual modules of the text-to-speech system in
+terms of one relatively gross measure.
+
+In addition to securing measures of listening comprehension for these pas-
+sages, we also collected a parallel set of data on reading comprehension of these
+materials from a second group of subjects. The subjects in the reading comprehen-
+sion group answered the same questions after reading each passage silently, as did
+subjects in the listening comprehension group. This condition was included in or-
+der to permit comparison between the two input modalities. It was assumed that
+the results of these comprehension tests would therefore provide an initial, al-
+though preliminary, benchmark against which the entire text-to-speech system
+could be evaluated with materials somewhat comparable to those used in the im-
+mediate future.
+
+13.4.1 Method
+
+13.4.1.1 Subjects Forty-four additional naive undergraduate students were
+recruited as paid subjects. They were drawn from the same source as the subjects
+used in the previous studies. Some of the subjects assigned to the reading com-
+prehension group had participated in the earlier study using the Modified Rhyme
+
+162
diff --git a/pages-txt/175.txt b/pages-txt/175.txt
new file mode 100644
index 0000000..e6c03e8
--- /dev/null
+++ b/pages-txt/175.txt
@@ -0,0 +1,52 @@
+Some measures of intelligibility and comprehension
+
+Test. However, none of the subjects in the listening comprehension group had
+been in any of the prior intelligibility or word-recognition tests using synthetic
+speech.
+
+13.4.1.2 Stimuli Fifteen narrative passages were chosen more or less randomly
+from several published adult reading comprehension tests. The exact details of the
+passages and their original sources are provided in Table 13-1 below. An example
+of one of the passages is provided in Appendix G.
+
+Table 13-1: Characteristics of the passages used to measure comprehension
+
+Passage  Number    Duration  Number of test  General topic        Source
+         of words  (s)       questions
+
+   1        212       75           6         lens buying          Coop English
+   2        159       56           4         measuring distance   Coop English
+                                             to nearby stars
+   3        327      135           8         language             Iowa
+   4        198       75           4         retail institutions  Nelson-Denny
+   5        175       70           4         noise pollution      Nelson-Denny
+   6        204       82           4         geology              Nelson-Denny
+   7        206       68           4         philosophy           Nelson-Denny
+   8        207       80           4         radioactive dating   Nelson-Denny
+   9        292      117           8         history              Iowa
+  10        315      100           9         sea                  Iowa
+  11        265      101           7         New Mexico           Stanford
+  12        322      125           6         fox hunting          Stanford
+  13        253       98           6         Claude Debussy       Stanford
+  14        267      107           7         aluminum             Stanford
+  15        212       82           6         Roger Bannister      Stanford
+
+Each passage was initially typed in orthographic form with punctuation into a
+text file. These files were then used as input to the text-to-speech system and as a
+
+163
diff --git a/pages-txt/176.txt b/pages-txt/176.txt
new file mode 100644
index 0000000..9a21d8c
--- /dev/null
+++ b/pages-txt/176.txt
@@ -0,0 +1,46 @@
+From text to speech: The MITalk system
+
+source for preparing the typed versions of the passages used in the reading com-
+prehension condition. All fifteen passages were recorded on audio tape at a speak-
+ing rate in excess of 180 words/minute for later playback. Two sets of response
+booklets were prepared, one for the listening group and one for the reading group.
+The booklets, which contained a varying number of multiple-choice questions +keyed to each paragraph, were arranged in order according to the presentation +schedule of the paragraphs on the audio tape. The booklets for subjects in the +reading group also included a typed copy of the passage immediately before the +appropriate set of questions. Appendix G also provides the set of questions cor- +responding to the passage. + +13.4.1.3 Procedure Half of the forty-four subjects were assigned to the listening +group and the other half to the reading group. Subjects assigned to the reading +group were tested together in a classroom, while the subjects in the listening group +were tested in small groups of five or six subjects each using the listening facilities +of the previous studies. These subjects wore headphones and listened to the pas- +sages under the same conditions as the earlier subjects. + +Instructions to the subjects in both groups emphasized that the purpose of the +study was to evaluate how well individuals could comprehend and understand con- +tinuous synthetic speech produced by a reading machine. Subjects in the listening +group were told that they would hear narrative passages about a wide variety of +topics and that their task was to answer the multiple-choice questions that were +keyed to the particular passages as best as they could based on the information +contained in the passages they heard. Similar instructions were provided to the +reading comprehension group. + +As in the previous word-recognition study, the listening comprehension group +was presented with test passages in a self-paced format with the experimenter +present in the testing room operating the tape recorder via remote control. A given +test passage was presented only once for listening, after which, subjects im- +mediately turned their booklets to the appropriate set of test questions. + +The subjects in the reading comprehension group were permitted to read each +passage only once and were explicitly told that they should not go back over the +passage after reading it or while answering the questions. This procedure was a +departure from the typical methods used in administering standardized reading +comprehension tests. Usually, the test passage is available to the subject for in- +spection and re-reading during the entire testing session. However, for present +purposes, we felt that comparisons between reading and listening comprehension +might be more closely matched by limiting exposure to one pass through the +materials. + +164 diff --git a/pages-txt/177.txt b/pages-txt/177.txt new file mode 100644 index 0000000..f4caa93 --- /dev/null +++ b/pages-txt/177.txt @@ -0,0 +1,57 @@ +Some measures of intelligibility and comprehension + +The subjects in both groups were told at the beginning of testing that the first + +two passages of the test and the accompanying questions were only for practice to +familiarize them with the materials and nature of the test format. These two pas- +sages were not scored in the final analyses reported here. + +13.4.2 Results and discussion + +The multiple-choice questions for each of the thirteen test passages were scored +separately for each subject. A composite score was then obtained by simply +cumulating the individual scores for each passage and then expressing this value as +a percentage of the total possible score across all of the passages. 
+
+The overall results for both reading and listening comprehension are shown in
+Figure 13-3 summed over all thirteen test passages. The data are also broken down
+in this figure by first and second half of the test.
+
+The average percent correct was 77.2 percent for subjects in the reading com-
+prehension group and 70.3 percent for subjects in the listening comprehension
+group. The 7 percent difference between these two means is small, but statistically
+significant by a t-test for independent groups (p < .05).
+
+Figure 13-3: Comprehension performance for the reading (N=22) and
+listening (N=22) groups in the first and second halves of the test
+
+The difference between the two groups was smaller in the second half of the
+test and did not reach statistical significance (p > .05), which suggests
+that the overall difference between the two groups is probably due to familiarity
+with the output of the synthesizer and not due to any inherent difference in the
+basic strategies used in comprehending or understanding the content of these pas-
+sages. This conclusion is strengthened even further by the fact that the thirteen
+passages are correlated across both testing conditions. In this case, a very high
+correlation (r = +.97) was observed between reading and listening comprehension
+scores for individual passages. Passages that are difficult to comprehend when
+read are also difficult to comprehend when listened to, and vice versa. The time
+taken to complete all passages in both tests was, however, roughly the same, last-
+ing between 45 and 50 minutes.
+
+After the listening comprehension test was completed, we solicited additional
+subjective evaluations of the speech produced by the synthesizer and the nature of
+the comprehension test itself. Twenty of the twenty-two subjects indicated that
+they were able to comprehend and understand the content of the passages “well” or
+“very well”. Only two of the subjects reported difficulty in comprehension, and
+even these two did not indicate that they were merely guessing, an available
+response alternative.
+
+Several of the subjects reported improved ability to understand the speech as
+testing progressed. Others described several problems in the quality of synthesis,
+the location of pauses, the existence of inappropriately stressed words, and the oc-
+casional presence of very long “run-on” sentences in several passages. Finally,
+several other subjects suggested that each test passage should be presented twice
+so they could review some of the specific details and facts that were stated ex-
+plicitly. For the most part, however, the subjects found listening to the speech in-
+teresting and felt that they had performed reasonably well in comprehending the
+passages. None of the subjects reported any major distractions in the quality of the
+synthetic speech that interfered with their ability to attend to or understand the con-
+tent of the passages. Thus, subjects are able to adapt easily to relatively long pas-
+sages of synthetic speech with little exposure or practice.
+
+13.4.3 Conclusions
+The results of the comprehension test indicate that naive subjects are able to com-
+prehend synthetically produced spoken passages of narrative text output from an
+
+166
diff --git a/pages-txt/179.txt b/pages-txt/179.txt
new file mode 100644
index 0000000..39c6e84
--- /dev/null
+++ b/pages-txt/179.txt
@@ -0,0 +1,45 @@
+Some measures of intelligibility and comprehension
+
+unrestricted text-to-speech system. Their performance is roughly comparable to
+subjects who have been asked to read the same passages of text and answer the
+same questions.
As in the case of our other tests using synthetic speech, there ap- +pears to be an initial period during which subjects are simply becoming familiar +with the quality of the synthesizer, the prosodic rules of the system and the style of +the material. Even after only a few minutes of exposure, comprehension perfor- +mance improves substantially and eventually approximates levels observed when +subjects read the same passages of text. + +It should also be pointed out that the comprehension performance observed in +these tests was obtained with a reading rate in excess of 180 words per minute. +This rate is about the rate at which people typically speak in normal conversations +or when they read text aloud. The present results therefore suggest that it is not +necessary to slow down the speaking rate or adjust the synthesis to obtain rela- +tively high levels of listening comprehension for continuous text. Until the present +tests were carried out, it was assumed by some investigators that synthetic speech +had to be output at a much slower rate to maintain intelligibility and therefore +facilitate comprehension. + +Based on the results of the present comprehension test, as well as the other +tests of intelligibility and word recognition that were carried out, there is good +reason to believe that the basic design of the MIT text-to-speech system is valid. +The system can not only produce highly intelligible synthetic speech, as shown in +our earlier tests, but the quality of the synthetic speech can be understood and com- +prehended at reasonably high levels. While there are, no doubt, many subtle +details of the system that might be improved, the results of these preliminary tests +support the general conclusion that very high-quality synthetic speech can be +produced automatically from unrestricted text and that such a system could be im- +plemented in applied settings in the immediate future. After some thirty years of +research, the widespread use of text-to-speech and voice response systems in com- +puter aided instruction and as aids for the handicapped is now a realistic goal. The +obstacles are no longer questions of research into the basic principles of speech +production, perception, and linguistic analysis, but are simply the practical matters +of implementation and economics. + +13.5 General discussion and conclusions + +The results of the three tests designed to evaluate intelligibility, word recognition, +and listening comprehension indicated very high levels of performance for the cur- +rent version of the text-to-speech system. While these tests are only preliminary, +they have provided an initial benchmark against which to compare the perfor- + +167 diff --git a/pages-txt/180.txt b/pages-txt/180.txt new file mode 100644 index 0000000..acd5434 --- /dev/null +++ b/pages-txt/180.txt @@ -0,0 +1,45 @@ +From text to speech: The MITalk system + +mance of the present system with other text-to-speech systems. Moreover, the +present results have provided a basis for evaluating the overall design of the sys- +tem and the functioning of several of the individual components. Since a relatively +large amount of text was specifically generated for this project, we were able to +identify a number of errors in the operation of the system which ordinarily might +not have been detected. In this last section of the chapter, we summarize briefly a +few of the errors we were able to uncover during and after the evaluation. 
We will
+also point out some of the limitations of the current evaluation results and then dis-
+cuss several directions for additional testing in the future.
+
+After the test materials for the evaluation project were generated, it was pos-
+sible to go back and examine the output of each module individually in order to
+determine whether it provided a correct analysis of the input text. Errors of
+various kinds in the final spoken output could originate at several different
+modules in the system. In addition, there could be errors resulting from transcrip-
+tion that we would not associate with the operation of the text-to-speech system
+itself.
+
+Of all the errors observed, we discovered only one that could legitimately be
+classified as a transcription error. In this case, the word “harmonies” was incor-
+rectly typed into the system as “harmonics” and was not detected in subsequent
+proofreading. All remaining errors could be located at one or more modules of the
+system. These errors consisted of incorrect parsings, pronunciations, or stress as-
+signments. An error located at one module often affected analyses carried out by
+other modules. Sometimes the results of these errors were quite noticeable in the
+spoken output, particularly when the errors produced segmental distinctions that
+could be detected in pronunciation. However, in other cases, particularly where
+stress assignment was involved, the differences were more difficult to detect.
+
+At the time this report was completed, we were able to locate only two errors
+in the operation of the first module of the system. This module (FORMAT) has a
+dictionary that converts abbreviations, symbols, and numbers to words for sub-
+sequent processing. One error involved the abbreviation “U.S.” in which a space
+was incorrectly typed between “U.” and “S.” The rule which was applied here
+places an end-of-sentence period in the output if an abbreviatory period (as in
+“U.”) is followed by one or more spaces and a capital letter (the “S”). Thus, two
+sentences were formed, one ending in “U.” and the other beginning with “S.” This
+error causes an incorrect pitch contour to be placed on the output, as well as in-
+appropriate segmental durations to be assigned in later modules (see the sketch
+below).
+
+Another error involved the abbreviation “19th”. In all cases, alphanumerics
+
+168
diff --git a/pages-txt/181.txt b/pages-txt/181.txt
new file mode 100644
index 0000000..a54d5ed
--- /dev/null
+++ b/pages-txt/181.txt
@@ -0,0 +1,48 @@
+Some measures of intelligibility and comprehension
+
+are spelled out completely by this module. For example, “19th” was pronounced
+as “one-nine-T-H” on output. In words such as “19th” or “100-yard”, the al-
+phabetic and numeric sections are separable and could be pronounced. However,
+in a true alphanumeric such as “103S” or “a3c”, it is correct to spell out all of the
+symbols.
+
+A number of errors were also detected in the module DECOMP, which is
+responsible for decomposing words into morphs by reference to the morph lexicon.
+In several cases, the wrong morphs were identified, resulting in perceptible seg-
+mental errors in the speech output. In other cases, the correct morphs were ob-
+tained, but the stress assignment of the constituent morphs was different for the
+morphs in isolation than for the morphs when concatenated in a polymorphemic
+word. We also identified several words that should have been in the lexicon since
+their pronunciation could not be handled by the existing spelling-to-sound rules.
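+
+The FORMAT sentence-boundary rule described above is easy to state
+procedurally. The following C sketch is our own illustration, not the actual
+FORMAT code; it implements the abbreviatory-period rule as stated and shows
+why the mistyped “U. S.” produces a spurious sentence break:
+
+#include <ctype.h>
+#include <stdio.h>
+
+/* Illustrative sketch of the end-of-sentence rule described above:
+ * a period followed by one or more spaces and a capital letter is
+ * treated as a sentence end.  Not the actual FORMAT code. */
+static int sentence_break(const char *p)
+{
+    if (*p != '.')
+        return 0;
+    p++;
+    if (!isspace((unsigned char)*p))
+        return 0;                       /* "U.S." has no space: no break */
+    while (isspace((unsigned char)*p))
+        p++;
+    return isupper((unsigned char)*p);  /* "U. S." matches here          */
+}
+
+int main(void)
+{
+    const char *text = "THE U. S. ECONOMY";   /* hypothetical input */
+    for (const char *p = text; *p != '\0'; p++)
+        if (sentence_break(p))
+            printf("sentence end taken at offset %ld\n", (long)(p - text));
+    return 0;
+}
+
+Run on the mistyped string, the rule fires after “U.” exactly as described,
+splitting the utterance in two.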
+
+Several errors in the operation of the spelling-to-sound rules were also
+detected. These errors resulted in the wrong pronunciation, which was quite
+noticeable in listening. For example, the second syllable of the word “Britain” was
+pronounced like the second syllable in the word “maintain”.
+
+In a number of other cases, we were able to identify problems in the operation
+of the parser, particularly in recognizing the correct part of speech. For example,
+the word “close” can be either an adjective or verb, each with a different pronun-
+ciation. Several problems were also observed with the word “affect”, which can be
+either a noun or a verb. In each of these cases, the part of speech was incorrectly
+identified by the parser, resulting in the wrong choice of pronunciation on output.
+
+Finally, there were several cases, especially with the Haskins anomalous sen-
+tences, in which the parser incorrectly assigned the verb (which could also be a
+noun) to the previous noun phrase. This error is not surprising since the parser has
+a basic preference for noun phrases anyway, when a choice is available. However,
+this often produced inappropriate sentence stress resulting from incorrect pitch and
+segmental durations. In some cases, these differences could be readily observed,
+whereas in others, the effects were substantially more difficult to detect even with
+careful and repeated listening. These observations are consistent with an earlier
+perceptual study of the durational rules carried out by Carlson et al. (1979). They
+found that a deletion of a phrase boundary produced only negligible effects on
+listeners’ evaluations of the naturalness of synthetic speech.
+
+Some of the errors described above are considered to be relatively minor and
+can be corrected rather easily by the simple addition of polymorphemic entries in
+the morph lexicon. Since this evaluation was completed, a “pre-parser” has been
+
+169
diff --git a/pages-txt/182.txt b/pages-txt/182.txt
new file mode 100644
index 0000000..916dcc1
--- /dev/null
+++ b/pages-txt/182.txt
@@ -0,0 +1,44 @@
+From text to speech: The MITalk system
+
+implemented which corrects a number of the parsing errors in which the sentential
+verb was included in the preceding noun phrase. However, some of the other pars-
+ing errors are not as easy to correct. Errors made by the first module and the
+spelling-to-sound rules are highly context-dependent, and are not easily amenable
+to simple change by rule. From our examination of the errors uncovered so far, all
+cases could be accounted for and located in some module of the system. No errors
+have been detected which escape explanation at the present time, although further
+study is continuing.
+
+The results of the present evaluation study have several limitations and these
+should be summarized here briefly for future reference. First, we did not carry out
+any of the control conditions for the three types of tests using natural speech. To
+some extent, this might be considered an important addition and extension of the
+current evaluation since it is the level of performance with natural speech that is
+frequently used as the yardstick against which to compare the quality of synthetic
+speech. There can be little doubt that tests with natural speech would show higher
+levels of performance when compared with synthetic speech.
But it should be em- +phasized here that the levels of performance in the current study are already quite +high to begin with, therefore it is not immediately obvious what would be gained +from such additional tests with natural speech. + +Secondly, with regard to measuring intelligibility of the segmental output, it is +clear that the Modified Rhyme Test is much too easy for listeners, even naive lis- +teners, and additional tests using an open-response set should be employed. Ad- +ditional testing under varying noise conditions may also provide further infor- +mation concerning the quality of the synthesis and its resistance to noise and dis- +tortion. In this regard, the analysis of the Haskins anomalous sentences should +also provide a rich source of data on phonetic confusions using an open-response +set. We are planning additional detailed analyses of these data. + +Finally, the comprehension test used was relatively gross in its ability to dis- +tinguish between new knowledge acquired from listening to text and knowledge +obtained from inferences drawn at the time of comprehension or, later, at the time +of testing. Of course, this is a problem related more to several broader issues in +language comprehension and understanding than to questions surrounding text-to- +speech and speech synthesis-by-rule. Nevertheless, it may be possible to learn a +great deal more about language comprehension and the interaction between top- +down and bottom-up knowledge sources in speech perception by the advances that +have been made in conceptualizing various linguistic problems within the context +of a functional text-to-speech system. The success of the current system and its + +170 diff --git a/pages-txt/183.txt b/pages-txt/183.txt new file mode 100644 index 0000000..0d87fc8 --- /dev/null +++ b/pages-txt/183.txt @@ -0,0 +1,17 @@ +Some measures of intelligibility and comprehension + +capabilities to process unrestricted text must be traced, at least in part, to the exist- +ence of an explicit model of the underlying linguistic structure that is common to +both text and speech and to the rule systems relating the two domains. + +In summary, the results of our evaluation tests designed to measure phoneme +intelligibility, word recognition and comprehension of synthetic speech produced +by the MIT text-to-speech system have demonstrated good to excellent perfor- +mance on a wide range of materials. No major problems were uncovered in the +design of the system nor were any serious errors identified in any of the com- +ponent modules of the system to date. The present results, although preliminary, +support the general conclusion that very high-quality synthetic speech can be +produced automatically from unrestricted English text and that such a system could +be implemented in an applied setting in the very near future. + +171 diff --git a/pages-txt/184.txt b/pages-txt/184.txt new file mode 100644 index 0000000..f55f21c --- /dev/null +++ b/pages-txt/184.txt @@ -0,0 +1,40 @@ +14 + +Implementation + +14.1 Conceptual organization + +Throughout this book, emphasis has been placed on the representation of various +data forms and rules, together with transformations between these representations. +A strong effort has been made to exclude all reference to implementation concerns +from these discussions. 
At this point, however, it is appropriate to address these
+issues, thus giving a view of the conceptual framework in which this research was
+done, as well as a perspective on economically viable implementations that can
+deliver the overall text-to-speech capability in real-time. With these goals in mind,
+we discuss first the overall conceptual organization of the MITalk system, fol-
+lowed by a description of the development system used as a research vehicle over
+the course of a dozen years, the requirements for a “performance system” suitable
+for practical applications, and finally, a discussion of the current system, together
+with examples, which serves as the basis for distribution of the MITalk system
+from MIT.
+
+The overall conceptual organization of the MITalk system can be viewed on
+two levels. At the highest level, the system is viewed as an analysis/synthesis sys-
+tem. It is based on the premise that in order to transform an input textual represen-
+tation (as a string of ASCII characters) to an output synthesized speech waveform,
+it is necessary to first analyze the text into an underlying abstract linguistic
+representation which can then be used as the initial basis for synthesizing the
+waveform. In this sense, the text and speech waveform representations are seen as
+two different surface representations of a common, underlying linguistic represen-
+tation which unites these two surface forms. Thus, the first part of the system is
+oriented to transforming the input textual representation into a narrow phonetic
+transcription which includes the names of the constituent phonemes, stress marks,
+and syntactic boundaries at the syllable, morph, word, phrase, and sentence levels.
+It is an implicit assumption of the system that this transcription is sufficient to
+serve as the input for the synthesis routines which generate the timing framework,
+the pitch contour, and the detailed control parameters (updated at 5 msec intervals)
+which specify the nature of the vocal tract model, which in turn produces the final
+output synthetic speech waveform.
+
+172
diff --git a/pages-txt/185.txt b/pages-txt/185.txt
new file mode 100644
index 0000000..7ef052e
--- /dev/null
+++ b/pages-txt/185.txt
@@ -0,0 +1,46 @@
+Implementation
+
+At a more detailed level, the analysis and synthesis phases of the MITalk sys-
+tem have been broken into a set of modules with well-defined interfaces at their
+boundaries. In this way, it is possible to break up the overall transformation
+process into well-specified but smaller transformations which serve to reduce the
+overall complexity of the system, and to provide the means for focusing on dif-
+ferent aspects and different representations within the system. Since the system is
+oriented to pass information forward from module to module, using temporary data
+bases as well, it is possible to have well-formed boundaries and to test and evaluate
+each module on a module-by-module basis. This is an important consideration
+since quality measures can be assigned to each module, and local optimization of
+the modules can be expected to incrementally contribute to the overall quality of
+the output synthetic speech. Of course, the degree to which various refinements
+within the modules increase measures of intelligibility and naturalness at the out-
+put will vary substantially, but there is little need to consider the detailed way in
+which the results of several modules are integrated when one is working on im-
+provements for a particular module.
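+
+As a concrete, purely illustrative rendering of the kind of intermediate
+representation that is passed forward between modules, one might define a
+record type along the following lines in C; the type and field names are our
+own sketch, not those of the MITalk sources:
+
+#include <stdio.h>
+
+/* Hypothetical sketch of one item of the narrow phonetic transcription
+ * handed from the analysis modules to the synthesis modules.  The
+ * boundary codes would be marks of the kind listed in Appendix B. */
+struct transcription_item {
+    char phoneme[4];   /* segment name, e.g. "DH", "AH"              */
+    int  stress;       /* 0 = unstressed, 1 = primary, 2 = secondary */
+    char boundary[3];  /* boundary mark following this item, if any  */
+};
+
+int main(void)
+{
+    /* the word "the" as it might appear in such a stream */
+    struct transcription_item the[] = {
+        { "DH", 0, "" },
+        { "AH", 1, "" },
+    };
+    for (int i = 0; i < 2; i++)
+        printf("%s stress=%d %s\n", the[i].phoneme,
+               the[i].stress, the[i].boundary);
+    return 0;
+}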
In this way, it has been possible to develop the +individual modules of the system in parallel, while providing an overall system +framework that permits separate module development. By keeping the interfaces +and data base formats carefully specified, it is possible to pinpoint deficiencies in +the system at various levels, and in this way to provide guidance for the allocation +of research effort to the various parts of the system. In retrospect, this modular +approach to the overall representation and development of the system has served +very well, and continues to be the major framework in which further improvements + +are being made. + +14.2 Development system + +Given the rapid improvements in both software and hardware technology, it is not +very useful to describe in detail the earlier development environments which are +relevant to the 1960s and the 1970s. But it is useful to have an idea of the evolu- +tion of the computational research framework. Initially, a small single-user mini- +computer was used with an analog hardware synthesizer. While some code was +written in assembly language as the work evolved, particularly with the introduc- +tion of the modular organization, most of the symbolic computing was done in a +variant of the BCPL language, while the phonemic synthesis was done in +FORTRAN. During the 1970s, a large time-shared machine was introduced with a + +special purpose hardware vocal tract model (Miranker, 1978). This system was +exceedingly useful since it provided the framework for several researchers to ef- + +fectively collaborate and to share information in an ongoing, highly interactive + +173 diff --git a/pages-txt/186.txt b/pages-txt/186.txt new file mode 100644 index 0000000..17b1bca --- /dev/null +++ b/pages-txt/186.txt @@ -0,0 +1,46 @@ +From text to speech: The MITalk system + +way. Most recently, the entire system has been converted to run under UNIX, +written mainly in PASCAL with some routines in C. This is a highly flexible sys- +tem, and introduces a new overall control program which allows various subsets of +the system to be effectively utilized. There is also an ability to monitor the system +at several levels of detail, thus providing the user with substantial insight into the +overall workings of the system. This version of the system, which is the basis for +current distribution from MIT, is described in detail later in this chapter. + +14.3 Performance system + +The structure of the development system, even in its contemporary UNIX im- +plementation, is not suitable for compact, real-time, and economical utilization in +practical contexts. For such uses, less flexibility is required, and special purpose +hardware is necessary. For example, the lexicons and rule bases can be stored in +high-density memory without the necessity for utilizing electromechanical disks. +A general purpose microprocessor can be utilized to provide overall system control +and to provide the linguistic analysis and prosodic synthesis up to the level of the +phonemic synthesis conversion to the output speech waveform. Finally, a signal +processing chip can perform all of the phonemic synthesis to waveform conversion +in real-time, thus meeting the overall requirements of a practical, high- +performance system. Current commercial systems, many of which are based on +the licensing of the MITalk system, readily provide this capability. 
It is important
+to emphasize that there are no significant hardware limitations to the real-time and
+economic usage of the entire span of MITalk algorithms. In the past, concerns
+were expressed about the size of the lexicon and the real-time signal processing
+requirements, but these requirements pose no difficulties for modern technology.
+
+In the future, one can conceive of the entire MITalk system implemented on a
+single integrated-circuit wafer, or in a small set of chips. In this way, ASCII
+characters can be converted to output speech waveforms in many different en-
+vironments, including highly compact terminals. While a wafer-scale system must
+be viewed as highly aggressive technology in the mid-1980s, there is no inherent
+difficulty in achieving such a system. There is no question that highly complex
+and capable text-to-speech systems will be available in such compact formats in
+the near future.
+
+14.4 UNIX implementation
+
+As mentioned above, the present version of the development system consists of a
+set of PASCAL and C programs which run in a UNIX operating system environ-
+ment. There is one program per speech processing module described in previous
+chapters. In addition, there is a coordinator program which serves as the user-
+
+174
diff --git a/pages-txt/187.txt b/pages-txt/187.txt
new file mode 100644
index 0000000..aec2417
--- /dev/null
+++ b/pages-txt/187.txt
@@ -0,0 +1,37 @@
+Implementation
+
+interface to the rest of the system and a “wiretap” program which translates binary
+program output into human-readable form.
+
+The speech processing modules all share a common interface configuration.
+Each module has an input port and an output port (these are the UNIX standard
+input and standard output channels, respectively). Module input can come from
+either a disk file or from the output port of the preceding module (via a UNIX
+pipe). Module output can be directed to a disk file or to the input port of the next
+module. A group of modules connected together in sequence is called a pipeline in
+UNIX terminology.
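+
+The module-to-module plumbing just described is ordinary UNIX process
+machinery. The sketch below is our own illustration, not the MITALK source;
+it shows how a coordinator can connect two hypothetical module programs,
+here called format and decomp, with a single pipe:
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+/* Illustrative coordinator: wire two module programs together with a
+ * UNIX pipe, in the manner of the speech processing pipeline above. */
+int main(void)
+{
+    int fd[2];
+
+    if (pipe(fd) == -1) { perror("pipe"); return 1; }
+
+    if (fork() == 0) {                /* first module (e.g. FORMAT)  */
+        dup2(fd[1], STDOUT_FILENO);   /* its stdout feeds the pipe   */
+        close(fd[0]); close(fd[1]);
+        execlp("format", "format", (char *)NULL);
+        perror("execlp format"); _exit(1);
+    }
+    if (fork() == 0) {                /* second module (e.g. DECOMP) */
+        dup2(fd[0], STDIN_FILENO);    /* its stdin reads the pipe    */
+        close(fd[0]); close(fd[1]);
+        execlp("decomp", "decomp", (char *)NULL);
+        perror("execlp decomp"); _exit(1);
+    }
+    close(fd[0]); close(fd[1]);       /* parent keeps no pipe ends   */
+    while (wait(NULL) > 0)            /* wait for both modules       */
+        ;
+    return 0;
+}
+
+Longer pipelines are built the same way, with one pipe per adjacent pair of
+modules, and with the first module's input and the last module's output
+optionally redirected to the terminal or to disk files.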
The top level program, called MITALK, handles the creation of a speech
+processing pipeline. The user can call for the entire pipeline or any subsequence of
+the pipeline. The user specifies the names of the first and last modules in the
+desired pipeline and MITALK creates the pipeline processes. If the first module is
+FORMAT, then the user can provide input text directly from the terminal. If any
+other module is first, then its input must come from a previously created disk file.
+If the last module is COEWAV, then output can be sent to a digital-to-analog con-
+verter to be played aloud. The output from other modules can be sent to a disk file
+or can be translated from raw binary to human-readable form for display on the
+user’s terminal. The program SHOW is used to perform the translation.
+
+14.5 Using the system
+
+Figure 14-1 demonstrates most of the features of the system. This figure contains
+a complete copy of a MITalk session. User-typed text is in boldface. The first line
+of text is the user’s command to the UNIX shell to start the top-level process with
+the given list of parameters. Next, MITALK decodes the parameters to determine
+the appropriate pipeline structure. In this case, the pipeline begins with FORMAT
+receiving input from the user terminal (tty) and ends with SOUND1 sending output
+to the terminal. In addition, the contents of the pipe streams between DECOMP
+and PARSER, and between PARSER and SOUND1, are displayed on the terminal.
+Output from each module is prefixed by the module name.
+
+175
diff --git a/pages-txt/188.txt b/pages-txt/188.txt
new file mode 100644
index 0000000..64a0d72
--- /dev/null
+++ b/pages-txt/188.txt
@@ -0,0 +1,71 @@
+From text to speech: The MITalk system
+
+176
+
+$mitalk decomp tty parser tty sound1
+
+MITALK: System configuration:
+
+tty
+ |
+FORMAT
+ |
+DECOMP
+ |
+tty<---|
+ |
+PARSER
+ |
+tty<---|
+ |
+SOUND1
+ |
+tty
+
+MITALK: Starting system...
+MITALK: System running
+
+MITALK: Please enter text: (type ^D [control-D] to exit)
+
+The old man sat in a rocker.
+^D
+
+DECOMP: THE (ARTICLE) => THE
+DECOMP: OLD (ADJECTIVE, NOUN) => OLD
+DECOMP: MAN (NOUN, VERB) => MAN
+DECOMP: SAT (VERB, PAST PARTICIPLE) => SAT
+DECOMP: IN (PREPOSITION, ADVERB) => IN
+DECOMP: A (ARTICLE) => A
+PARSER: NOUN GROUP: THE OLD MAN
+PARSER: VERB GROUP: SAT
+DECOMP: ROCKER (NOUN) => ROCK+ER
+DECOMP: . (END PUNCTUATION MARK)
+DECOMP: . (END PUNCTUATION MARK)
+DECOMP:
+PARSER: PREPOSITIONAL PHRASE: IN A ROCKER
+PARSER: UNCLASSIFIED: .
+PARSER: UNCLASSIFIED: .
+PARSER:
+SOUND1: DH ’AH
+SOUND1: ”OW LL DD
+SOUND1: MM ’AE NN
+SOUND1: SS ’AE TT
+SOUND1: ’IH NN
+SOUND1: AX
+SOUND1: RR ’AA KK * - ER
+SOUND1: .
+SOUND1: .
+SOUND1:
+
+MITALK: System done
+%
+
+Figure 14-1: Sample MITalk session
diff --git a/pages-txt/189.txt b/pages-txt/189.txt
new file mode 100644
index 0000000..79a3379
--- /dev/null
+++ b/pages-txt/189.txt
@@ -0,0 +1,48 @@
+Appendix A
+
+Part-of-speech processor
+
+A concise description of the algorithm of the part-of-speech processor follows:
+
+IF there is no decomposition
+THEN assign (NOUN (NUM SING)),
+(VERB (INF TR) (PL TR)), (ADJ)
+ELSEIF last morph is not a suffix
+THEN IF first morph is a verb prefix
+THEN assign (VERB (INF TR) (PL TR))
+ELSEIF first morph is A
+THEN assign (ADJ), (ADV)
+ELSE assign from last morph
+END IF
+ELSEIF last morph is ING
+THEN assign (VERBING)
+ELSEIF last morph is ED
+THEN assign (VERBEN), (VERB (SING TR) (PL TR))
+ELSEIF last morph is S or ES
+THEN IF next morph is not a suffix
+AND first morph is a verb prefix
+THEN assign (VERB (SING TR))
+ELSE IF next morph is a verb
+THEN assign (VERB (SING TR))
+END IF
+IF next morph is a NOUN, ADJ, INTG, ER, or ING
+THEN assign (NOUN (NUM PL))
+END IF
+IF next morph is an ORD
+AND next morph is not SECOND
+THEN assign (ORD (NUM PL))
+END IF
+IF there is no assignment
+THEN assign (NOUN (NUM PL))
+END IF
+END IF
+ELSEIF last morph is ER
+THEN IF next morph is an ADV
+THEN assign (ADV)
+END IF
+IF next morph is an ADJ
+THEN assign (ADJ)
+END IF
+IF next morph is a NOUN or VERB
+
+177
diff --git a/pages-txt/190.txt b/pages-txt/190.txt
new file mode 100644
index 0000000..a3897b5
--- /dev/null
+++ b/pages-txt/190.txt
@@ -0,0 +1,41 @@
+From text to speech: The MITalk system
+
+THEN assign (NOUN (NUM SING))
+END IF
+ELSEIF last morph is S’
+THEN assign (NOUN (POSS TR))
+ELSEIF last morph is ’S
+THEN IF next morph is a NOUN
+THEN assign (NOUN (POSS TR)),
+copy (NOUN (CONTR TR))
+ELSEIF next morph is a PRN
+THEN copy (PRN (CONTR TR))
+IF next morph has feature (PRNADJ TR)
+THEN copy (PRN (CASE POSS))
+ENDIF
+ENDIF
+ELSEIF last morph is N’T
+THEN IF next morph is NEED
+THEN assign (MOD (AUX A) (NOT TR))
+ELSEIF next morph is a BE
+THEN copy (BE (NOT TR))
+ELSEIF next morph is a HAVE
+THEN copy (HAVE (NOT TR))
+ELSEIF next morph is a MOD
+THEN copy (MOD (NOT TR))
+ENDIF
+ELSEIF last morph is ’VE
+AND next morph is a MOD
+THEN copy (MOD (CONTR TR))
+ELSEIF last morph is ’VE, ’D, ’LL, or ’RE
+THEN IF next morph is S
+THEN assign (NOUN (NUM PL) (CONTR TR))
+ELSEIF next morph is a NOUN
+THEN copy (NOUN (CONTR TR))
+ELSEIF next morph is a PRN
+THEN copy (PRN (CONTR TR))
+ENDIF
+ELSE assign part of speech from rightmost morph
+ENDIF
+
+178
diff --git a/pages-txt/191.txt b/pages-txt/191.txt
new file mode 100644
index 0000000..3a4e614
--- /dev/null
+++ b/pages-txt/191.txt
@@ -0,0 +1,35 @@
+Appendix B
+
+Klatt symbols
+
+Table B-1: Klatt symbols for phonetic segments
+
+Vowels
+AH   but       AO   bought     AW   bout
+AY   bite      EH   bet        ER   bird
+IH   bit       IX   impunity   IXR  beer
+OXR  boar      OY   boy        UH   book
+YU   beauty
+
+Sonorant Consonants
+EL   bottle    HH   hat        HX   hurrah
+LL   let       LX   bill       RR   rent
+RX   fire      WW   wet        WH   which
+YY   yet
+
+Nasals
+EM   keep’em   EN   button     MM   met
+NN   net       NG   sing
+
+Fricatives
+DH   that      FF   fin        SS   sat
+SH   shin      TH   thin       VV   vat
+ZZ   zoo       ZH   azure
+
+Plosives
+DD   debt      DX   butter     GG   gore
+GP   give      KK   keen       PP   pet
+TT   ten       TQ   at Alan
+
+Affricates
+CH   chin      JJ   gin
+
+Pseudo-vowel
+AXP  plosive release
+
+179
diff --git a/pages-txt/192.txt b/pages-txt/192.txt
new file mode 100644
index 0000000..fb536c9
--- /dev/null
+++ b/pages-txt/192.txt
@@ -0,0 +1,17 @@
+From text to speech: The MITalk system
+
+Table B-2: Klatt symbols for nonsegmental units
+
+Stress Symbols
+’ or 1  primary lexical stress     ” or 2  secondary lexical stress
+
+Word and Morpheme Boundaries
+-   syllable boundary              *   morpheme boundary
+C:  begin content word             F:  begin function word
+
+Syntactic Structure
+.   end of declarative utterance   )?  end of yes/no question
+,   orthographic comma             )N  end of noun phrase
+)P  potential breath pause         )C  end of clause
+
+180
diff --git a/pages-txt/193.txt b/pages-txt/193.txt
new file mode 100644
index 0000000..fd1dc83
--- /dev/null
+++ b/pages-txt/193.txt
@@ -0,0 +1,44 @@
+Appendix C
+
+Context-dependent rules for PHONET
+
+This appendix presents the context-dependent rule set used in module PHONET.
+
+C.1 Notation
+The phonetic segment to parameter conversion rules are given in a form similar to
+that of the lexical stress rules in Chapter 6. The following modifications are made
+to the rule form described in Chapter 6:
+
+- The symbol “S” is used to represent any phonetic segment. This
+  replaces the symbols “V” and “C” used in the previous set of rules.
+- In addition to the features “+stress” and “-stress”, there is a set of fea-
+  tures used to classify phonetic segments according to general
+  properties. These features are listed in the next section.
+
+The general form of a rule is as follows:
+
+variable « value / pattern
+
+which means: “variable gets set to value in the context of pattern”. In addition to
+the « operation, there are the ↑ and ↓ operations which mean to increase or
+decrease (respectively) the value of variable by the amount value.
+
+The variable is one of several parameters which hold state information about
+the current phonetic segment. For example, “Target” is a table of target values for
+each parameter at the end of the current segment. The rule:
+
+Target[avc] ↓ 30 / [+fricative, +voiced] [-vowel]
+
+means that the Target value for parameter avc gets decreased by 30 dB, if the cur-
+rent segment is a voiced fricative and the next segment is nonvocalic.
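+
+As a concrete illustration of how such a context pattern can be tested in code,
+the following C fragment encodes a few of the features from the next section
+as bits and checks the pattern of the example rule above. The feature encoding
+and all names are our own sketch, not the MITalk implementation:
+
+#include <stdio.h>
+
+/* Feature bits for a segment; illustrative subset only. */
+enum { F_VOWEL = 1, F_FRICATIVE = 2, F_VOICED = 4 };
+
+typedef struct { const char *name; unsigned feats; } Seg;
+
+/* Pattern [+fricative, +voiced] [-vowel] from the example rule. */
+static int matches(const Seg *cur, const Seg *next)
+{
+    return (cur->feats & F_FRICATIVE) && (cur->feats & F_VOICED)
+        && !(next->feats & F_VOWEL);
+}
+
+int main(void)
+{
+    Seg zz = { "ZZ", F_FRICATIVE | F_VOICED };   /* voiced fricative */
+    Seg tt = { "TT", 0 };                        /* nonvocalic       */
+    Seg aa = { "AA", F_VOWEL | F_VOICED };       /* vowel            */
+
+    printf("ZZ TT: %s\n", matches(&zz, &tt) ? "Target[avc] down 30 dB"
+                                            : "no change");
+    printf("ZZ AA: %s\n", matches(&zz, &aa) ? "Target[avc] down 30 dB"
+                                            : "no change");
+    return 0;
+}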
+
+The overall structure of the program which implements these rules is as fol-
+lows: The top level of the program is a loop which examines each phonetic seg-
+ment from the input stream in sequence, one at a time. For each segment, the set
+of state variables is initialized, then the rules are applied in sequence in the order
+presented below. After the rules have been applied to the current segment, the in-
+formation in the state variables is used to update the values of the output
+parameters over the current time interval (from the start time of the current seg-
+ment to the end time, as previously determined by PROSOD). (A schematic
+sketch of this loop appears below, after the state variables.)
+
+181
diff --git a/pages-txt/194.txt b/pages-txt/194.txt
new file mode 100644
index 0000000..9457a80
--- /dev/null
+++ b/pages-txt/194.txt
@@ -0,0 +1,52 @@
+From text to speech: The MITalk system
+
+The pattern in each rule is applied independently of all other rule patterns, ex-
+cept when it is preceded by the word “ELSE”, in which case that rule is applied
+only if the preceding rule failed to match.
+
+The state variables themselves are described in the following section.
+
+C.2 State variables
+Most of the state variables are one-dimensional arrays containing one value for
+each of the output parameters. The notation variable[parameter] denotes the value
+of variable variable for parameter parameter. If parameter has the form p1, p2,
+then this stands for the value of the variable for both parameters p1 and p2. If
+parameter has the form p1..p2, then this stands for the values for parameters p1
+through p2 inclusive.
+
+The state variables are:
+
+Cumdur[av..f0]
+The “current time” for each parameter for the current segment. This is
+the absolute time in msec at which the segment begins for each
+parameter, measured from the beginning of the utterance. This cor-
+responds to t1 in Figure 11-6.
+
+Segdur[av..f0]
+The duration in msec of the current segment for each parameter. The en-
+ding time for parameter “x” is Cumdur[x]+Segdur[x]. This corresponds
+to t2 in Figure 11-6.
+
+Mintime[av..f0]
+This is the minimum absolute time to which “backward” smoothing of
+each parameter can propagate. This corresponds to t3 in Figure 11-6.
+
+Trantype[av..f0]
+This is the transition type from Figure 11-6.
+
+Target[av..f0]
+The desired target values for each parameter at time Segdur+Cumdur.
+The dimensions of these values are dB for amplitude parameters, and Hz
+for frequency and bandwidth parameters.
+
+Diptar[f1..f3]
+Diphthong target values in Hz.
+
+Oldval[av..f0]
+This is the current value of each parameter at time Cumdur (i.e. the value
+at the end of the previous segment).
+
+182
diff --git a/pages-txt/195.txt b/pages-txt/195.txt
new file mode 100644
index 0000000..79c7c8d
--- /dev/null
+++ b/pages-txt/195.txt
@@ -0,0 +1,51 @@
+Context-dependent rules for PHONET
+
+Nextar[av..f0]
+This is the value of Target for the next segment.
+
+Tcf[av..f0]
+This is the duration of forward smoothing measured from Cumdur.
+
+Tcb[av..f0]
+This is the duration of backward smoothing measured from Cumdur
+(limited by Mintime).
+
+Bper[av..f0]
+This is the percent of movement from locus toward target in a CV or VC
+transition.
+
+Bvf[av..f0]
+This is the desired value of each parameter immediately after time Cum-
+dur (generally derived from Bper).
+
+Bvb[av..f0]
+This is the desired value of each parameter immediately before time
+Cumdur.
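+
+The control loop described at the start of this section can be summarized in a
+short C skeleton. Everything here is schematic: the type names, the parameter
+count, and the empty rule and update bodies are placeholders for the machinery
+the text describes, not the PHONET source:
+
+#include <stddef.h>
+
+#define NPARAMS 20                /* one slot per output parameter av..f0 */
+
+typedef struct {
+    double cumdur[NPARAMS];       /* segment start time per parameter */
+    double segdur[NPARAMS];       /* segment duration per parameter   */
+    double target[NPARAMS];       /* targets at end of segment        */
+    double oldval[NPARAMS];       /* values at start of segment       */
+    /* ... Mintime, Trantype, Tcf, Tcb, Bper, and so on ...           */
+} State;
+
+typedef struct { int id; double start, end; } PhonSeg;
+
+static void init_state(State *s, const PhonSeg *p)    { (void)s; (void)p; }
+static void apply_rules(State *s, const PhonSeg *p)   { (void)s; (void)p; }
+static void update_output(const State *s, const PhonSeg *p)
+{ (void)s; (void)p; }
+
+void phonet(const PhonSeg *segs, size_t n)
+{
+    State s;
+    for (size_t i = 0; i < n; i++) {     /* one segment at a time       */
+        init_state(&s, &segs[i]);        /* initialize state variables  */
+        apply_rules(&s, &segs[i]);       /* rules in the order given    */
+        update_output(&s, &segs[i]);     /* emit parameter values over  */
+    }                                    /* the segment's time interval */
+}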
+
+C.3 Phonetic segment classes
+
+affricate   CH, JJ
+
+alveolar    DD, DX, EN, NN, SS, TQ, TT, ZZ
+
+aspseg      HH, HX, WH
+
+dental      DH, TH
+
+diphthong   AE, AO, AW, AXR, AY, EH, EXR, EY, IH, IXR, IY, OW, OXR, OY, UH,
+            UW, YU
+
+f2back      IY, YU, YY
+
+fricative   DH, FF, SS, SH, TH, VV, ZH, ZZ
+
+front       AE, EH, EXR, EY, IH, IX, IXR, IY, YU
+
+glottal     HH, HX, QQ, SIL
+
+high        IH, IX, IXR, IY, UH, UW, UXR, WH, WW, YU, YY
+
+labial      BB, EM, FF, MM, PP, VV, WW, WH
+
+183
diff --git a/pages-txt/196.txt b/pages-txt/196.txt
new file mode 100644
index 0000000..f7e1b28
--- /dev/null
+++ b/pages-txt/196.txt
@@ -0,0 +1,89 @@
+From text to speech: The MITalk system
+
+lateral     EL, LL, LX
+
+lax         AE, AO, AX, AXP, EH, IH, IX, UH
+
+liqglide    EL, LL, LX, RR, RX, WH, WW, YY
+
+low         AA, AE, AO, AW, AXR, AY
+
+nasal       EM, EN, MM, NN, NG
+
+palatal     CH, JJ, SH, YY, ZH
+
+palvel      GP, KP
+
+plosive     BB, CH, DD, GG, GP, JJ, KK, KP, PP, TQ, TT
+
+retro       ER, RR, RX
+
+rglide      AXR, EXR, IXR, OXR, UXR
+
+round       AO, OW, OXR, OY, UH, UW, WH, WW, YU
+
+schwa       AX, IX
+
+sonorant    AA, AE, AH, AO, AW, AX, AXR, AY, EH, EL, EM, EN, ER, EXR, EY,
+            HH, HX, IH, IX, IXR, IY, LL, LX, MM, NG, NN, OW, OXR, OY, QQ,
+            RR, UH, UW, UXR, WH, WW, YU, YY
+
+stop        BB, CH, DD, DX, EM, EN, GG, GP, JJ, KK, KP, MM, NG, NN, PP,
+            TQ, TT
+
+syllabic    AA, AE, AH, AO, AW, AX, AXR, AY, EH, EL, EM, EN, ER, EXR, EY,
+            IH, IX, IXR, IY, OW, OXR, OY, UH, UW, UXR, YU
+
+velar       GG, KK, NG
+
+voiced      AA, AE, AH, AO, AW, AX, AXR, AY, BB, DD, DH, DX, EH, EL, EM,
+            EN, ER, EXR, EY, GG, GP, HX, IH, IX, IXR, IY, JJ, LL, LX, MM,
+            NG, NN, OW, OXR, OY, QQ, RR, RX, TQ, UH, UW, UXR, VV, WH, WW,
+            YU, YY, ZH, ZZ
+
+vowel       AA, AE, AH, AO, AW, AX, AXR, AY, EH, ER, EXR, EY, IH, IX, IXR,
+            IY, OW, OXR, OY, UH, UW, UXR, YU
+
+wglide      AW, OW, UW, YU
+
+yglide      AY, EY, IY, OY
+
+184
diff --git a/pages-txt/197.txt b/pages-txt/197.txt
new file mode 100644
index 0000000..4d23ea8
--- /dev/null
+++ b/pages-txt/197.txt
@@ -0,0 +1,46 @@
+Context-dependent rules for PHONET
+
+C.4 Initialization
+
+/ —
+
+Set manner class according to segment class:
+
+Manner « vowel / [+vowel]
+
+ELSE Manner « stop / [+stop]
+
+ELSE Manner « fricative / [+fricative]
+
+ELSE Manner « sonorant
+
+Set amplitude, frequency, and bandwidth targets from Tables C-1 and C-2.
+The segments HH and HX inherit frequency targets from the next segment.
+
+Target[fnz] « 250 / —
+
+Target[an] « 0 / —
+
+Set Tcf from Table C-3.
+
+Buramp « 57 / —
+
+Aspamp « 51 / —
+
+Default transition type is SETSMO.
+
+Set transition boundary value target Bper from Table C-4.
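+
+The Manner-setting chain at the start of this section is a first-match sequence
+over the classes of Section C.3. A minimal C sketch follows, with the class
+lists abridged and all names our own illustration:
+
+#include <stdio.h>
+#include <string.h>
+
+/* Abridged class lists from Section C.3; illustrative only. */
+static const char *vowel_cls[] = { "AA", "AE", "AH", "IY", "UW", NULL };
+static const char *stop_cls[]  = { "BB", "DD", "GG", "KK", "PP", "TT", NULL };
+static const char *fric_cls[]  = { "DH", "FF", "SS", "SH", "ZZ", NULL };
+
+static int in_class(const char *seg, const char **cls)
+{
+    for (; *cls; cls++)
+        if (strcmp(seg, *cls) == 0)
+            return 1;
+    return 0;
+}
+
+static const char *manner(const char *seg)
+{
+    if (in_class(seg, vowel_cls)) return "vowel";     /* Manner « vowel */
+    if (in_class(seg, stop_cls))  return "stop";      /* ELSE stop      */
+    if (in_class(seg, fric_cls))  return "fricative"; /* ELSE fricative */
+    return "sonorant";                                /* ELSE sonorant  */
+}
+
+int main(void)
+{
+    printf("%s %s %s\n", manner("AA"), manner("SS"), manner("RR"));
+    return 0;
+}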
+ +C.5 General rules + +J— +Bper[f0] « 75 +Suppress amplitude smoothing after plosive: + +Mintime[af] < Cumdur(af] / [ *PI%Si"e]— +Discontinuous transition out of unvoiced segment: + +Trantype(f0, av] « DISSMO / [ "'°§°°d]—- + +Mintime[f0] « Cumdur{f0] / [ “’°§°ed]—- +Do a breathy offset into a pause: + +Aspdux <« 30, Oldval[avc] T 6, Aspam1 « Aspamp / [ -SSIL] [ +EL] + +185 diff --git a/pages-txt/198.txt b/pages-txt/198.txt new file mode 100644 index 0000000..9183048 --- /dev/null +++ b/pages-txt/198.txt @@ -0,0 +1,46 @@ +From text to speech: The MITalk system + +Table C-1: Parameter targets for nonvocalic segments + +avavcasp af a2 a3 a4 a5 a6 ab fl f2 f3 f4 bl b2 b3 + +axe 57 60 O O 60 60 60 60 60 O 430 1500 2500 3300 120 60 120 +BB 054 0 0 0 0 0 0 072 200 9S00 2100 3300 65 90 125 +CH 0O 0 0 0 060757070 O 300 1700 2400 3300 200 110 270 +DD 054 0 0 0 O 0508 0 200 1400 2700 3300 70 115 180 +DH 36 54 060 0 O O O 30 54 300 1150 2700 3300 60 95 185 +DX 44 60 0 0 60 60 60 60 60 O 200 1600 2700 3300 120 140 250 +EL 5757 0 060 60 60 60 60 O 450 800 2850 3300 65 60 80 +EMm 35157 0 06060606060 O 200 900 2100 3300 120 60 70 +EN 5157 0 06060 60 60 60 0 200 1600 2700 3300 120 70 110 +FF 0 03160 0O 0 0 O O 64 400 1130 2100 3300 225 120 175 +GG 054 0 070303060 10 0 250 1600 1900 3300 70 145 190 +GP 054 0 030706062 62 0 200 1950 2800 3300 120 140 250 +HH O 060 060 60 60 60 60 O 450 1450 2450 3300 300 160 300 +Hx 44 60 57 0 60 60 60 60 60 O 450 1450 2450 3300 200 120 200 +JJ 054 0 0 06075 70 70 0 200 1700 2400 3300 50 110 270 +KK 0 0 0 07330306010 O 35 1600 1900 3300 280 220 250 +KP O 0 0 0307060 62 62 0 300 1950 2800 3300 150 140 250 +i 5057 0 06060 60 60 60 O 330 1050 2800 3300 50 100 280 +x 35757 0 060 60 60 60 60 0O 450 800 2850 3300 65 60 80 +MMy 35157 0 060 60 60 60 60 O 480 1050 2100 3300 40 175 120 +N 35157 0 060 60 60 60 60 0O 480 1600 2050 3300 160 150 100 +W 35157 0 06060 60 60 60 0 480 1400 2700 3300 40 300 260 +PP 0O 0O 0OOO O O O 072 300 900 2100 3300 300 150 185 +QQ 0O 0 0 060 60 60 60 60 0O 400 1400 2450 3300 120 140 250 +RR 50 57 O 0 60 60 60 60 60 O 330 1060 1380 3300 70 100 120 +Rx 57 57 0 060 60 60 60 60 0 460 1260 1560 3300 60 60 70 +SH 0 03160 06075 70 70 0 400 1650 2400 3300 200 110 280 +st 0 O O 060 60 60 60 60 O 400 1400 2400 3300 120 140 250 +ss 0 03160 0O O O0S08 0O 400 1400 2700 3300 200 95 220 +TH O 03160 0 0 O O 30 54 400 1150 2700 3300 225 95 200 +TQ O 0 0 0O O O 05082 0 200 1400 2700 3300 120 140 250 +TT O 00O O O 0508 0O 300 1400 2700 3300 300 180 220 +w 4054 060 O 0 0 O 064 300 1130 2100 3300 55 95 125 +WH 05751 06060 60 60 60 0O 330 600 2100 3300 150 60 60 +woe 35057 0 060 60 60 60 60 O 285 610 2150 3300 S50 80 60 +YY 35057 0 060 60 60 60 60 0 240 2070 3020 3300 40 250 500 +zH 4054 060 060 75 70 70 0 300 1650 2400 3300 220 140 250 +2z 4054 060 O O 0508 O 300 1400 2700 3300 70 85 190 + +186 diff --git a/pages-txt/199.txt b/pages-txt/199.txt new file mode 100644 index 0000000..64da133 --- /dev/null +++ b/pages-txt/199.txt @@ -0,0 +1,98 @@ +Context-dependent rules for PHONET + +Table C-2: Parameter targets for vocalic segments + +avavcasp af a2 a3 a4 a5 a6 ab fl 2 f3 f4 bl b2 b3 + +aA 5757 0 060 60 60 60 60 O 700 1220 2600 3300 130 70 160 + +AE 57 57 0 060 60 60 60 60 O 620 1660 2430 3300 70 130 300 +650 1450 2470 + +AR 5959 0 060 60 60 60 60 O 620 1220 2550 3300 80 50 140 + +a0 5858 0 06060 60 60 60 0 600 990 2570 3300 90 100 80 +630 1040 2600 + +aw 5757 0 060 60 60 60 60 O 640 1230 2550 3300 80 70 110 +420 940 2350 + +ax 60 60 O 0 60 60 60 60 60 O 550 1260 2470 3300 80 
50 140 + +AXR 60 60 O 0 60 60 60 60 60 O 680 1170 2380 3300 60 60 110 +520 1400 1650 + +ay 5858 0 060 60 60 60 60 0 660 1200 2550 3300 100 120 200 +400 1880 2500 + +EH 61 61 0 0 60 60 60 60 60 O 530 1680 2500 3300 60 S0 200 +620 1530 2530 + +ER 6262 0 0 60 60 60 60 60 0O 470 1270 1540 3300 100 60 110 +420 1310 1540 + +EXR 60 60 O 0 60 60 60 60 60 O 460 1650 2400 3300 60 80 140 + +450 1500 1700 +EY 5950 0 0606060 60 60 O 480 1720 2520 3300 70 100 200 +| 330 2200 2600 +IH 60 60 O 060 60 60 60 60 O 400 1800 2670 3300 50 100 140 + +470 1600 2600 +Ix 6060 O 06060 60 60 60 O 420 1680 2520 3300 SO 100 140 +IXR 6060 O O 60 60 60 60 60 O 320 1500 2900 3300 70 80 120 + +420 1550 1750 +310 2200 2960 3300 50 200 400 + +290 2070 2980 + +3 +o + +1y 6060 0 0 60 60 60 60 + +ow 6060 0 060 60 60 60 60 0 540 1100 2300 3300 80 70 70 +450 900 2300 + +oXR 60 60 0O 0 60 60 60 60 60 O 550 820 2200 3300 60 60 60 +490 1300 1500 + +60 60 60 0 550 960 2400 3300 80 120 160 +360 1820 2450 + +60 60 60 O 450 1100 2350 3300 80 100 80 +500 1180 2390 + +oy 6262 0 0 60 60 +60 + +ow 6464 0O 060 60 60 60 60 O 350 1250 2200 3300 65 110 140 +60 +60 + +UH 6363 0 0 60 + +320 900 2200 + +60 60 60 O 360 800 2000 3300 60 60 80 +390 1150 1500 + +60 60 60 O 290 1500 2600 3300 70 160 220 + +330 1200 2100 + +e + +3 + +UXR 60 0 0 60 + +R +g +o +o + +YU /60 + +187 diff --git a/pages-txt/200.txt b/pages-txt/200.txt new file mode 100644 index 0000000..d9ee487 --- /dev/null +++ b/pages-txt/200.txt @@ -0,0 +1,45 @@ +From text to speech: The MITalk system + +Table C-3: Default values for duration of forward smoothingf(Tc'f) + +av 25 ave 20 af 40 asp 20 +an 40 a2p 40 a3p 40 adp 40 +aS5p 40 abp 40 ab 40 f0 120 +f1 80 2 80 f3 80 f4 80 +fnz 150 bl 80 b2 80 b3 80 +Table C-4: Default values for Bper + +Previous Current manner class + +manner vowel stop fricative sonorant + +class + +vowel 50 35 50 75 + +stop 65 50 50 65 + +fricative 50 50 50 75 + +sonorant 25 35 25 50 + +Aspiration between voiced sonorant and following unvoiced consonant in- +trudes on the voicing: + +Aspdux « 10 / [ +Sonsorant] [ ‘V?Eed] + +Start frication early if fricative follows sonorant: + +Cumdur(af] | 20, Segdur{af] T 20/ +[ +sonsorant] [ -voiced, +fricative] + +Aspam1 L 6/ [ +sonsorant] [ -voice-d_,- +stop] + +If Aspdux was set above, then shift Cumdur[asp] earlier by the amount +Aspdux. Force the value of asp at the new Cumdur[asp] to be Aspaml and +linearly smooth asp over the 30 msec interval preceding Cumdur[asp]. (See Figure +C-1.) + +FO transitions into and out of voiceless segments are discontinuous: + +188 diff --git a/pages-txt/201.txt b/pages-txt/201.txt new file mode 100644 index 0000000..3380625 --- /dev/null +++ b/pages-txt/201.txt @@ -0,0 +1,45 @@ +Context-dependent rules for PHONET +Trantype[av] < SMODIS / [ 'V?_i_fed] +Trantype[f0] « DISCON / [ -vogced] [ -vc_)iied] +Trantype[f0] « SMODIS / [ +vosiced] [-vgifed] + +Tef[f1..b3] « Tcobst[X]/ [ +S;(°P] [ -S_tgp] + +Trantype[av,avc] < DISCON, +Trantypelaf..b3] « DISSMO / [+st°p,s+nasal] [-stop, +voiccd] + +ELSE Trantype[av..ab,avc] « DISSMO, +Trantype[f1..b3] « SETSMO + +T +g +@ + +. —w— Agpdux —» + +Aspam1 + +asp + +ooooooooooooooooooooo + +Oldval + +Mintime «— Cumdur + +Figure C-1: Pre-aspiration parameter smoothing + +C.6 Sonorant rules + +/[ +sonor + +Lower f4 if lips rounded: +Targer(f4] 4 100/ [ #1000 + +F4 higher in retro and lateral segments: + +Targeare) 1300/ { [ o] / [ Haterd ] ! 
Transitions shorter out of liquids and glides:

189 diff --git a/pages-txt/202.txt b/pages-txt/202.txt new file mode 100644 index 0000000..02891fd --- /dev/null +++ b/pages-txt/202.txt @@ -0,0 +1,43 @@ From text to speech: The MITalk system

Tcf[f1..b3] ↓ 10 / [ +liqgli ]__ [ +round ]

ELSE Tcf[f1..b2] ← 75 / [ +liqgli ]__

Tcf[f3] ← 90 / [ +liqgli, +retro ]__

W and Y-glides act like sonorants:

Bper[f1..f3] ← 35 / { [ +wglide ] / [ +yglide ] }__

Increase transition between back vowels and palatalized consonants:

Tcf[f2] ↑ 50 / [ +palatal ]__ [ +round ]

C.6.1 Aspiration of an unvoiced fricative or plosive intrudes on the sonorant

/ [ -voiced, -glotstop, -SIL, -HH ]__

Assume a stressed, word-initial voiceless plosive:

Aspdur ← 40
Aspam1 ← Aspamp

Reduce aspiration if the preceding obstruent is unstressed:

Aspdur ← 20, Aspam1 ↓ 3 / [ -stress ]__

Lengthen aspiration in plosive-obstruent clusters:

Aspdur ← 55 / [ -labial ]__ [ -vowel, -lateral ]

ELSE Aspdur ← 50 / __ [ -vowel ]

Little aspiration if preceding segment voiced and current syllable unstressed:

Aspdur ↓ 10 / [ +voiced ] [ -stress ]__

Large aspiration into silence:

Aspdur ← 70 / __ [ +SIL ]

190 diff --git a/pages-txt/203.txt b/pages-txt/203.txt new file mode 100644 index 0000000..a4e206b --- /dev/null +++ b/pages-txt/203.txt @@ -0,0 +1,44 @@ Context-dependent rules for PHONET

Aspiration starts during plosive burst:

Cumdur[asp, av, b1] ↑ Burdur[X] - 5,
Aspdur ↑ Burdur[X] - 5 / [ +plosive ]__

Aspiration duration is longer if fricative followed by a sonorant consonant:

Aspdur ← 10 / [ +fricative ]__ [ +vowel ]

Aspdur ← Segdur[asp] / 2 / [ +fricative ]__ [ -vowel ]

Draw aspiration segment:

Aspdur ← min(Aspdur, Segdur[asp], Segdur[av])

Aspam1 ↑ 9 / [ +alveolar ]__ [ +retro ]

If Aspdur is now nonzero, then draw an aspiration segment with duration
Aspdur starting at time Cumdur[asp], where the values of asp, av, avc, and b1 are
Aspam1, 0, 0, and 150, respectively. Shift Cumdur forward by Aspdur for these
four parameters. If the previous segment is a fricative, then smooth asp backward
over the 30 msec interval before Cumdur[asp].
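The drawing step just described can be pictured as writing a constant stretch into a sampled parameter track and then smoothing its left edge. The sketch below assumes a 5-msec frame interval and uses illustrative names (FRAME_MS, draw_aspiration); it is not the PHONET routine itself.

FRAME_MS = 5

def draw_aspiration(track, cumdur_ms, aspdur_ms, aspam1, prev_is_fricative):
    start = cumdur_ms // FRAME_MS
    # write the aspiration segment of amplitude Aspam1 at Cumdur[asp]
    for i in range(start, start + aspdur_ms // FRAME_MS):
        track[i] = aspam1
    if prev_is_fricative:
        # linear smoothing of asp over the 30 msec before Cumdur[asp]
        n = 30 // FRAME_MS
        old = track[start - n]
        for k in range(1, n):
            track[start - n + k] = old + (aspam1 - old) * k / n
    return track

asp = [54.0] * 40                      # 200 msec of frication amplitude
print(draw_aspiration(asp, 100, 40, 6.0, True)[14:28])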
C.6.2 Nonnasal sonorants

/ [ -vowel ]__

F2 and f3 coarticulate with next vowel targets in RR and LL:

Target[av] ↓ 3 / [ +LL ]__

Target[f2] ← .9*Target[f2] + .1*TARGET[f2, X] /
[ +LL ]__ [ +sonorant, -nasal ]X

Coart(75), Target[f3] ← Target[f2] + 250 / [ +RR ]__

Target[f2] ← .75*Target[f2] + .25*TARGET[f2, X] /
[ +RR ]__ [ +sonorant, -nasal ]X

Coarticulate with schwa if next segment is nonsonorant or nasal:

191 diff --git a/pages-txt/204.txt b/pages-txt/204.txt new file mode 100644 index 0000000..0fb657d --- /dev/null +++ b/pages-txt/204.txt @@ -0,0 +1,44 @@ From text to speech: The MITalk system

Target[f2] ← .9*Target[f2] + .1*1450 /
[ +LL ]__ { [ -sonorant ]X / [ +nasal ]X }

Target[f2] ← .75*Target[f2] + .25*TARGET[f2, X] /
[ +RR ]__ { [ -sonorant ]X / [ +nasal ]X }

Transitions shorter into liquids and glides:

Bper[f1..f3] ← 90 / [ -nasal, +sonorant ]__ [ +lateral ]

Transitions between retro and lateral are short:

Tcf[f1..b3] ← 90 / [ +liqgli ] [ +liqgli ]__

ELSE Tcf[f1..b3] ← 50 / [ +liqgli ]__

ELSE Tcf[f1..b3] ← 70

Tcf[f3] ← 90 / __ [ +retro ]

C.7 Vowels

/ [ +vowel ]__

Heavy forward coarticulation of schwa:

Target[f2] ← avg(Target[f2], TARGET[f2, X]) /
[ +AXP ]__ [ +sonorant, -nasal ]X

No schwa offglide before velar consonant:

Diptar[f2] ← Target[f2] / [ +front, +lax ]__ [ +velar ]

Velarization of front vowels if followed by +lateral:

Diptar[f2] ↓ 300 /
{ [ +front, -diphthong ] / [ +yglide, -diphthong ] }__ [ +lateral ]

YU fronted if alveolar plosive follows:

192 diff --git a/pages-txt/205.txt b/pages-txt/205.txt new file mode 100644 index 0000000..0ea04ee --- /dev/null +++ b/pages-txt/205.txt @@ -0,0 +1,41 @@ Context-dependent rules for PHONET

Diptar[f2] ↑ 200 / [ +YU ]__ [ +alveolar, +plosive ]

F2 time constant longer in +f2back, +round transition:

Tcf[f2] ↑ 50 / [ +f2back ]__ [ +round ]

Y-glide not as dramatic before alveolar consonant:

Diptar[f2] ↓ 150 / [ +yglide ]__ [ +alveolar ]

Formants centralized in a short nonretro vowel: Shift Target[f1..f3] and
Diptar[f1..f3] towards 490, 1450, 2500 Hz by an amount which varies as a dying
exponential with time constant 60 msec in the duration of the segment.

Schwa takes on formant targets of adjacent segments:

Target[f3] ← average(Oldval[f3], Target[f3], Nextar[f3]) / [ +schwa ]__

Back cavity resonances assume target quickly:

Tcf[f2] ↓ 20 / [ +f2back ]__

C.7.1 Diphthong rules

/ [ +diphthong ]__

Set Tcdiph and Tdmid from Table C-5.

Tcenter[f1..f3] ← Tdmid * Segdur[f1]
Tcdips[f1..f3] ← Tcdiph * .5 * (1 + Segdur[f1] / 45)

Earlier diphthongization after HH in tense vowels:

Tcenter[f1..f3] ← Tcenter[f1..f3] * .67 / [ +HH ] [ +tense ]__

Bvalf[f1..f3] ← average(Target[f1..f3], Diptar[f1..f3])

Bvalf[f1] ↑ 70 / [ … ]__
Bvalf[f2] ↑ 200 / { [ … ] / [ … ] }__
Bvalf[f3] ↑ 200 / { [ … ] / [ … ] }__

193 diff --git a/pages-txt/206.txt b/pages-txt/206.txt new file mode 100644 index 0000000..bb8840b --- /dev/null +++ b/pages-txt/206.txt @@ -0,0 +1,38 @@ From text to speech: The MITalk system

Tcenter[f3] ← Tcenter[f3] * .3 / [ … ]__

Tcenter[f3] ← Tcenter[f3] * .6 / { [ +EXR ] / [ +ER ] }__

Bvalb[f1..f3] ← Bvalf[f1..f3]

Temporarily replace Target[f1..f3] with Diptar[f1..f3] and move
Cumdur[f1..f3] to Cumdur[f1..f3] + Tcenter[f1..f3], then draw a transition (see
Figure C-2).
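As a concrete reading of C.7.1, the following sketch computes Tcenter and Tcdips and draws the two-plateau contour of Figure C-2 for a single formant. It assumes Tdmid is interpreted as a percentage of the segment duration (read literally, the product above would exceed the segment); diphthong_track and step_ms are illustrative names, not MITalk's.

def diphthong_track(target_hz, diptar_hz, segdur_ms, tdmid, tcdiph, step_ms=5):
    tcenter = tdmid * segdur_ms / 100.0          # assuming Tdmid is a percentage
    tcdips  = tcdiph * 0.5 * (1 + segdur_ms / 45.0)
    track = []
    for t in range(0, segdur_ms, step_ms):
        if t <= tcenter - tcdips / 2:
            track.append(target_hz)              # steady nucleus
        elif t >= tcenter + tcdips / 2:
            track.append(diptar_hz)              # steady offglide
        else:                                    # linear glide between the two
            frac = (t - (tcenter - tcdips / 2)) / tcdips
            track.append(target_hz + frac * (diptar_hz - target_hz))
    return track

# AY, f2: a 1200 Hz nucleus gliding toward 1880 Hz (Tables C-2 and C-5);
# with these values the glide spans most of the vowel.
print([round(v) for v in diphthong_track(1200, 1880, 180, 55, 100)])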
[Figure C-2 (line drawing): the formant track runs from Target to Diptar, with
smoothing intervals of length Tcdips on either side of Cumdur + Tcenter; the
time axis is marked at Mintime, Cumdur, Cumdur + Tcenter, and Cumdur + Segdur.]

Figure C-2: Diphthong transition smoothing

Table C-5: Diphthong transition parameters

     Tcdiph Tdmid        Tcdiph Tdmid        Tcdiph Tdmid
AA      0     80    AE    100     75    AH      0      0
AO    110     80    AW    120     70    AXR   125     40
AY    100     55    EH     60     70    ER    180     50
EXR   100     50    EY    140     55    IH     90     65
IXR   100     50    IY    200     45    OW    150     50
OXR   110     60    OY    150     60    UH     90     65
UW    140     55    UXR   150     50    YU    100     45

C.8 Obstruents

/ { [ -sonorant ] / [ +nasal ] }__

194 diff --git a/pages-txt/207.txt b/pages-txt/207.txt new file mode 100644 index 0000000..a03709f --- /dev/null +++ b/pages-txt/207.txt @@ -0,0 +1,44 @@ Context-dependent rules for PHONET

C.8.1 Fricatives and plosives

/ { [ +plosive ] / [ +fricative ] }__

Devoice if not followed by vowel:

Target[avc] ↓ 30 / [ +voiced ]__ [ -vowel ]

Target[f4] ↓ 150 / [ +palatal ]__

Target[a2p..a6p] ← 0, 67, 60, 65, 65 /
[ +alveolar, +plosive ]__ [ -alveolar, +retro ]

Target[a2p..a6p] ← 0, 52, 60, 70, 73 /
[ +alveolar, +plosive ]__ [ -alveolar, -retro ]

BurAmp ↓ 5 / [ … ]__ [ -vowel ]

No voicebar in voiced plosive if preceded by obstruent:

Target[avc] ← 0 /
{ [ -voiced ] / [ -sonorant ] } [ +voiced, +plosive ]__

C.8.2 Fricatives only

/ [ +fricative ]__

Target[af] ↓ 4 / __ [ +SIL ]

Short fricatives don't reach target, so increase target:

Target[af] ↑ IF Segdur[af] < 80 THEN
(80 - Segdur[af]) * deltaf / [ -fricative, -stop ] __ [ +vowel ]

Stronger voicing between vowels:

Target[avc] ← 60 / [ +vowel ] [ +voiced ]__ [ +vowel ]

Palatal fricative rounding before rounded vowel:

195 diff --git a/pages-txt/208.txt b/pages-txt/208.txt new file mode 100644 index 0000000..8bbf16f --- /dev/null +++ b/pages-txt/208.txt @@ -0,0 +1,38 @@ From text to speech: The MITalk system

Target[f2] ↓ 50, Target[f3] ↓ 200 /
[ +palatal ]__ [ +vowel, +round ]

C.8.3 Stops

/ [ +stop ]__

Transition into stop is partially discontinuous (except f0):

Trantype[av..f1] ← SMODIS, Trantype[f2..b3] ← SETSMO,
Trantype[avc] ← SMODIS / [ -stop ] __

Tcf[f2..b3] ← 10, Tcf[f1] ← 15 / [ +high ]__

ELSE Tcf[f2..b3] ← Tcobst[__], Tcf[f1] ← average(Tcobst[__], 20)

Table C-6: Duration of forward smoothing for obstruents (Tcobst)

BB   60   CH  100   DD   80   DH   80
EL   80   EM   60   EN   80   FF   60
GG  100   GP  100   HH   80   HX   80
KK  100   KP  100   MM   60   NG  100
NN   80   PP   80   QQ   80   SH  100
SIL  80   SS   80   TH   80   TQ   80
TT   80   VV   60   ZH  100   ZZ   80

C.8.3.1 Insert burst at expense of closure duration

/ [ +plosive, -glotstop ]__ [ -stop, -SIL ]

Set Burdur from Table C-7.

Burdur ↑ 10 / [ +TT ]__ [ +retro ]

Burdur ← Segdur[af] IF Burdur > Segdur[af]
Burdur ← 5 IF Burdur = 0

196 diff --git a/pages-txt/209.txt b/pages-txt/209.txt new file mode 100644 index 0000000..9bcba65 --- /dev/null +++ b/pages-txt/209.txt @@ -0,0 +1,41 @@ Context-dependent rules for PHONET

Decrease the value of af by 20 dB at Cumdur[af] and smooth this value back-
ward 30 msec.

Draw an af segment with value zero and duration Segdur[af] - Burdur begin-
ning at Cumdur[af]. Shift Cumdur[af] and Segdur[af] accordingly. (Segdur[af]
will now equal Burdur.)
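The burst-insertion step amounts to splitting the af segment into a silent closure followed by a short burst. A minimal sketch follows, with times in msec and the 20 dB step-down and backward smoothing omitted; insert_burst is an illustrative name, and the clamps mirror the two IF rules above.

def insert_burst(af_target, segdur_af, burdur, step_ms=5):
    burdur = min(burdur, segdur_af)              # Burdur <= Segdur[af]
    burdur = burdur if burdur > 0 else 5         # Burdur <- 5 IF Burdur = 0
    closure = segdur_af - burdur
    track = []
    for t in range(0, segdur_af, step_ms):
        # zero af during the closure, then the burst at full amplitude
        track.append(0.0 if t < closure else af_target)
    return track

# KK: a 25 msec default burst (Table C-7) inside a 60 msec af segment
print(insert_burst(60.0, 60, 25))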
Buramp ↑ 6 / [ -voiced ]__

Buramp ↓ 6 / [ +nasal ]__ [ -stress ]

Buramp ↓ 3 / __ { [ +AXP ] / [ +fricative ] }

Target[af] ← Buramp
Trantype[af] ← DISCON

Target[fnz] ← 450, Bper[fnz] ← 100,
Trantype[af..b3] ← SMODIS, Trantype[av] ← DISCON / __ [ +nasal ]

Trantype[av, avc] ← DISSMO / [ -voiced ]__ [ +nasal ]

Buramp ↓ 3 / [ -nasal ]__ [ -stress ]

Table C-7: Default plosive burst duration

BB   5   CH  15   DD  10   DX  10
GG  20   GP  20   JJ  10   KK  25
KP  25   MM   0   NG   0   NN   0
PP   5   QQ  15   TT  15

C.8.4 Nasals

/ [ +nasal ]__

Target[fnz] ← 450
Trantype[af..b3] ← SMODIS

197 diff --git a/pages-txt/210.txt b/pages-txt/210.txt new file mode 100644 index 0000000..0859bc1 --- /dev/null +++ b/pages-txt/210.txt @@ -0,0 +1,35 @@ From text to speech: The MITalk system

Bper[fnz] ← 100
Trantype[av] ← DISCON

Trantype[avc, av] ← DISSMO / [ -voiced ] __

C.9 Adjustments

/ __

(Swap current and previous segments if Manner[current] > Manner[previous].)

Bper[f2] ← 75, Bper[f3] ← 75 / [ +retro ]__

SH and ZH highly constrain boundary unless stop is adjacent:

Bper[f2] ← 20, Tcf[f2] ↑ 30 / [ +palatal, -sonorant ]__

C.9.1 Boundary values for stops

/ [ +stop ]__

Bper[f1] ← 50
Bper[f2] ← 0
Bper[f3] ← 0
Bper[fnz] ← IF Mancur - Manlast > 1 THEN 100 ELSE 0,
Tcf[fnz] ← 150, Bper[f1] ← 0 / [ +nasal ]__

Bper[f2] ← 65, Bper[f3] ← 20 / [ +labial ]__ [ -glottal, +front ]
Bper[f2] ← 20, Bper[f3] ← 70 / [ +labial ]__ [ -glottal, -front ]
Oldval[f3] ← 1750, Bper[f3] ← 20 / [ +labial ]__ [ -glottal, +retro ]

Oldval[f2] ← 1050 / [ +alveolar ]__ [ +lateral ]

Oldval[f2] ← 1400 / [ +NN ]__ [ -lateral ]

198 diff --git a/pages-txt/211.txt b/pages-txt/211.txt new file mode 100644 index 0000000..9d1f2d6 --- /dev/null +++ b/pages-txt/211.txt @@ -0,0 +1,39 @@ Context-dependent rules for PHONET

Oldval[f2] ← 1600 / [ +alveolar, -NN ]__ [ -lateral ]

Oldval[f3] ← 2300 / [ +alveolar ]__ [ +retro ]

Oldval[f3] ← 2620 / [ +alveolar ]__ [ -retro ]

Add f1 and f2 vowel targets to compute f2 locus for velar (f1 reflects lip-
rounding and f2 reflects fronting components of systematic shift in locus):

Oldval[f2] ← Target[f2] + (Target[f1] - 300) * 2 / [ +velar ]__
Oldval[f3] ← Oldval[f2] + 800 / [ +velar ]__ [ +labial ]
Oldval[f3] ← Oldval[f2] + 400 / [ +velar ]__ [ -labial ]

Oldval[f2, f3] ← … / { [ … ] / [ … ] }__

f2 lower in NG:

Oldval[f2, f3] ← (Oldval[f2, f3] + Target[f2, f3]) / 2 / [ +NG ]__

Fronted articulation of KK, GG, and NG adjacent to IY and IH:

Oldval[f2] ↑ 250, Oldval[f3] ↑ 50 / { [ +IY ] / [ +IH ] }

(end of manner class reversal)

C.9.2 All segments

/ __

Oldval[f2, f3, f4] ← 1200, 2050, 2500, Bper[f4] ← 0, Tcf[f4] ← 30 /
[ +alveolar, +stop ]__ [ +WW ]

Target[f2, f3] ← 1850, 2200, Target[f4] ↑ … /
{ [ +rglide ] / [ +retro, +alveolar ] }__

Target[f2, f3] ← 1700, 1900 / [ … ]__ [ … ]

199 diff --git a/pages-txt/212.txt b/pages-txt/212.txt new file mode 100644 index 0000000..1195b12 --- /dev/null +++ b/pages-txt/212.txt @@ -0,0 +1,42 @@ From text to speech: The MITalk system

Oldval[f2, f3] ← 1100, 1500 / [ +velar ]__ [ +retro ]
Oldval[f3] ← 2100 / [ +alveolar ]__ [ +retro ]

Oldval[f3] ← 1700 / [ +labial ]__ [ +retro ]

Target[f2, f3] ← 900, 2000 / [ +wglide ]__ [ +velar ]

ELSE Target[f2] ← 1900

Oldval[f2] ← 1850 / [ +alveolar, +stop ]__ [ +f2back ]

C.10 Modifications

/ __

Target[f1..f3] ←
average(Target[f1..f3], Oldval[f1..f3], Nextar[f1..f3]) / [ +AX ]__

f2 transitions involving f2back segments are shorter:

Tcf[f2] ↓ 20 / [ +f2back ]__

Minimum formant separation is 200 Hz: raise Target[f2..f4] such that
separation is at least 200 Hz.
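A sketch of that minimum-separation pass, walking up the formant targets and pushing each at least 200 Hz above the one below; enforce_separation is an illustrative name.

def enforce_separation(f1, f2, f3, f4, min_hz=200):
    f2 = max(f2, f1 + min_hz)
    f3 = max(f3, f2 + min_hz)
    f4 = max(f4, f3 + min_hz)
    return f1, f2, f3, f4

print(enforce_separation(470, 1270, 1380, 3300))   # f3 is raised to 1470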
Set boundary values and Tcb:

Tcb[av..f0] ← Tcf[av..f0]

Bvalf[av..f0] ← (Bper[av..f0] * Target[av..f0]
+ (100 - Bper[av..f0]) * Oldval[av..f0]) / 100

Discontinuous formant jump in a lateral release:

Bvalf[f1, f2] ↑ 150, Bvalb[f1, f2] ↓ 50 / [ +lateral ] __

DISSMO in diphthongs are changed to SMOOTH, so change boundary in-
stead:

Bvalb[f1..f3] ← Oldval[f1..f3] / [ +nasal ]__ [ +diphthong ]

Special treatment for amplitude parameters:

200 diff --git a/pages-txt/213.txt b/pages-txt/213.txt new file mode 100644 index 0000000..063f2fb --- /dev/null +++ b/pages-txt/213.txt @@ -0,0 +1,26 @@ Context-dependent rules for PHONET

Bvalf[av..ab, avc] ← average(Target[av..ab, avc], Oldval[av..ab, avc])

Vowel amplitude offset more gradual:

Tcb[av] ← 30 / [ +vowel ]__

Voicing offset in stop or fricative more gradual, in onset too:

Bvalb[av] ↓ 6, Bvalf[av] ← Bvalb[av] / [ +voiced ]__ [ -voiced, -sonorant ]

Bvalb[av] ↓ 4, Bvalf[av] ← Bvalb[av] /
{ [ +voiced, +stop ] / [ +voiced, +fricative ] }__ [ +sonorant, +nasal ]

Glottal segments (including SIL) have no inherent "articulatory" targets:

Bvalb[a2p..b3] ← Target[a2p..b3],
Bvalf[a2p..b3] ← Target[a2p..b3],
Oldval[a2p..b3] ← Target[a2p..b3] / [ +glottal ]__

Stops have abrupt offset:

Bvalf[af] ← Target[af] / [ +plosive ]__

Draw all parameter tracks.
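The boundary-value computation above combines the new segment's target with the old value in proportion to Bper, which Table C-4 supplies by manner class. A minimal sketch under those definitions; BPER and boundary_value are illustrative names.

BPER = {                                  # (previous manner, current manner)
    ("vowel", "vowel"): 50, ("vowel", "stop"): 35,
    ("vowel", "fricative"): 50, ("vowel", "sonorant"): 75,
    ("stop", "vowel"): 65, ("stop", "stop"): 50,
    ("stop", "fricative"): 50, ("stop", "sonorant"): 65,
    ("fricative", "vowel"): 50, ("fricative", "stop"): 50,
    ("fricative", "fricative"): 50, ("fricative", "sonorant"): 75,
    ("sonorant", "vowel"): 25, ("sonorant", "stop"): 35,
    ("sonorant", "fricative"): 25, ("sonorant", "sonorant"): 50,
}

def boundary_value(prev_manner, cur_manner, oldval, target):
    """Bvalf <- (Bper * Target + (100 - Bper) * Oldval) / 100"""
    bper = BPER[(prev_manner, cur_manner)]
    return (bper * target + (100 - bper) * oldval) / 100.0

# F2 boundary between a vowel (Oldval 1220 Hz) and a following stop
# whose context-dependent target is 1600 Hz: Bper = 35.
print(boundary_value("vowel", "stop", 1220.0, 1600.0))   # 1353.0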
201 diff --git a/pages-txt/214.txt b/pages-txt/214.txt new file mode 100644 index 0000000..2e648c7 --- /dev/null +++ b/pages-txt/214.txt @@ -0,0 +1,167 @@ Appendix D

Sample test trials from the Modified Rhyme Test

1.  a) bad    b) back   c) ban    d) bass   e) bat    f) bath
2.  a) beam   b) bead   c) beach  d) beat   e) beak   f) bean
3.  a) bus    b) but    c) bug    d) buff   e) bun    f) buck
4.  a) case   b) cave   c) cape   d) cane   e) cake   f) came
5.  a) cuff   b) cut    c) cuss   d) cub    e) cup    f) cud
6.  a) dip    b) din    c) dill   d) dig    e) dim    f) did
7.  a) dub    b) dun    c) dung   d) dug    e) duck   f) dud
8.  a) fizz   b) fin    c) fill   d) fig    e) fib    f) fit
9.  a) hear   b) heath  c) heal   d) heave  e) heat   f) heap
10. a) kid    b) kit    c) kill   d) kin    e) king   f) kick
11. a) lace   b) lame   c) lane   d) lay    e) lake   f) late
12. a) man    b) math   c) mad    d) mat    e) mass   f) map
13. a) pace   b) pane   c) pave   d) page   e) pay    f) pale
14. a) path   b) pat    c) pack   d) pad    e) pass   f) pan
15. a) peas   b) peak   c) peal   d) peace  e) peach  f) peat
16. a) pip    b) pick   c) pin    d) pill   e) pit    f) pig
17. a) puff   b) pus    c) pub    d) pun    e) puck   f) pup
18. a) rate   b) race   c) ray    d) raze   e) rave   f) rake
19. a) safe   b) sake   c) same   d) sane   e) save   f) sale
20. a) sat    b) sag    c) sack   d) sap    e) sass   f) sad
21. a) seed   b) seek   c) seen   d) seep   e) seem   f) seethe
22. a) sill   b) sick   c) sing   d) sit    e) sin    f) sip
23. a) sup    b) sud    c) sun    d) sum    e) sub    f) sung
24. a) tap    b) tang   c) tam    d) tan    e) tab    f) tack
25. a) tease  b) tear   c) teak   d) teal   e) team   f) teach

202 diff --git a/pages-txt/215.txt b/pages-txt/215.txt new file mode 100644 index 0000000..1b7e94f --- /dev/null +++ b/pages-txt/215.txt @@ -0,0 +1,36 @@ Appendix E

Sample test materials from the Harvard Psychoacoustic Sentences

1. The birch canoe slid on the smooth planks
2. Glue the sheet to the dark blue background
3. It's easy to tell the depth of a well
4. These days a chicken leg is a rare dish
5. Rice is often served in round bowls
6. The juice of lemons makes fine punch
7. The box was thrown beside the parked truck
8. The hogs were fed chopped corn and garbage
9. Four hours of steady work faced us
10. A large size in stockings is hard to sell
11. The boy was there when the sun rose
12. A rod is used to catch pink salmon
13. The source of the huge river is the clear spring
14. Kick the ball straight and follow through
15. Help the woman get back to her feet
16. A pot of tea helps to pass the evening
17. Smoky fires lack flame and heat
18. The soft cushion broke the man's fall
19. The salt breeze came across from the sea
20. The girl at the booth sold fifty bonds
21. The small pup gnawed a hole in the sock
22. The fish twisted and turned on the bent hook
23. Press the pants and sew a button on the vest
24. The swan dive was far short of perfect
25. The beauty of the view stunned the young boy

203 diff --git a/pages-txt/216.txt b/pages-txt/216.txt new file mode 100644 index 0000000..405c3ba --- /dev/null +++ b/pages-txt/216.txt @@ -0,0 +1,36 @@ Appendix F

Sample test materials from the Haskins Anomalous Sentences

1. The wrong shot led the farm
2. The black top ran the spring
3. The great car met the milk
4. The old corn cost the blood
5. The short arm sent the cow
6. The low walk read the hat
7. The rich paint said the land
8. The big bank felt the bag
9. The sick seat grew the chain
10. The salt dog caused the show
11. The last fire tried the nose
12. The young voice saw the rose
13. The gold rain led the wing
14. The chance sun laid the year
15. The white bow had the bed
16. The near stone thought the ear
17. The end home held the press
18. The deep head cut the cent
19. The next wind sold the room
20. The full leg shut the shore
21. The safe meat caught the shade
22. The fine lip tired the earth
23. The plain can lost the men
24. The dead hand armed the bird
25. The fast point laid the word

204 diff --git a/pages-txt/217.txt b/pages-txt/217.txt new file mode 100644 index 0000000..70b7343 --- /dev/null +++ b/pages-txt/217.txt @@ -0,0 +1,37 @@ Appendix G

Sample passage used to test listening comprehension

The lens buyer must approach the problem of purchasing a lens of large aperture
with caution. The first question to consider is whether the work one intends doing
will actually require the extreme speed afforded by such a lens. Despite the
glowing advertising claims, no extremely rapid lens is capable of giving, even when
stopped down to its best aperture, the sharpness of definition which may be
obtained with a well-corrected (and much lower-priced) lens of smaller maximum
aperture. It is very doubtful if there exists a lens with maximum aperture in excess
of F 4.5 which will give really sharp definition, whether wide open or at any
smaller opening; the deficiencies of the large-apertured lens, if it is a fairly good
one, will not be noticed in small contact prints; but in pictures enlarged to any
considerable extent, they will be evident (or examination of the print with the low
magnification of a reading glass will make them evident). With modern,
extremely rapid films and with synchronized flash available for the amateur who can
afford to go in for the type of photography that requires this kind of equipment, the
occasions are indeed rare when a lens faster than F 4.5 is really needed.

G.1 Test questions for the comprehension passage

1. The main thought of this passage is that
   1. good photographic work requires the use of a fast lens
   2. lenses of small aperture provide less sharpness than lenses of
      large aperture
   3. lenses of small aperture are to be preferred for most
      photographic work
   4. modern photographic equipment requires the use of lenses of
      large aperture
2. We may infer that some advertisements for photographic lenses tend
to recommend the purchase of
   1. large-aperture lenses
   2. small-aperture lenses
   3. the most appropriate lenses
   4. F 4.5 lenses

205 diff --git a/pages-txt/218.txt b/pages-txt/218.txt new file mode 100644 index 0000000..3df919c --- /dev/null +++ b/pages-txt/218.txt @@ -0,0 +1,27 @@ From text to speech: The MITalk system

3. As the aperture of a lens is increased, the
   1. price tends to decrease
   2. speed of the lens tends to decrease
   3. sharpness of its focus tends to decrease
   4. speed of the lens tends to remain constant
4. The writer's attitude toward the advertising materials which are men-
tioned is one of
   1. indifference
   2. disbelief
   3. acceptance
   4. enthusiasm
5. The writer's main purpose is to
   1. encourage the use of synchronized flash
   2. discourage the use of synchronized flash
   3. encourage the use of rapid films
   4. discourage the purchase of fast lenses
6. To obtain pictures of maximum sharpness, the writer strongly recom-
mends the use of
   1. lenses of large aperture
   2. lenses of small aperture
   3. lower priced films
   4. contact prints

206 diff --git a/pages-txt/219.txt b/pages-txt/219.txt new file mode 100644 index 0000000..147197b --- /dev/null +++ b/pages-txt/219.txt @@ -0,0 +1,66 @@ References

Anonymous 1960. Cooperative English Tests: Reading Comprehension, Form
1B. Princeton, New Jersey: Educational Testing Service.

Anonymous 1972. Iowa Silent Reading Tests, Level 3, Form E. New York: Har-
court Brace Jovanovich.

Anonymous 1972. Stanford Test of Academic Skills: Reading, College Level II-
A. New York: Harcourt Brace Jovanovich.

Anonymous 1973. The Nelson-Denny Reading Test, Form D. Boston: Houghton-
Mifflin.

Akers, G. and M. Lennig 1985. Intonation in text-to-speech synthesis: Evaluation
of algorithms. Journal of the Acoustical Society of America 77: 2157-65.

Allen, J. 1968. A Study of the Specification of Prosodic Features of Speech From
a Grammatical Analysis of Printed Text. PhD thesis, Massachusetts Institute
of Technology, Cambridge, Massachusetts.

Allen, J. 1973. Reading Machines for the Blind: The Technical Problems and the
Methods Adopted for Their Solution. IEEE Transactions on Audio and
Electroacoustics AU-21: 259-64.

Allen, J. 1976. Synthesis of Speech from Unrestricted Text. Proceedings of the
IEEE 64: 422-33.

Allen, J. 1977. A Modular Audio Response System for Computer Output.
Proceedings of the International Conference on Acoustics, Speech, and Sig-
nal Processing, 579-81. New York: IEEE. IEEE Catalog No. 77CH1197-3
ASSP.

Allen, J., S. Hunnicutt, R. Carlson, and B. Granstrom 1979. MITalk-79: The MIT
Text-to-Speech System. Speech Communication Papers Presented at the
97th Meeting of the Acoustical Society of America. New York: Acoustical
Society of America.

Bolinger, D. 1972. Accent is Predictable if You Are a Mind-Reader. Language
48: 633-44.

Buron, R.H. 1968. Generation of a 1000-Word Vocabulary for a Pulse-Excited
Vocoder Operating as an Audio Response Unit. IEEE Transactions on
Audio and Electroacoustics AU-16: 21-5.

Caldwell, J. 1979. Flexible, High-Performance Speech Synthesizer Using Custom
NMOS Circuitry.
Journal of the Acoustical Society of America 64, Supple-
ment 1: S72. (Abstract).

Carlson, R., B. Granstrom, and K. Larsson 1976. Evaluation of a Text-to-Speech
System as a Reading Machine for the Blind. In Speech Transmission
Laboratory Quarterly Progress and Status Report 2-3/1976, 9-13. Royal In-
stitute of Technology, Stockholm.

Carlson, R., B. Granstrom, and D.H. Klatt 1979. Some Notes on the Perception of
Temporal Patterns in Speech. In B. Lindblom and S. Ohman (eds.),
Frontiers of Speech Communication Research, New York: Academic Press.

Carlson, R. and B. Granstrom 1973. Word Accent, Emphatic Stress, and Syntax in
a Synthesis-by-Rule Scheme for Swedish. In Speech Transmission
Laboratory Quarterly Progress and Status Report 2-3/1973, 31-6. Royal In-
stitute of Technology, Stockholm.

207 diff --git a/pages-txt/220.txt b/pages-txt/220.txt new file mode 100644 index 0000000..803d49d --- /dev/null +++ b/pages-txt/220.txt @@ -0,0 +1,75 @@ References

Carlson, R. and B. Granstrom 1976. A Text-to-Speech System Based Entirely on
Rules. Proceedings of the International Conference on Acoustics, Speech,
and Signal Processing, 686-8. New York: IEEE. IEEE Catalog No. 76-
CH-1067-8 ASSP.

Chapman, W.D. 1971. Techniques for Computer Voice Response. IEEE Inter-
national Conference Record, 98-9. New York: IEEE.

Coker, C.H. 1967. Synthesis by Rule from Articulatory Parameters. Paper
presented at the 1967 Conference on Speech Communication and Process-
ing. L.G. Hanscom Field, Bedford, Massachusetts: Air Force Cambridge
Research Laboratories, Office of Aerospace Research, United States Air
Force.

Coker, C.H. 1976. A Model of Articulatory Dynamics and Control. Proceedings
of the IEEE 64: 452-9.

Coker, C.H., N. Umeda, and C.P. Browman 1973. Automatic Synthesis from Or-
dinary English Text. IEEE Transactions on Audio and Electroacoustics
AU-21: 293-397.

Cooper, F.S. 1963. Speech from Stored Data. IEEE Convention Record, part 7,
137-49. New York: IEEE.

Cooper, F.S., A.M. Liberman, and J.M. Borst 1951. The Interconversion of
Audible and Visible Patterns as a Basis for Research in the Perception of
Speech. Proceedings of the National Academy of Science 37: 318-25.

Cooper, W.A., J.M. Paccia, and S.G. Lapointe 1978. Hierarchical Coding in
Speech Timing. Cognitive Psychology 10: 154-77.

Denes, P.B. 1979. Automatic Voice Answerback Using Text-to-Speech Conver-
sion by Rule. Journal of the Acoustical Society of America 64, Supplement
1: S162. (Abstract).

Dixon, R.N. and H.D. Maxey 1968. Terminal Analog Synthesis of Continuous
Speech Using the Diphone Method of Segment Assembly. IEEE Trans-
actions on Audio and Electroacoustics AU-16: 40-50.

Dudley, H.W. 1939. The Vocoder. Bell Laboratories Record 17: 122-6.

Dudley, H., R.R. Riesz, and S.A. Watkins 1939. A Synthetic Speaker. Journal of
the Franklin Institute 227: 739-64.

Dunn, H.K. and S.D. White 1940. Statistical Measurements on Conversational
Speech. Journal of the Acoustical Society of America 11: 278-88.

Egan, J.P. 1948. Articulation Testing Methods. Laryngoscope 58: 955-91.

Epstein, R. 1965. A Transistorized Formant-Type Synthesizer. In Status Report
on Speech Research SR-1, part 7. Haskins Laboratories, New Haven, Con-
necticut.

Estes, S.E., H.R. Kirby, H.D. Maxey, and R.M. Walker 1964. Speech Synthesis
from Stored Data. I.B.M. Journal of Research and Development 8: 2-12.

Fairbanks, G. 1958. Test of Phonemic Differentiation: The Rhyme Test.
Journal
of the Acoustical Society of America 30: 596-600.

Fant, C.G.M. 1956. On the Predictability of Formant Levels and Spectrum En-
velopes from Formant Frequencies. In For Roman Jakobson, The Hague:
Mouton.

Fant, C.G.M. 1959. Acoustic Analysis and Synthesis of Speech with Applications
to Swedish. Ericsson Technics 1.

Fant, C.G.M. 1960. Acoustic Theory of Speech Production. The Hague: Mouton.

Fant, G. and J. Martony 1962. Speech Synthesis. In Speech Transmission
Laboratory Quarterly Progress and Status Report 18-24/1962. Royal In-
stitute of Technology, Stockholm.

208 diff --git a/pages-txt/221.txt b/pages-txt/221.txt new file mode 100644 index 0000000..86d4fde --- /dev/null +++ b/pages-txt/221.txt @@ -0,0 +1,74 @@ References

Flanagan, J.L. 1957. Note on the Design of Terminal Analog Speech Synthesizers.
Journal of the Acoustical Society of America 29: 306-10.

Flanagan, J.L. 1958. Some Properties of the Glottal Sound Source. Journal of
Speech and Hearing Research 1: 99-116.

Flanagan, J.L. 1972. Voices of Men and Machines. Journal of the Acoustical
Society of America 51: 1375-87.

Flanagan, J.L., C.H. Coker, and C.M. Bird 1962. Computer Simulation of a For-
mant Vocoder Synthesizer. Journal of the Acoustical Society of America 35:
2003. (Abstract).

Flanagan, J.L., C.H. Coker, L.R. Rabiner, R.W. Schafer, and N. Umeda 1970.
Synthetic Voices for Computers. IEEE Spectrum 7: 22-45.

Flanagan, J.L., K. Ishizaka, and K.L. Shipley 1975. Synthesis of Speech from a
Dynamic Model of the Vocal Cords and Vocal Tract. Bell System Technical
Journal 54: 485-506.

Flanagan, J.L. and K. Ishizaka 1976. Automatic Generation of Voiceless Excita-
tion in a Vocal Cord-Vocal Tract Speech Synthesizer. IEEE Transactions
on Acoustics, Speech, and Signal Processing 24: 163-70.

Flanagan, J.L. and L.R. Rabiner 1973. Speech Synthesis. Stroudsburg,
Pennsylvania: Dowden, Hutchinson and Ross.

French, N.R. and J.C. Steinberg 1947. Factors Governing the Intelligibility of
Speech Sounds. Journal of the Acoustical Society of America 19: 90-119.

Fujimura, O. 1961. Analysis of Nasalized Vowels. In Research Laboratory of
Electronics Quarterly Progress Report 62, 191-2. Massachusetts Institute of
Technology, Cambridge, Massachusetts.

Fujimura, O. 1962. Analysis of Nasal Consonants. Journal of the Acoustical
Society of America 34: 1865-75.

Fujimura, O. and J. Lindqvist 1971. Sweep-Tone Measurements of Vocal Tract
Characteristics. Journal of the Acoustical Society of America 49: 541-58.

Fujimura, O. and J. Lovins 1978. Syllables as Concatenative Phonetic Elements.
In A. Bell and J.B. Hooper (eds.), Syllables and Segments, New York:
North-Holland.

Gagnon, R.T. 1978. Votrax Real Time Hardware for Phoneme Synthesis of
Speech. Proceedings of the International Conference on Acoustics, Speech,
and Signal Processing, 175-8. New York: IEEE.

Gaitenby, J. 1965. The Elastic Word. In Status Report on Speech Research SR-2,
1-12. Haskins Laboratories, New Haven, Connecticut.

Gauffin, J. and J. Sundberg 1974. An Attempt to Predict the Masking Effect of
Vowel Spectra. In Speech Transmission Laboratory Quarterly Progress and
Status Report 4/1974, 57-62. Royal Institute of Technology, Stockholm.

Gold, B. and L.R. Rabiner 1968. Analysis of Digital and Analog Formant Syn-
thesizers. IEEE Transactions on Audio and Electroacoustics AU-16: 81-94.

Goldman-Eisler, F. 1968. Psycholinguistics: Experiments in Spontaneous Speech.
New York: Academic Press.
Haggard, M.P. and I.G. Mattingly 1968. A Simple Program for Synthesizing
British English. IEEE Transactions on Audio and Electroacoustics AU-16:
95-9.

Halle, M. and S.J. Keyser 1971. English stress: Its form, its growth, and its role
in verse. New York: Harper and Row.

Holmes, J.N. 1961. Research on Speech Synthesis. Report JU 11.4. Joint Speech
Research Unit, British Post Office, Eastcote, England.

209 diff --git a/pages-txt/222.txt b/pages-txt/222.txt new file mode 100644 index 0000000..ad612bb --- /dev/null +++ b/pages-txt/222.txt @@ -0,0 +1,74 @@ References

Holmes, J.N. 1973. The Influence of the Glottal Waveform on the Naturalness of
Speech from a Parallel Formant Synthesizer. IEEE Transactions on Audio
and Electroacoustics AU-21: 298-305.

Holmes, J., I. Mattingly, and J. Shearme 1964. Speech Synthesis by Rule.
Language and Speech 7: 127-43.

Hornsby, T.G. 1972. Voice Response Systems. Modern Data November: 46-50.

House, A.S., C.E. Williams, M.H.L. Hecker, and K.D. Kryter 1965. Articulation-
Testing Methods: Consonantal Differentiation with a Closed-Response Set.
Journal of the Acoustical Society of America 37: 158-66.

House, A.S. and G. Fairbanks 1953. The Influence of Consonantal Environment
Upon the Secondary Acoustical Characteristics of Vowels. Journal of the
Acoustical Society of America 25: 105-13.

Hunnicutt, S. 1976a. A New Morph Lexicon for English. Proceedings of the Sixth
International Conference on Computational Linguistics. Ottawa, Canada:
Association for Computational Linguistics.

Hunnicutt, S. 1976b. Phonological Rules for a Text-to-Speech System. American
Journal of Computational Linguistics Microfiche 57: 1-72.

Ingeman, F. 1978. Speech Synthesis by Rule Using the FOVE Program. In Status
Report on Speech Research SR-54, 165-73. Haskins Laboratories, New
Haven, Connecticut.

Jayant, N.S. 1974. Digital Coding of Speech Waveforms: PCM, DPCM, and DM
Quantizers. Proceedings of the IEEE 62: 611-32.

Kaiser, J.F. 1966. Digital Filters. In F.F. Kuo and J.F. Kaiser (eds.), System
Analysis by Digital Computer, New York: Wiley.

Klatt, D.H. 1970. Synthesis of Stop Consonants in Initial Position. Journal of the
Acoustical Society of America 47: 93. (Abstract).

Klatt, D.H. 1972. Acoustic Theory of Terminal Analog Speech Synthesis.
Proceedings of the 1972 International Conference on Speech Communica-
tion and Processing, 131-5. New York: IEEE. IEEE Catalog Number 72
CHO 596-7 AE.

Klatt, D.H. 1973. Interaction between Two Factors that Influence Vowel Dura-
tion. Journal of the Acoustical Society of America 54: 1102-4.

Klatt, D.H. 1974. Review of Speech Synthesis. Journal of the Acoustical Society
of America 55: 900. J.L. Flanagan and L.R. Rabiner (eds.).

Klatt, D.H. 1975. Vowel Lengthening is Syntactically Determined in a Connected
Discourse. Journal of Phonetics 3: 129-40.

Klatt, D.H. 1976a. Structure of a Phonological Rule Component for a Synthesis-
by-Rule Program. IEEE Transactions on Acoustics, Speech, and Signal
Processing ASSP-24: 391-8.

Klatt, D.H. 1976b. The Linguistic Uses of Segmental Duration in English: Acous-
tic and Perceptual Evidence. Journal of the Acoustical Society of America
59: 1208-21.

Klatt, D.H. 1976c. A Speech Synthesis-by-Rule Program for Response Generation
and for Word Verification. In W.A. Woods (ed.), Speech Understanding
Systems Final Report 3438, Volume 2, 40-57. Bolt, Beranek and Newman
Incorporated, Cambridge, Massachusetts.

Klatt, D.H. 1979a.
Synthesis by Rule of Consonant-Vowel Syllables. In Speech
Communication Group Working Papers, Cambridge, Massachusetts: Massa-
chusetts Institute of Technology.

Klatt, D.H. 1979b. Synthesis by Rule of Segmental Durations in English Sen-

210 diff --git a/pages-txt/223.txt b/pages-txt/223.txt new file mode 100644 index 0000000..39059a1 --- /dev/null +++ b/pages-txt/223.txt @@ -0,0 +1,72 @@ References

tences. In B. Lindblom and S. Ohman (eds.), Frontiers of Speech Com-
munication Research, New York: Academic Press.

Klatt, D.H. 1980. Software for a Cascade/Parallel Formant Synthesizer. Journal
of the Acoustical Society of America 67: 971-95.

Klatt, D.H., C. Cook, and W.A. Woods 1975. PCOMPILER -- A Language for
Stating Phonological and Phonetic Rules. Report 3080. Bolt, Beranek and
Newman Incorporated, Cambridge, Massachusetts.

Kryter, K.D. 1962. Methods for the Calculation and Use of the Articulation Index.
Journal of the Acoustical Society of America 34: 1689-97.

Kucera, H. and W.N. Francis 1967. Computational Analysis of Present-Day
American English. Providence, Rhode Island: Brown University Press.

Kurzweil, R. 1976. The Kurzweil Reading Machine: A Technical Overview. In
M.R. Redden and W. Schwandt (eds.), Science, Technology and the
Handicapped, Report 76-R-11, 3-11. American Association for the Ad-
vancement of Science.

Lawrence, W. 1953. The Synthesis of Speech from Signals Which Have a Low
Information Rate. In W. Jackson (ed.), Communication Theory, London:
Butterworth's Scientific Publications.

Lehiste, I. 1977. Isochrony Reconsidered. Journal of Phonetics 5: 253-63.

Lehiste, I., J.P. Olive, and L.A. Streeter 1976. The Role of Duration in Dis-
ambiguating Syntactically Ambiguous Sentences. Journal of the Acoustical
Society of America 60: 1199-202.

Lehiste, I. 1975a. Some Factors Affecting the Duration of Syllabic Nuclei in
English. In G. Drachman (ed.), Proceedings of the First Salzburg Con-
ference on Linguistics, Verlag Gunter Narr.

Lehiste, I. 1975b. The Phonetic Structure of Paragraphs. In A. Cohen and
S. Nooteboom (eds.), Structure and Process in Speech Perception, Heidel-
berg: Springer-Verlag.

Liberman, M.Y. 1977. Further Work on Duration Modeling in Reiterant Speech.
Journal of the Acoustical Society of America 62, Supplement 1: S48.
(Abstract).

Liberman, M.Y. 1979. Phonemic Transcription, Stress, and Segment Durations for
Spelled Proper Names. Journal of the Acoustical Society of America 64,
Supplement 1: S163. (Abstract).

Liberman, A., F. Ingeman, L. Lisker, P. Delattre, and F. Cooper 1959. Minimal
Rules for Synthesizing Speech. Journal of the Acoustical Society of
America 31: 1490-9.

Liljencrants, J. 1968. The OVE-III Speech Synthesizer. IEEE Transactions on
Audio and Electroacoustics AU-16: 137-40.

Liljencrants, J. 1969. Speech Synthesizer Control by Smoothed Step Functions.
In Speech Transmission Laboratory Quarterly Progress and Status Report 4,
43-50. Royal Institute of Technology, Stockholm.

Lindblom, B. and K. Rapp 1973. Some Temporal Regularities of Spoken Swedish.
Publication 21. Institute of Linguistics, University of Stockholm, Stock-
holm.

Lovins, J.B. and O. Fujimura 1976. Synthesis of English Monosyllables by
Demisyllable Concatenation. Journal of the Acoustical Society of America
60, Supplement 1: S75. (Abstract).

Macchi, M. and G. Nigro 1977. Syllable Affixes in Speech Synthesis. Journal of
the Acoustical Society of America 61, Supplement 1: S67. (Abstract).
211 diff --git a/pages-txt/224.txt b/pages-txt/224.txt new file mode 100644 index 0000000..62b4e49 --- /dev/null +++ b/pages-txt/224.txt @@ -0,0 +1,70 @@ References

Maeda, S. 1974. A Characterization of Fundamental Frequency Contours of
Speech. In Research Laboratory of Electronics Quarterly Progress Report
114, 193-211. Massachusetts Institute of Technology, Cambridge, Massa-
chusetts.

Makhoul, J. 1975. Spectral Linear Prediction: Properties and Applications. IEEE
Transactions on Acoustics, Speech, and Signal Processing ASSP-23:
283-96.

Markel, J.D. and A.H. Gray 1976. Linear Prediction of Speech. New York:
Springer-Verlag.

Marslen-Wilson, W.D. and A. Welsh 1978. Processing Interactions and Lexical
Access During Word Recognition in Continuous Speech. Cognitive
Psychology 10: 29-63.

Mattingly, I. 1966. Synthesis by Rule of Prosodic Features. Language and Speech
9: 1-13.

Mattingly, I. 1968a. Synthesis-by-Rule of General American English. In Supple-
ment to Status Report on Speech Research. Haskins Laboratories, New
Haven, Connecticut.

Mattingly, I. 1968b. Experimental Methods for Speech Synthesis by Rule. IEEE
Transactions on Audio and Electroacoustics AU-16: 198-202.

Miller, G.A., G. Heise, and W. Lichten 1951. The Intelligibility of Speech as a
Function of the Context of the Test Materials. Journal of Experimental
Psychology 41: 329-35.

Miller, G.A. and S. Isard 1963. Some Perceptual Consequences of Linguistic
Rules. Journal of Verbal Learning and Verbal Behavior 2: 217-28.

Miranker, G.S. 1978. A Digital Signal Processor for Real Time Generation of
Speech Waveforms. Proceedings of the Fifth Annual Symposium on Com-
puter Architecture. New York: IEEE.

Morris, L.R. 1979. A Fast Fortran Implementation of the NRL Algorithm for
Automatic Translation of English Text to Votrax Parameters. Proceedings
of the International Conference on Acoustics, Speech, and Signal
Processing, 907-13. New York: IEEE.

Nakata, K. and T. Mitsuoka 1965. Phonemic Transformation and Control Aspects
of Synthesis of Connected Speech. Journal of the Radio Research
Laboratories 12: 171-86. Tokyo.

Nye, P., J. Hankins, T. Rand, I. Mattingly, and F. Cooper 1973. A Plan for the
Field Evaluation of an Automated Reading System for the Blind. IEEE
Transactions on Audio and Electroacoustics AU-21: 265-8.

Nye, P.W., F. Ingeman, and L. Donald 1975. Synthetic Speech Comprehension:
A Comparison of Listener Performances with and Preferences Among Dif-
ferent Speech Forms. In Status Report on Speech Research SR-41, 117-26.
Haskins Laboratories, New Haven, Connecticut.

Nye, P.W. and J. Gaitenby 1973. Consonant Intelligibility in Synthetic Speech and
in a Natural Speech Control (Modified Rhyme Test Results). In Status
Report on Speech Research SR-33, 77-91. Haskins Laboratories, New
Haven, Connecticut.

Nye, P.W. and J. Gaitenby 1974. The Intelligibility of Synthetic Monosyllable
Words in Short, Syntactically Normal Sentences. In Status Report on
Speech Research SR-37/38, 169-90. Haskins Laboratories, New Haven,
Connecticut.

O'Shaughnessy, D. 1976. Modelling Fundamental Frequency, and its Relationship

212 diff --git a/pages-txt/225.txt b/pages-txt/225.txt new file mode 100644 index 0000000..f1d836e --- /dev/null +++ b/pages-txt/225.txt @@ -0,0 +1,77 @@ References

to Syntax, Semantics, and Phonetics. PhD thesis, Massachusetts Institute of
Technology, Cambridge, Massachusetts.

O'Shaughnessy, D. 1977.
Fundamental Frequency by Rule for a Text-to-Speech
System. Proceedings of the International Conference on Acoustics, Speech,
and Signal Processing, 571-4. New York: IEEE.

Olive, J.P. 1974. Speech Synthesis by Rule. In G. Fant (ed.), Speech communica-
tion: Volume 2, New York: Halsted Press.

Olive, J.P. 1977. Rule Synthesis of Speech from Diadic Units. Proceedings of the
International Conference on Acoustics, Speech, and Signal Processing,
568-70. New York: IEEE. IEEE Catalog No. 77CH1197-3 ASSP.

Olive, J.P. 1979. Speech Synthesis from Phonemic Transcription. Journal of the
Acoustical Society of America 64, Supplement 1: S163. (Abstract).

Olive, J.P. and L.H. Nakatani 1974. Rule Synthesis of Speech by Word Con-
catenation: A First Step. Journal of the Acoustical Society of America 55:
660-6.

Olive, J.P. and N. Spickenagle 1976. Speech Resynthesis from Phoneme-Related
Parameters. Journal of the Acoustical Society of America 59: 993-6.

Oller, D.K. 1973. The Effect of Position in Utterance on Speech Segment Dura-
tion in English. Journal of the Acoustical Society of America 54: 1235-47.

Peterson, G., W. Wang, and E. Sivertsen 1958. Segmentation Techniques in
Speech Synthesis. Journal of the Acoustical Society of America 30: 739-42.

Peterson, G.E. and I. Lehiste 1960. Duration of Syllabic Nuclei in English.
Journal of the Acoustical Society of America 32: 693-703.

Pierrehumbert, J. 1979. Intonation Synthesis Based on Metrical Grids. Paper
presented at the 97th Meeting of the Acoustical Society of America. New
York: The Acoustical Society of America. ASA Preprint.

Pisoni, D.B. 1978. Speech Perception. In W.K. Estes (ed.), Handbook of Learn-
ing and Cognitive Processes (Volume 6), Hillsdale, New Jersey: Lawrence
Erlbaum Associates.

Rabiner, L.R. 1968a. Speech Synthesis by Rule: An Acoustic Domain Approach.
Bell System Technical Journal 47: 17-38.

Rabiner, L.R. 1968b. Digital-Formant Synthesizer for Speech-Synthesis Studies.
Journal of the Acoustical Society of America 43: 822-8.

Rabiner, L.R., R.W. Schafer, and J.L. Flanagan 1971. Computer Synthesis of
Speech by Concatenation of Formant-Coded Words. Bell System Technical
Journal 50: 1541-58.

Rabiner, L.R., L.B. Jackson, R.W. Schafer, and C.H. Coker 1971a. A Hardware
Realization of a Digital Formant Speech Synthesizer. IEEE Transactions on
Communication Technology COM-19: 1016-70.

Rabiner, L.R. and R.W. Schafer 1976. Digital Techniques for Computer Voice
Response Implementations and Applications. Proceedings of the IEEE 64:
416-33.

Rosen, G. 1958. A Dynamic Analog Speech Synthesizer. Journal of the Acous-
tical Society of America 34: 201-9.

Rothenberg, M., R. Carlson, B. Granstrom, and J. Gauffin 1974. A Three-
Parameter Voice Source for Speech Synthesis. In G. Fant (ed.), Speech
Communication, Uppsala, Sweden: Almqvist and Wiksell.

Schwartz, R. et al. 1979. Diphone Synthesis for Phonemic Vocoding.
Proceedings of the International Conference on Acoustics, Speech, and Sig-
nal Processing, 891-4. New York: IEEE.

213 diff --git a/pages-txt/226.txt b/pages-txt/226.txt new file mode 100644 index 0000000..2de2b5a --- /dev/null +++ b/pages-txt/226.txt @@ -0,0 +1,61 @@ References

Scott, R.J., D.M. Glace, and I.G. Mattingly 1966. A Computer-Controlled On-
Line Speech Synthesizer System. 1966 IEEE International Communications
Conference, Digest of Technical Papers, 104-5. New York: IEEE.

Stevens, K.N. 1956. Synthesis of Speech by Electrical Analog Devices.
Journal
of the Audio Engineering Society 4: 2-8.

Stevens, K.N. 1971. Airflow and Turbulence Noise for Fricative and Stop Con-
sonants: Static Considerations. Journal of the Acoustical Society of
America 50: 1180-92.

Stevens, K.N. 1972. The Quantal Nature of Speech: Evidence from Articulatory-
Acoustic Data. In E.E. David and P.B. Denes (eds.), Human Communica-
tion: A Unified View, New York: McGraw-Hill.

Stevens, K.N., S. Kasowski, and G. Fant 1953. An Electrical Analog of the Vocal
Tract. Journal of the Acoustical Society of America 25: 734-42.

Stevens, K.N., R.P. Bastide, and C.P. Smith 1955. Electrical Synthesizer of Con-
tinuous Speech. Journal of the Acoustical Society of America 27: 207.
(Abstract).

Stevens, K.N. and D.H. Klatt 1974. Current Models of Sound Sources for Speech.
In B.D. Wyke (ed.), Ventilatory and Phonatory Control Systems: An Inter-
national Symposium, London: Oxford University Press.

Tomlinson, R.S. 1966. SPASS - An Improved Terminal Analog Speech Syn-
thesizer. In Research Laboratory of Electronics Quarterly Progress Report
80, 198-205. Massachusetts Institute of Technology, Cambridge, Massachu-
setts.

Umeda, N. 1975. Vowel Duration in American English. Journal of the Acoustical
Society of America 58: 434-45.

Umeda, N. 1976. Linguistic Rules for Text-to-Speech Synthesis. Proceedings of
the IEEE 64: 443-51.

Umeda, N. 1977. Consonant Duration in American English. Journal of the Acous-
tical Society of America 61: 846-58.

Wang, W. and G.E. Peterson 1958. Segment Inventory for Speech Synthesis.
Journal of the Acoustical Society of America 30: 743-6.

Wiggins, R. 1979. The TMC 0280 Speech Synthesizer. Journal of the Acoustical
Society of America 64, Supplement 1: S72. (Abstract).

Woods, W. 1970. Transition Network Grammars for Natural Language Analysis.
Communications of the ACM 13: 591-606.

Woods, W. et al. 1976. The BBN-HWIM Speech Understanding System Final
Report. Report 3438, Volume V, 47-58. Bolt, Beranek and Newman, Cam-
bridge, Massachusetts.

Young, S.J. and F. Fallside 1979. Speech Synthesis from Concept: A Method for
Speech Output from Information Systems. Journal of the Acoustical Society
of America 66: 685-95.
214 diff --git a/pages-txt/227.txt b/pages-txt/227.txt new file mode 100644 index 0000000..43a8a49 --- /dev/null +++ b/pages-txt/227.txt @@ -0,0 +1,46 @@ Index

Allophone 81
Antiresonance 129
Articulatory synthesis 78, 124
Bit rate 9
Cascade/parallel synthesizer 126
Continuation rise 13, 75
Declination line 13, 100, 105
Delta modulation 9
Determiner 43
Diphone 77
Formant
  synthesis 124
  vocoding 75
Frequency domain 9
Frication noise 131
Function words 83, 103
Fundamental frequency contour 100
  of clause 103
  of word 103
Glottal waveform 130, 133
Harvard Psychoacoustic Sentences 157, 203
Haskins Anomalous Sentences 157, 204
Homograph 41, 55
Kaiser window 110
Lexicon 23
Linear prediction 75, 110
Linear prediction coding 75
Linear predictive coding 10
Locus theory 78, 112
Modified Rhyme Test 152
Morph 24
  covering 27
Morpheme 24
Nasal murmurs 120, 142, 147
Noun group 40
  grammar 49
Parameter
  conversion 13
  update rate 128
Parametric representation 9
Parsing 40
  strategy 45
Part-of-speech processor 43, 177
Parts of speech 41, 101
Pauses 88
Phoneme 73, 78, 81
Phonetic
  targets 108
  transcription 12, 23
Phonological recoding 13
Predeterminer 43
Prosodic parameters 10
Radiation characteristic 124, 136, 150
Reading machine 71, 151
Resonator
  digital 128
  low-pass 130
Roots, morph 24
Sinusoidal voicing 134
Software
  simulation 124
  synthesizer 113, 123
Speak-'N-Spell 75, 108
Spectral matching 116
Syllables 24, 73, 76
Synthesis-by-rule 10, 81
Synthesizer 13, 78, 123
Text-to-speech 11, 79
Verb group 40
  grammar 48
Verbal 41

215 diff --git a/pages-txt/228.txt b/pages-txt/228.txt new file mode 100644 index 0000000..db2f0eb --- /dev/null +++ b/pages-txt/228.txt @@ -0,0 +1,14 @@ Index

Vocal
  apparatus 8
  tract 78, 124
Vocal tract model 139, 144, 173
Voice response 72
Waveform
  coding 9
  discontinuities 138
  sampling rate 127

diff --git a/pages-txt/229.txt b/pages-txt/229.txt new file mode 100644 index 0000000..43ce290 --- /dev/null +++ b/pages-txt/229.txt @@ -0,0 +1,36 @@ Cambridge Studies
in Speech Science
and Communication

This book describes the most comprehensive system yet developed for the
automatic conversion of unrestricted English text to intelligible and natural
sounding synthetic speech. It offers detailed accounts of the various
components any speech technologist needs to consider — text preprocessing,
morphological analysis, letter-to-sound rules, syntactic analysis, routines for
morphophonemics, stress adjustment, timing and pitch, together with
segmental synthesis.

Work on the MIT text-to-speech system began in the 1960s. By the late 1970s
experience with the interaction of all the constituent algorithms had reached
the point where it was possible to provide a detailed exposition of the system.
The present volume builds on an intensive lecture course on the system held
in 1979 and brings it up to date with a full account of the developments that
have since taken place. In particular the system's software has been developed
and can now, for example, easily permit the assembly of subsets of the overall
system. All the examples in From text to speech are a direct result of the
current working system, and the book includes extensive and explicit
representations of the algorithms and rules used in the system.

The MIT text-to-speech system has set new standards for intelligibility,
linguistic sophistication and methods of evaluation.
It provides an impressive +statement of our knowledge to date about speech synthesis techniques and +will be an invaluable resource not only for professionals in the field, but for any +reader with an informed interest in natural language processing. + +Cover design by James Butler + +CAMBRIDGE UNIVERSITY PRESS + +GO 0187 +