|
|
<!DOCTYPE html>
|
|
|
<html lang="en">
|
|
|
<head>
|
|
|
<meta charset="UTF-8">
|
|
|
<title> | tait.tech</title>
|
|
|
<link rel="stylesheet" href="/assets/css/style.css" id="main-css">
|
|
|
<link rel="stylesheet" href="/assets/css/transcription.css" id="trans-css">
|
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
|
<script src="/assets/js/"></script>
|
|
|
|
|
|
<link rel="stylesheet" href="/assets/css/katex.css" id="math-css">
|
|
|
|
|
|
|
|
|
</head>
|
|
|
<body>
|
|
|
<main>
|
|
|
<div id="wrapper">
|
|
|
<h1 id="cmpt-295">CMPT 295</h1>
|
|
|
|
|
|
<ul>
|
|
|
<li>Unit - Data Representation
|
|
|
<ul>
|
|
|
<li>Lecture 6 – Representing fractional numbers in memory</li>
|
|
|
<li>IEEE floating point representation – cont’d</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="have-you-heard-of-that-new-band-1023-megabytes">Have you heard of that new band “1023 Megabytes”?</h2>
|
|
|
|
|
|
<p>They’re pretty good,
|
|
|
but they don’t have a gig just yet.
|
|
|
😭</p>
|
|
|
|
|
|
<h2 id="last-lecture">Last Lecture</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>Representing integral numbers in memory
|
|
|
<ul>
|
|
|
<li>Can encode a small range of values exactly (in 1, 2, 4, 8 bytes)
|
|
|
<ul>
|
|
|
<li>For example: We can represent the values -128 to 127 exactly in 1 byte using a signed char in C</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Representing fractional numbers in memory
|
|
|
<ol>
|
|
|
<li>Positional notation has some advantages, but also disadvantages -> so not used!</li>
|
|
|
<li>IEEE floating point representation: can encode a much larger range of e.g., single precision: [10-38..1038] values approximately (in 4 or 8 bytes)</li>
|
|
|
</ol>
|
|
|
</li>
|
|
|
<li>Overview of IEEE floating point representation
|
|
|
<ul>
|
|
|
<li>Precision options (float 32-bit, double 64-bit)</li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>V</mi><mo>=</mo><msup><mtext>(-1)</mtext><mi>s</mi></msup><mo>×</mo><mi>M</mi><mo>×</mo><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = \text{(-1)}^{s} \times M \times 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.054292em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord text"><span class="mord">(-1)</span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.804292em;"><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8913309999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
<li>s –> sign bit</li>
|
|
|
<li>exp encodes E (but != E)</li>
|
|
|
<li>frac encodes M (but != M)</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>We interpret the bit vector (expressed in IEEE floating point encoding) stored in memory using this equation.</p>
|
|
|
|
|
|
<h2 id="todays-menu">Today’s Menu</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>Representing data in memory – Most of this is review
|
|
|
<ul>
|
|
|
<li>“Under the Hood” - Von Neumann architecture</li>
|
|
|
<li>Bits and bytes in memory
|
|
|
<ul>
|
|
|
<li>How to diagram memory -> Used in this course and other references</li>
|
|
|
<li>How to represent series of bits -> In binary, in hexadecimal (conversion)</li>
|
|
|
<li>What kind of information (data) do series of bits represent -> Encoding scheme</li>
|
|
|
<li>Order of bytes in memory -> Endian</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Bit manipulation – bitwise operations
|
|
|
<ul>
|
|
|
<li>Boolean algebra + Shifting</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Representing integral numbers in memory
|
|
|
<ul>
|
|
|
<li>Unsigned and signed</li>
|
|
|
<li>Converting, expanding and truncating</li>
|
|
|
<li>Arithmetic operations</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Representing real numbers in memory
|
|
|
4
|
|
|
<ul>
|
|
|
<li>IEEE floating point representation</li>
|
|
|
<li>Floating point in C – casting, rounding, addition, …</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation-three-kinds-of-values">IEEE Floating Point Representation Three “kinds” of values</h2>
|
|
|
|
|
|
<p>We interpret the bit vector
|
|
|
(expressed in IEEE floating point encoding) stored in memory using this equation:</p>
|
|
|
|
|
|
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mo>−</mo><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">
|
|
|
V = (-1)^{s} M 2^{E}
|
|
|
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">−</span><span class="mord">1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7143919999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<p>Bit breakdown–<code class="language-plaintext highlighter-rouge">exp</code> and <code class="language-plaintext highlighter-rouge">frac</code> interpreted as unsigned:</p>
|
|
|
|
|
|
<ul>
|
|
|
<li>s = 1 bit</li>
|
|
|
<li>exp = k bits
|
|
|
<ol>
|
|
|
<li>If <code class="language-plaintext highlighter-rouge">exp</code> != 0 and <code class="language-plaintext highlighter-rouge">exp</code> != 11…11 (exp range: [0000001…11111110]). Equations:
|
|
|
<ul>
|
|
|
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>=</mo><mtext>exp</mtext><mo>−</mo><mtext>bias</mtext></mrow><annotation encoding="application/x-tex">E = \text{exp} - \text{bias}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.05764em;">E</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.7777700000000001em;vertical-align:-0.19444em;"></span><span class="mord text"><span class="mord">exp</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} - 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.9324379999999999em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span></li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>M</mi><mo>=</mo><mn>1</mn><mo>+</mo><mtext>frac</mtext></mrow><annotation encoding="application/x-tex">M = 1 + \text{frac}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">frac</span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>If <code class="language-plaintext highlighter-rouge">exp</code> = 00…00 (all 0’s) => denormalized. Equations:
|
|
|
<ul>
|
|
|
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>=</mo><mn>1</mn><mo>−</mo><mtext>bias</mtext></mrow><annotation encoding="application/x-tex">E = 1 - \text{bias}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.05764em;">E</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} - 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.9324379999999999em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span></li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>M</mi><mo>=</mo><mtext>frac</mtext></mrow><annotation encoding="application/x-tex">M = \text{frac}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">frac</span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>If <code class="language-plaintext highlighter-rouge">exp</code> 11…11 (all 1’s) => special cases.
|
|
|
<ul>
|
|
|
<li>Case 1: <code class="language-plaintext highlighter-rouge">frac</code> = 000…0</li>
|
|
|
<li>Case 2: <code class="language-plaintext highlighter-rouge">frac</code> != 000…0</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ol>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation---normalized">IEEE floating point representation - normalized</h2>
|
|
|
|
|
|
<p>Numerical Form: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<p>Bit breakdown:</p>
|
|
|
|
|
|
<ul>
|
|
|
<li>s = 1 bit</li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">exp</code> = k bits
|
|
|
<ul>
|
|
|
<li>If <code class="language-plaintext highlighter-rouge">exp</code> != 0 and <code class="language-plaintext highlighter-rouge">exp</code> != 11…11 (<code class="language-plaintext highlighter-rouge">exp</code> range: [00000001…11111110]) => normalized. Equations:
|
|
|
<ul>
|
|
|
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>=</mo><mtext>exp</mtext><mo>−</mo><mtext>bias</mtext></mrow><annotation encoding="application/x-tex">E = \text{exp} - \text{bias}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.05764em;">E</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.7777700000000001em;vertical-align:-0.19444em;"></span><span class="mord text"><span class="mord">exp</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} - 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.9324379999999999em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span></li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>M</mi><mo>=</mo><mn>1</mn><mo>+</mo><mtext>frac</mtext></mrow><annotation encoding="application/x-tex">M = 1 + \text{frac}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">frac</span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Why is <code class="language-plaintext highlighter-rouge">E</code> biased?</p>
|
|
|
|
|
|
<p>Using single precision as an example (s = 1 bit, exp = 8 bits, frac = 23 bits):</p>
|
|
|
|
|
|
<ul>
|
|
|
<li>(exp range: [00000001 .. 11111110]) => <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">[</mo><msub><mn>1</mn><mn>10</mn></msub><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mn>25</mn><msub><mn>4</mn><mn>10</mn></msub><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">[1_{10}...254_{10}]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">10</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">...25</span><span class="mord"><span class="mord">4</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">10</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">]</span></span></span></span></li>
|
|
|
<li>If <code class="language-plaintext highlighter-rouge">E</code> is not biased (i.e. E = exp), then <code class="language-plaintext highlighter-rouge">E</code> range <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">[</mo><msub><mn>1</mn><mn>10</mn></msub><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mn>25</mn><msub><mn>4</mn><mn>10</mn></msub><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">[1_{10} ... 254_{10}]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">10</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mord">...25</span><span class="mord"><span class="mord">4</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">10</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">]</span></span></span></span></li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">V</code> range [<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>1</mn></msup></mrow><annotation encoding="application/x-tex">2^{1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span></span>…<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>254</mn></msup></mrow><annotation encoding="application/x-tex">2^{254}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">254</span></span></span></span></span></span></span></span></span></span></span></span>] = [2…<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>≈</mo><mn>2.89</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>76</mn></msup></mrow><annotation encoding="application/x-tex">\approx 2.89 \times 10^{76}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.48312em;vertical-align:0em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">2.89</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">76</span></span></span></span></span></span></span></span></span></span></span></span>] (so cannot express numbers < 2)</li>
|
|
|
<li>By biasing <code class="language-plaintext highlighter-rouge">E</code> (i.e. E = exp - bias), then <code class="language-plaintext highlighter-rouge">E</code> range: [1-127…254-127] == [-126…127] (since k = 8, bias = <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mrow><mn>8</mn><mo>−</mo><mn>1</mn></mrow></msup><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">2^{8-1} - 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.897438em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span> = 127)</li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">V</code> range: [<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mrow><mo>−</mo><mn>126</mn></mrow></msup></mrow><annotation encoding="application/x-tex">2^{-126}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">126</span></span></span></span></span></span></span></span></span></span></span></span>…<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mn>127</mn></msup></mrow><annotation encoding="application/x-tex">2^{127}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">127</span></span></span></span></span></span></span></span></span></span></span></span>] = [<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>≈</mo><mn>1.18</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mrow><mo>−</mo><mn>38</mn></mrow></msup></mrow><annotation encoding="application/x-tex">\approx 1.18 \times 10^{-38}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.48312em;vertical-align:0em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1.18</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">38</span></span></span></span></span></span></span></span></span></span></span></span> … <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>≈</mo><mn>1.7</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>38</mn></msup></mrow><annotation encoding="application/x-tex">\approx 1.7 \times 10^{38}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.48312em;vertical-align:0em;"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1.7</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">38</span></span></span></span></span></span></span></span></span></span></span></span> (so can now express very small (and very large) numbers)</li>
|
|
|
<li>Why adding 1 to <code class="language-plaintext highlighter-rouge">frac</code>? Because the number (or value) V is first normalized before it is converted.</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="review-scientific-notation-and-normalization">Review: Scientific Notation and normalization</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>From Wikipedia:
|
|
|
<ul>
|
|
|
<li>Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form (as they are long strings of digits).</li>
|
|
|
<li>In scientific notation, nonzero numbers are written in the form +/- M × 10n</li>
|
|
|
<li>In normalized notation, the exponent n is chosen such that the absolute value of the significand M is at least 1 (M = 1.0) but less than the base
|
|
|
<ul>
|
|
|
<li>M range for base 10 => [1.0 .. 10.0 – ε ]</li>
|
|
|
<li>M range for base 2 => [1.0 .. 2.0 – ε ]</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>
|
|
|
<p>Examples:</p>
|
|
|
|
|
|
<ul>
|
|
|
<li>A proton’s mass is 0.0000000000000000000000000016726 kg -> <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1.6726</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mrow><mtext>−</mtext><mn>27</mn></mrow></msup></mrow><annotation encoding="application/x-tex">1.6726 \times 10^{−27}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1.6726</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−27</span></span></span></span></span></span></span></span></span></span></span></span> kg</li>
|
|
|
<li>Speed of light is 299,792,458 m/s -> <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2.99792</mn><mo separator="true">,</mo><mn>458</mn><mo>×</mo><mn>1</mn><msup><mn>0</mn><mn>8</mn></msup></mrow><annotation encoding="application/x-tex">2.99792,458 \times 10^{8}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8388800000000001em;vertical-align:-0.19444em;"></span><span class="mord">2.99792</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord">458</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord">1</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span></span></span></span></span></span></span></span></span></span></span></span> m/s</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Syntax of normalized notation:</p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
<th>Name</th>
|
|
|
<th>Notation</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>Sign</td>
|
|
|
<td>+/-</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>Significant</td>
|
|
|
<td><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mn>0</mn></msub><mo separator="true">,</mo><msub><mi>d</mi><mrow><mo>−</mo><mn>1</mn></mrow></msub><mo separator="true">,</mo><msub><mi>d</mi><mrow><mo>−</mo><mn>2</mn></mrow></msub><mo separator="true">,</mo><msub><mi>d</mi><mrow><mo>−</mo><mn>3</mn></mrow></msub></mrow><annotation encoding="application/x-tex">d_{0}, d_{-1}, d_{-2}, d_{-3}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.902771em;vertical-align:-0.208331em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">0</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight">3</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span></span></span>…<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mo>−</mo><mi>n</mi></mrow></msub></mrow><annotation encoding="application/x-tex">d_{-n}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.902771em;vertical-align:-0.208331em;"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.25833100000000003em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mathnormal mtight">n</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span></span></span></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>Base</td>
|
|
|
<td><code class="language-plaintext highlighter-rouge">b</code></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>Exponent</td>
|
|
|
<td><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mrow></mrow><mtext>exp</mtext></msup></mrow><annotation encoding="application/x-tex">^{\text{exp}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.664392em;vertical-align:0em;"></span><span class="mord"><span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">exp</span></span></span></span></span></span></span></span></span></span></span></span></span></td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
<ul>
|
|
|
<li>Let’s try: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>101011010.10</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">101011010.101_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">101011010.10</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span> -> ___</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="lets-try-normalizing-these-fractional-binary-numbers">Let’s try normalizing these fractional binary numbers!</h2>
|
|
|
|
|
|
<ol>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>101011010.10</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">101011010.101_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">101011010.10</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>0.00000000110</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">0.000000001101_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">0.00000000110</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>1100000011100</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">11000000111001_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">1100000011100</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ol>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation">IEEE floating point representation</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>Once V is normalized, we apply the equations
|
|
|
<ul>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup><mo>=</mo><mn>1.0101101010</mn><msub><mn>1</mn><mn>2</mn></msub><mo>×</mo><msup><mn>2</mn><mn>8</mn></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E} = 1.01011010101_{2} \times 2^{8}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7143919999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">1.0101101010</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8641079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641079999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span></span></span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">s</code> = ???</li>
|
|
|
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>=</mo><mtext>exp</mtext><mo>−</mo><mtext>bias</mtext></mrow><annotation encoding="application/x-tex">E = \text{exp} - \text{bias}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.05764em;">E</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.7777700000000001em;vertical-align:-0.19444em;"></span><span class="mord text"><span class="mord">exp</span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span></span></span></span> where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mo>−</mo><mn>1</mn><mo>=</mo><msup><mn>2</mn><mn>7</mn></msup><mo>−</mo><mn>1</mn><mo>=</mo><mn>128</mn><mo>−</mo><mn>1</mn><mo>=</mo><mn>127</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} - 1 = 2^{7} - 1 = 128 - 1 = 127</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.9324379999999999em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.897438em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">128</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">127</span></span></span></span></li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">exp</code> = <code class="language-plaintext highlighter-rouge">E</code> + <code class="language-plaintext highlighter-rouge">bias</code> = ___</li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">M</code> = 1 + <code class="language-plaintext highlighter-rouge">frac</code> = ___</li>
|
|
|
<li><code class="language-plaintext highlighter-rouge">s</code> = 1 bit, <code class="language-plaintext highlighter-rouge">exp</code> = k bits => 8 bits, <code class="language-plaintext highlighter-rouge">frac</code> n bits => 23 bits</li>
|
|
|
<li>bit vector in memory:</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="why-adding-1-to-frac-or-subtracting-1-from-m">Why adding 1 to frac (or subtracting 1 from M)?</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>Because the number (or value) V is first normalized before it is converted.
|
|
|
<ul>
|
|
|
<li>As part of this normalization process, we transform our binary number such that its significand M is within the range [1.0 .. 2.0 – ε ]</li>
|
|
|
<li>Remember: M range for base 2 => [1.0 … 2.0 – ε]</li>
|
|
|
<li>This implies that M is always at least 1.0, so its integral part always has the value 1</li>
|
|
|
<li>So since this bit is always part of M, IEEE 754 does not explicitly save it in its bit pattern (i.e., in memory)</li>
|
|
|
<li>Instead, this bit is implied!</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="why-adding-1-to-frac-or-subtracting-1-from-m-1">Why adding 1 to frac (or subtracting 1 from M)?</h2>
|
|
|
|
|
|
<p>Implying this bit has the following effects:</p>
|
|
|
|
|
|
<p>We get the
|
|
|
leading bit
|
|
|
for free!</p>
|
|
|
|
|
|
<ol>
|
|
|
<li>We save 1 bit when we convert (represent) a fractional decimal number into a bit pattern using IEEE 754 floating point representation</li>
|
|
|
<li>We have to add this 1 bit back when we convert from a bit pattern (IEEE 754 floating point representation) back to a fractional decimal</li>
|
|
|
</ol>
|
|
|
|
|
|
<p>Example: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup><mo>=</mo><mn>1.01011010101</mn><mo>×</mo><msup><mn>2</mn><mn>8</mn></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E} = 1.01011010101 \times 2^{8}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">1.01011010101</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">8</span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<p>M = 1. 01011010101 => M = 1 + frac</p>
|
|
|
|
|
|
<p>This bit is implied hence not stored in the bit pattern produced
|
|
|
by the IEEE 754 floating point representation, and what we
|
|
|
store in the frac part of the IEEE 754 bit pattern is 01011010101</p>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation-single-precision">IEEE floating point representation (single precision)</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>What if the 4 bytes starting at M[0x0000] represented a fractional
|
|
|
decimal number (encoded as an IEEE floating point number) -> value?
|
|
|
single precision</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Numerical Form: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
<th>Value</th>
|
|
|
<th>Notes</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>1</td>
|
|
|
<td> </td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>10000111</td>
|
|
|
<td>k=8 bits, interpreted as unsigned</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>01011010101000000000000</td>
|
|
|
<td>n=23 bits, interpreted as unsigned</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
<ul>
|
|
|
<li>exp ≠ 0 and exp ≠ 111111112 -> normalized</li>
|
|
|
<li>s = ___</li>
|
|
|
<li>E = exp – bias where bias = <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><msup><mn>2</mn><mn>7</mn></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>128</mn><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>127</mn></mrow><annotation encoding="application/x-tex">2^{k-1} – 1 = 2^{7} – 1 = 128 – 1 = 127</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8491079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">128–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">127</span></span></span></span>$</li>
|
|
|
<li>E = ____ - 127 =</li>
|
|
|
<li>M = 1 + <code class="language-plaintext highlighter-rouge">frac</code> = 1 + ___</li>
|
|
|
<li>V = ____</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Little endian memory layout:</p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
<th>Address</th>
|
|
|
<th>M[]</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>size-1</td>
|
|
|
<td> </td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>…</td>
|
|
|
<td>…</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0003</td>
|
|
|
<td>11000011</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0002</td>
|
|
|
<td>10101101</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0001</td>
|
|
|
<td>01010000</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0000</td>
|
|
|
<td>00000000</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
<h2 id="lets-give-it-a-go">Let’s give it a go!</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>What if the 4 bytes starting at M[0x0000] represented a fractional
|
|
|
decimal number (encoded as an IEEE floating point number) -> value?</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Numerical form: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mo>−</mo><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = (-1)^{s} M 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.0913309999999998em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">−</span><span class="mord">1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.664392em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413309999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<p>single precision</p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
<th>Value</th>
|
|
|
<th>Notes</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>0</td>
|
|
|
<td> </td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>10001100</td>
|
|
|
<td>k=8 bits, interpreted as unsigned</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>11011011011000000000000</td>
|
|
|
<td>n=23 bits, interpreted as unsigned</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
<ul>
|
|
|
<li>exp ≠ 0 and exp ≠ 111111112 -> normalized</li>
|
|
|
<li>s = ____</li>
|
|
|
<li>E = exp - bias where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mn>7</mn></msup><mo>−</mo><mn>1</mn><mo>=</mo><mn>128</mn><mo>−</mo><mn>1</mn><mo>=</mo><mn>127</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{7} - 1 = 128 - 1 = 127</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.897438em;vertical-align:-0.08333em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.72777em;vertical-align:-0.08333em;"></span><span class="mord">128</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">127</span></span></span></span></li>
|
|
|
<li>E = ____ - 127 = ___</li>
|
|
|
<li>M = 1 + frac = 1 + ____</li>
|
|
|
<li>V = ____</li>
|
|
|
</ul>
|
|
|
|
|
|
<p>Little endian memory map:</p>
|
|
|
|
|
|
<table>
|
|
|
<thead>
|
|
|
<tr>
|
|
|
<th>Address</th>
|
|
|
<th>M[]</th>
|
|
|
</tr>
|
|
|
</thead>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>size-1</td>
|
|
|
<td> </td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>…</td>
|
|
|
<td>…</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0003</td>
|
|
|
<td>01000110</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0002</td>
|
|
|
<td>01101101</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0001</td>
|
|
|
<td>10110000</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>0x0000</td>
|
|
|
<td>00000000</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation-single-precision-1">IEEE floating point representation (single precision)</h2>
|
|
|
|
|
|
<p>How would 47.21875 be encoded as IEEE floating point number?</p>
|
|
|
|
|
|
<ol>
|
|
|
<li>Convert 47.28 to binary (using the positional notation R2B(X)) =>
|
|
|
<ul>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>47</mn><mo>=</mo><mn>10111</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">47 = 101111_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">47</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">10111</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi mathvariant="normal">.</mi><mn>21875</mn><mo>=</mo><mi mathvariant="normal">.</mi><mn>0011</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">.21875 = .00111_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">.21875</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">.0011</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Normalize binary number:
|
|
|
101111.00111 => <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1.011110011</mn><msub><mn>1</mn><mn>2</mn></msub><mo>×</mo><msup><mn>2</mn><mn>5</mn></msup></mrow><annotation encoding="application/x-tex">1.0111100111_{2} \times 2^{5}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">1.011110011</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">5</span></span></span></span></span></span></span></span></span></span></span></span>
|
|
|
<ul>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7143919999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></span>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Determine …
|
|
|
<ul>
|
|
|
<li>s = 0</li>
|
|
|
<li>E = exp – bias where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><msup><mn>2</mn><mn>7</mn></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>128</mn><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>127</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} – 1 = 2^{7} – 1 = 128 – 1 = 127</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8491079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">128–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">127</span></span></span></span></li>
|
|
|
<li>exp = E + bias = 5 + 127 = 132 => U2B(132) => 10000100</li>
|
|
|
<li>M = 1 + frac -> frac = M - 1 => <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1.011110011</mn><msub><mn>1</mn><mn>2</mn></msub><mo>−</mo><mn>1</mn><mo>=</mo><mi mathvariant="normal">.</mi><mn>011110011</mn><msub><mn>1</mn><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">1.0111100111_{2} - 1 = .0111100111_{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">1.011110011</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.79444em;vertical-align:-0.15em;"></span><span class="mord">.011110011</span><span class="mord"><span class="mord">1</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>32 bits organized in s|exp|frac: [0|10000100|01111001110000000000000}</li>
|
|
|
<li>0x423CE000</li>
|
|
|
</ol>
|
|
|
|
|
|
<h2 id="ieee-floating-point-representation-single-precision-2">IEEE floating point representation (single precision)</h2>
|
|
|
|
|
|
<p>How would 12345.75 be encoded as IEEE floating point number?</p>
|
|
|
|
|
|
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mo>−</mo><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">
|
|
|
V = (-1)^{s} M 2^{E}
|
|
|
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathdefault" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">−</span><span class="mord">1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7143919999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathdefault" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathdefault mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></span></p>
|
|
|
|
|
|
<ol>
|
|
|
<li>Convert 12345.75 to binary
|
|
|
<ul>
|
|
|
<li>12345 => ____ .75 => ____</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Normalize binary number:</li>
|
|
|
<li>Determine …
|
|
|
<ul>
|
|
|
<li>s = ____</li>
|
|
|
<li>E = exp – bias where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><msup><mn>2</mn><mn>7</mn></msup><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>128</mn><mtext>–</mtext><mn>1</mn><mo>=</mo><mn>127</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} – 1 = 2^{7} – 1 = 128 – 1 = 127</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8491079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8141079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">7</span></span></span></span></span></span></span></span></span><span class="mord">–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">128–1</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">127</span></span></span></span></li>
|
|
|
<li>exp = E + bias = ____</li>
|
|
|
<li>M = 1 + frac -> frac = M - 1</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>[_|________|_______________________]</li>
|
|
|
<li>Express in hex:</li>
|
|
|
</ol>
|
|
|
|
|
|
<h2 id="summary">Summary</h2>
|
|
|
|
|
|
<ul>
|
|
|
<li>IEEE Floating Point Representation
|
|
|
<ol>
|
|
|
<li>Denormalized</li>
|
|
|
<li>Special cases</li>
|
|
|
<li>Normalized => exp ≠ 000…0 and exp ≠ 111…1
|
|
|
<ul>
|
|
|
<li>Single precision: bias = 127, exp: [1..254], E: [-126..127] => [10-38 … 1038]</li>
|
|
|
<li>Called “normalized” because binary numbers are normalized</li>
|
|
|
</ul>
|
|
|
<ul>
|
|
|
<li>Effect: “We get the leading bit for free”
|
|
|
<ul>
|
|
|
<li>Leading bit is always assumed (never part of bit pattern)</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ol>
|
|
|
</li>
|
|
|
<li>IEEE floating point number as encoding scheme
|
|
|
<ul>
|
|
|
<li>Fractional decimal number IEEE 754 (bit pattern)</li>
|
|
|
<li>
|
|
|
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>V</mi><mo>=</mo><mo stretchy="false">(</mo><mtext>–</mtext><mn>1</mn><msup><mo stretchy="false">)</mo><mi>s</mi></msup><mi>M</mi><msup><mn>2</mn><mi>E</mi></msup></mrow><annotation encoding="application/x-tex">V = (–1)^{s} M 2^{E}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord mathnormal" style="margin-right:0.22222em;">V</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.1413309999999999em;vertical-align:-0.25em;"></span><span class="mopen">(</span><span class="mord">–1</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7143919999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span></span></span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.10903em;">M</span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913309999999999em;"><span style="top:-3.113em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.05764em;">E</span></span></span></span></span></span></span></span></span></span></span></span></span>
|
|
|
<ul>
|
|
|
<li>s is sign bit, M = 1 + frac, E = exp – bias, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mtext>bias</mtext><mo>=</mo><msup><mn>2</mn><mrow><mi>k</mi><mo>−</mo><mn>1</mn></mrow></msup><mtext>–</mtext><mn>1</mn></mrow><annotation encoding="application/x-tex">\text{bias} = 2^{k-1} – 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord text"><span class="mord">bias</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.8491079999999999em;vertical-align:0em;"></span><span class="mord"><span class="mord">2</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.03148em;">k</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span><span class="mord">–1</span></span></span></span> and k is width of exp</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
<h2 id="next-lecture">Next Lecture</h2>
|
|
|
<ul>
|
|
|
<li>Representing data in memory – Most of this is review
|
|
|
<ul>
|
|
|
<li>“Under the Hood” - Von Neumann architecture</li>
|
|
|
<li>Bits and bytes in memory
|
|
|
<ul>
|
|
|
<li>How to diagram memory -> Used in this course and other references</li>
|
|
|
<li>How to represent series of bits -> In binary, in hexadecimal (conversion)</li>
|
|
|
<li>What kind of information (data) do series of bits represent -> Encoding scheme</li>
|
|
|
<li>Order of bytes in memory -> Endian</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Bit manipulation – bitwise operations
|
|
|
<ul>
|
|
|
<li>Boolean algebra + Shifting</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Representing integral numbers in memory
|
|
|
<ul>
|
|
|
<li>Unsigned and signed</li>
|
|
|
<li>Converting, expanding and truncating</li>
|
|
|
<li>Arithmetic operations</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>Representing real numbers in memory
|
|
|
<ul>
|
|
|
<li>IEEE floating point representation</li>
|
|
|
<li>Floating point in C – casting, rounding, addition, …</li>
|
|
|
</ul>
|
|
|
</li>
|
|
|
</ul>
|
|
|
|
|
|
|
|
|
</div>
|
|
|
</main>
|
|
|
<hr>
|
|
|
<footer>
|
|
|
</footer>
|
|
|
</body>
|
|
|
</html>
|