CMPT 295

  • Unit - Data Representation
    • Lecture 6 Representing fractional numbers in memory
    • IEEE floating point representation (cont'd)

Have you heard of that new band "1023 Megabytes"?

They're pretty good, but they don't have a gig just yet. 😭

Last Lecture

  • Representing integral numbers in memory

    • Can encode a small range of values exactly (in 1, 2, 4, 8 bytes)
      • For example: We can represent the values -128 to 127 exactly in 1 byte using a signed char in C
  • Representing fractional numbers in memory

    1. Positional notation has some advantages, but also disadvantages -> so not used!
    2. IEEE floating point representation: can encode a much larger range of values (in 4 or 8 bytes), e.g., single precision: approximately $[10^{-38} .. 10^{38}]$
  • Overview of IEEE floating point representation

    • Precision options (float 32-bit, double 64-bit)
    • $V = (-1)^{s} \times M \times 2^{E}$
    • s -> sign bit
    • exp encodes E (but exp != E)
    • frac encodes M (but frac != M)

We interpret the bit vector (expressed in IEEE floating point encoding) stored in memory using this equation.

Today's Menu

  • Representing data in memory - Most of this is review
    • “Under the Hood” - Von Neumann architecture
    • Bits and bytes in memory
      • How to diagram memory -> Used in this course and other references
      • How to represent series of bits -> In binary, in hexadecimal (conversion)
      • What kind of information (data) do series of bits represent -> Encoding scheme
      • Order of bytes in memory -> Endian
    • Bit manipulation - bitwise operations
      • Boolean algebra + Shifting
  • Representing integral numbers in memory
    • Unsigned and signed
    • Converting, expanding and truncating
    • Arithmetic operations
  • Representing real numbers in memory
    • IEEE floating point representation
    • Floating point in C - casting, rounding, addition, …

IEEE Floating Point Representation - Three “kinds” of values

We interpret the bit vector (expressed in IEEE floating point encoding) stored in memory using this equation:

{% katex display %} V = (-1)^{s} M 2^{E} {% endkatex %}

Bit breakdown (exp and frac interpreted as unsigned):

  • s = 1 bit
  • exp = k bits
    1. If exp != 00...00 and exp != 11...11 (exp range: [00000001 .. 11111110]) => normalized. Equations:
      • $E = \text{exp} - \text{bias}$
      • $M = 1 + \text{frac}$
    2. If exp = 00...00 (all 0's) => denormalized. Equations:
      • $E = 1 - \text{bias}$
      • $M = \text{frac}$
    3. If exp = 11...11 (all 1's) => special cases.
      • Case 1: frac = 000...0
      • Case 2: frac != 000...0
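
The three-way split above depends only on the exp field, which can be sketched in C. This is a minimal illustration, not code from the course: the names `value_kind` and `classify_float` are hypothetical, and it assumes `float` is a 32-bit IEEE 754 single precision value (k = 8).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the three kinds of values. */
typedef enum { KIND_NORMALIZED, KIND_DENORMALIZED, KIND_SPECIAL } value_kind;

/* Classify a single precision value by looking only at its exp field.
   Assumption: float is a 32-bit IEEE 754 value. */
value_kind classify_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);        /* reinterpret the 4 bytes as an unsigned */
    uint32_t exp = (bits >> 23) & 0xFFu;   /* skip the 23 frac bits, keep k = 8 bits */
    if (exp == 0x00u) return KIND_DENORMALIZED;  /* exp = 00...00 */
    if (exp == 0xFFu) return KIND_SPECIAL;       /* exp = 11...11 */
    return KIND_NORMALIZED;                      /* everything in between */
}
```

Note that `0.0f` also has exp = 00...00, so this sketch reports it as denormalized.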

IEEE floating point representation - normalized

Numerical Form: $V = (-1)^{s} M 2^{E}$

Bit breakdown:

  • s = 1 bit
  • exp = k bits
    • If exp != 0 and exp != 11...11 (exp range: [00000001...11111110]) => normalized. Equations:
      • $E = \text{exp} - \text{bias}$
      • $M = 1 + \text{frac}$

Why is E biased?

Using single precision as an example (s = 1 bit, exp = 8 bits, frac = 23 bits):

  • exp range: [00000001 .. 11111110] => $[1_{10} .. 254_{10}]$
  • If E were not biased (i.e., E = exp), then E range: $[1_{10} .. 254_{10}]$
  • V range: $[2^{1} .. 2^{254}] = [2 .. \approx 2.89 \times 10^{76}]$ (so we could not express numbers < 2)
  • By biasing E (i.e., E = exp - bias), E range: [1 - 127 .. 254 - 127] = [-126 .. 127] (since k = 8, bias = $2^{8-1} - 1 = 127$)
  • V range: $[2^{-126} .. 2^{127}] = [\approx 1.18 \times 10^{-38} .. \approx 1.7 \times 10^{38}]$ (so we can now express very small (and very large) numbers)
  • Why add 1 to frac? Because the number (or value) V is first normalized before it is converted.
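
The bias arithmetic above generalizes to any exp width k. The following is a small sketch with hypothetical helper names (`exp_bias`, `min_normalized_E`, `max_normalized_E`); it assumes k is between 1 and 30 so the shifts stay within an `int`.

```c
/* bias = 2^(k-1) - 1, where k is the width of the exp field.
   Assumption: 1 <= k <= 30. */
int exp_bias(int k) {
    return (1 << (k - 1)) - 1;
}

/* Normalized values use exp in [1 .. 2^k - 2], so the smallest E is 1 - bias. */
int min_normalized_E(int k) {
    return 1 - exp_bias(k);
}

/* The largest normalized exp is 2^k - 2 (all 1's is reserved for special cases). */
int max_normalized_E(int k) {
    return ((1 << k) - 2) - exp_bias(k);
}
```

With k = 8 these return 127, -126 and 127, matching the single precision numbers above. With k = 11 (the exp width of double precision) they return 1023, -1022 and 1023.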

Review: Scientific Notation and normalization

  • From Wikipedia:

    • Scientific notation is a way of expressing numbers that are too large or too small to be conveniently written in decimal form (as they are long strings of digits).
    • In scientific notation, nonzero numbers are written in the form $\pm M \times 10^{n}$
    • In normalized notation, the exponent n is chosen such that the absolute value of the significand M is at least 1 ($|M| \geq 1.0$) but less than the base
      • M range for base 10 => [1.0 .. 10.0 - ε]
      • M range for base 2 => [1.0 .. 2.0 - ε]
  • Examples:

    • A proton's mass is 0.0000000000000000000000000016726 kg -> $1.6726 \times 10^{-27}$ kg
    • Speed of light is 299,792,458 m/s -> $2.99792458 \times 10^{8}$ m/s

Syntax of normalized notation:

| Name | Notation |
| --- | --- |
| Sign | +/- |
| Significand | $d_{0}, d_{-1}, d_{-2}, d_{-3} ... d_{-n}$ |
| Base | b |
| Exponent | $^{\text{exp}}$ |

  • Let's try: $101011010.101_{2}$ -> ___

Let's try normalizing these fractional binary numbers!

  1. $101011010.101_{2}$
  2. $0.000000001101_{2}$
  3. $11000000111001_{2}$
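
Normalizing a binary number means moving the binary point until exactly one 1 sits to its left, and counting the moves in the exponent. The sketch below mimics that by halving or doubling a value. The `normalized` struct and `normalize` function are hypothetical names, and the code assumes its input is positive and finite (other inputs are returned unchanged or may not terminate).

```c
/* Result of normalization: v = M * 2^E with 1.0 <= M < 2.0. */
typedef struct {
    double M;  /* significand */
    int E;     /* exponent */
} normalized;

/* Normalize a positive, finite value by moving the binary point.
   Dividing by 2 moves the point one place left (E goes up by 1);
   multiplying by 2 moves it one place right (E goes down by 1). */
normalized normalize(double v) {
    normalized r = { v, 0 };
    if (v <= 0.0) return r;  /* assumption: caller passes a positive value */
    while (r.M >= 2.0) { r.M /= 2.0; r.E++; }
    while (r.M <  1.0) { r.M *= 2.0; r.E--; }
    return r;
}
```

For example, `normalize(47.21875)` gives M = 1.4755859375 (which is $1.0111100111_{2}$) and E = 5. Halving and doubling are exact in binary floating point, so no precision is lost along the way.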

IEEE floating point representation

  • Once V is normalized, we apply the equations
    • $V = (-1)^{s} M 2^{E} = 1.01011010101_{2} \times 2^{8}$
    • s = ???
    • $E = \text{exp} - \text{bias}$
    • exp = E + bias = ___
    • M = 1 + frac = ___
    • s = 1 bit, exp = k bits => 8 bits, frac = n bits => 23 bits
    • bit vector in memory:

Why add 1 to frac (or subtract 1 from M)?

  • Because the number (or value) V is first normalized before it is converted.
    • As part of this normalization process, we transform our binary number such that its significand M is within the range [1.0 .. 2.0 - ε]
    • Remember: M range for base 2 => [1.0 .. 2.0 - ε]
    • This implies that M is always at least 1.0, so its integral part always has the value 1
    • So since this bit is always part of M, IEEE 754 does not explicitly save it in its bit pattern (i.e., in memory)
    • Instead, this bit is implied!

Why add 1 to frac (or subtract 1 from M)? (cont'd)

Implying this bit has the following effects:

We get the leading bit for free!

  1. We save 1 bit when we convert (represent) a fractional decimal number into a bit pattern using IEEE 754 floating point representation
  2. We have to add this 1 bit back when we convert from a bit pattern (IEEE 754 floating point representation) back to a fractional decimal

Example: $V = (-1)^{s} M 2^{E} = 1.01011010101_{2} \times 2^{8}$

$M = 1.01011010101_{2}$ => M = 1 + frac

The leading 1 is implied, hence not stored in the bit pattern produced by the IEEE 754 floating point representation. What we store in the frac part of the IEEE 754 bit pattern is 01011010101 (padded with trailing 0's to fill the n frac bits).
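
To see the implied bit from the C side, here is a sketch that reads the 23 stored frac bits of a `float` and then adds the leading 1 back. The function names `stored_frac` and `significand_of` are hypothetical, and the code assumes a 32-bit IEEE 754 `float` holding a normalized value.

```c
#include <stdint.h>
#include <string.h>

/* Return the n = 23 frac bits exactly as stored in memory. */
uint32_t stored_frac(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the 4 bytes as an unsigned */
    return bits & 0x7FFFFFu;         /* low 23 bits */
}

/* Recover M for a normalized value: M = 1 + frac, where the stored
   frac bits are read as a fraction, i.e. divided by 2^23. */
double significand_of(float f) {
    return 1.0 + (double)stored_frac(f) / (double)(1u << 23);
}
```

For `47.21875f` the stored frac is 0x3CE000 (binary 01111001110000000000000) and the recovered M is 1.4755859375. The leading 1 never appears in the stored bits.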

IEEE floating point representation (single precision)

  • What if the 4 bytes starting at M[0x0000] represented a fractional decimal number (encoded as a single precision IEEE floating point number) -> value?

Numerical Form: $V = (-1)^{s} M 2^{E}$

| Value | Notes |
| --- | --- |
| 1 | sign bit |
| 10000111 | k = 8 bits, interpreted as unsigned |
| 01011010101000000000000 | n = 23 bits, interpreted as unsigned |

  • exp ≠ 0 and exp ≠ $11111111_{2}$ -> normalized
  • s = ___
  • E = exp - bias, where bias = $2^{k-1} - 1 = 2^{7} - 1 = 128 - 1 = 127$
  • E = ____ - 127 = ___
  • M = 1 + frac = 1 + ___
  • V = ____

Little endian memory layout:

| Address | M[] |
| --- | --- |
| size-1 | |
| ... | ... |
| 0x0003 | 11000011 |
| 0x0002 | 10101101 |
| 0x0001 | 01010000 |
| 0x0000 | 00000000 |
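
The decoding procedure (gather the 4 bytes in little endian order, split them into s, exp and frac, then apply the normalized equations) can be sketched as follows. The `ieee_fields` struct and both function names are hypothetical, and `normalized_value` assumes exp is neither all 0's nor all 1's.

```c
#include <stdint.h>

/* The three fields of a single precision bit pattern. */
typedef struct {
    uint32_t s;     /* 1 bit */
    uint32_t exp;   /* k = 8 bits, unsigned */
    uint32_t frac;  /* n = 23 bits, unsigned */
} ieee_fields;

/* m[0] is the byte at the lowest address. In little endian order it is
   the least significant byte of the 32-bit pattern. */
ieee_fields fields_from_memory(const uint8_t m[4]) {
    uint32_t bits = (uint32_t)m[0]
                  | ((uint32_t)m[1] << 8)
                  | ((uint32_t)m[2] << 16)
                  | ((uint32_t)m[3] << 24);
    ieee_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* V = (-1)^s * M * 2^E with E = exp - 127 and M = 1 + frac / 2^23.
   Assumption: the fields describe a normalized value. */
double normalized_value(ieee_fields f) {
    int E = (int)f.exp - 127;
    double v = 1.0 + (double)f.frac / (double)(1u << 23);
    for (int e = E; e > 0; e--) v *= 2.0;  /* positive E: scale up */
    for (int e = E; e < 0; e++) v /= 2.0;  /* negative E: scale down */
    return f.s ? -v : v;
}
```

Filling a `uint8_t` array with the bytes from the memory diagram, lowest address first, and passing it through both functions is one way to check a hand-computed answer.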

Let's give it a go!

  • What if the 4 bytes starting at M[0x0000] represented a fractional decimal number (encoded as a single precision IEEE floating point number) -> value?

Numerical form: $V = (-1)^{s} M 2^{E}$

| Value | Notes |
| --- | --- |
| 0 | sign bit |
| 10001100 | k = 8 bits, interpreted as unsigned |
| 11011011011000000000000 | n = 23 bits, interpreted as unsigned |

  • exp ≠ 0 and exp ≠ $11111111_{2}$ -> normalized
  • s = ____
  • E = exp - bias, where bias = $2^{k-1} - 1 = 2^{7} - 1 = 128 - 1 = 127$
  • E = ____ - 127 = ___
  • M = 1 + frac = 1 + ____
  • V = ____

Little endian memory map:

| Address | M[] |
| --- | --- |
| size-1 | |
| ... | ... |
| 0x0003 | 01000110 |
| 0x0002 | 01101101 |
| 0x0001 | 10110000 |
| 0x0000 | 00000000 |

IEEE floating point representation (single precision)

How would 47.21875 be encoded as an IEEE floating point number?

  1. Convert 47.21875 to binary (using the positional notation R2B(X)) =>
    • $47 = 101111_{2}$
    • $.21875 = .00111_{2}$
  2. Normalize binary number: $101111.00111_{2}$ => $1.0111100111_{2} \times 2^{5}$
    • $V = (-1)^{s} M 2^{E}$
  3. Determine …
    • s = 0
    • E = exp - bias, where bias = $2^{k-1} - 1 = 2^{7} - 1 = 128 - 1 = 127$
    • exp = E + bias = 5 + 127 = 132 => U2B(132) => 10000100
    • M = 1 + frac -> frac = M - 1 => $1.0111100111_{2} - 1 = .0111100111_{2}$
  4. 32 bits organized in s|exp|frac: [0|10000100|01111001110000000000000]
  5. Express in hex: 0x423CE000
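
The last two steps (placing s, exp and frac side by side, then reading the result in hex) can be cross-checked in C. This is a sketch with hypothetical function names; `bits_of` assumes the compiler's `float` is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Pack the three fields into one 32-bit pattern laid out as s|exp|frac. */
uint32_t pack_fields(uint32_t s, uint32_t exp, uint32_t frac) {
    return ((s & 0x1u) << 31)      /* 1 sign bit at the top */
         | ((exp & 0xFFu) << 23)   /* k = 8 exp bits */
         | (frac & 0x7FFFFFu);     /* n = 23 frac bits */
}

/* Bit pattern the machine itself uses for f, for comparison. */
uint32_t bits_of(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the 4 bytes without converting */
    return bits;
}
```

With the fields derived above, `pack_fields(0, 132, 0x3CE000)` yields 0x423CE000, where 0x3CE000 is the 23-bit frac 01111001110000000000000 written in hex. `bits_of(47.21875f)` gives the same pattern on an IEEE 754 machine.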

IEEE floating point representation (single precision)

How would 12345.75 be encoded as an IEEE floating point number?

{% katex display %} V = (-1)^{s} M 2^{E} {% endkatex %}

  1. Convert 12345.75 to binary
    • 12345 => ____
    • .75 => ____
  2. Normalize binary number:
  3. Determine …
    • s = ____
    • E = exp - bias, where bias = $2^{k-1} - 1 = 2^{7} - 1 = 128 - 1 = 127$
    • exp = E + bias = ____
    • M = 1 + frac -> frac = M - 1
  4. \_\|\_\_\_\_\_\_\_\_\|\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
  5. Express in hex:

Summary

  • IEEE Floating Point Representation
    1. Denormalized
    2. Special cases
    3. Normalized => exp ≠ 000…0 and exp ≠ 111…1
    • Single precision: bias = 127, exp: [1..254], E: [-126..127] => approximately $[10^{-38} … 10^{38}]$
    • Called “normalized” because binary numbers are normalized
      • Effect: “We get the leading bit for free”
        • Leading bit is always assumed (never part of bit pattern)
  • IEEE floating point number as encoding scheme
    • Fractional decimal number <-> IEEE 754 (bit pattern)
    • $V = (-1)^{s} M 2^{E}$
      • s is the sign bit, M = 1 + frac, E = exp - bias, bias = $2^{k-1} - 1$, and k is the width of exp

Next Lecture

  • Representing data in memory - Most of this is review
    • “Under the Hood” - Von Neumann architecture
    • Bits and bytes in memory
      • How to diagram memory -> Used in this course and other references
      • How to represent series of bits -> In binary, in hexadecimal (conversion)
      • What kind of information (data) do series of bits represent -> Encoding scheme
      • Order of bytes in memory -> Endian
    • Bit manipulation - bitwise operations
      • Boolean algebra + Shifting
  • Representing integral numbers in memory
    • Unsigned and signed
    • Converting, expanding and truncating
    • Arithmetic operations
  • Representing real numbers in memory
    • IEEE floating point representation
    • Floating point in C - casting, rounding, addition, …