CMPT 295 - Unit - Data Representation
Lecture 7
- Representing fractional numbers in memory
- IEEE floating point representation, their arithmetic operations and float in C
Last Lecture
- IEEE floating point representation
- Normalized => exp ≠ 000…0 and exp ≠ 111…1
- Single precision: bias = 127, exp: [1..254], E: [-126..127] => [10-38 … 1038]
- Called “normalized” because binary numbers are normalized as part of the conversion process
- Effect: “We get the leading bit for free” => leading bit is always assumed (never part of bit pattern) * Conversion: IEEE floating point number as encoding scheme
-
- ; s is sign bit, , , and k is width of exp
- Denormalized
- Special cases
- Normalized => exp ≠ 000…0 and exp ≠ 111…1
Today’s Menu
- Representing data in memory – Most of this is review
- “Under the Hood” - Von Neumann architecture
- Bits and bytes in memory
- How to diagram memory -> Used in this course and other references
- How to represent series of bits -> In binary, in hexadecimal (conversion)
- What kind of information (data) do series of bits represent -> Encoding scheme
- Order of bytes in memory -> Endian
- Bit manipulation – bitwise operations
- Boolean algebra + Shifting
- “Under the Hood” - Von Neumann architecture
- Representing integral numbers in memory
- Unsigned and signed
- Converting, expanding and truncating
- Arithmetic operations
- Representing real numbers in memory
- IEEE floating point representation
- Floating point in C – casting, rounding, addition, …
IEEE floating point representation (single precision)
How would 47.28 be encoded as IEEE floating point number?
- Convert 47.28 to binary (using the positional notation R2B(X)) =>
- 47 =
- .28 =
- Also expressed as: .28 =
- Normalize binary number:
- becomes:
A diagram of the above number not fitting inside the frac
portion of a signed number due to its width.
Rounding
This selection is done by looking at the bit pattern around the rounding position.
- First, identify bit at rounding position
- Then select which kind of rounding we must perform:
- Round up
- When the value of the bits to the right of the bit at rounding position is > half the worth of the bit at rounding position
- We “round” up by adding 1 to the bit at rounding position
- Round down
- When the value of the bits to the right of the bit at rounding position is < half the worth of the bit at rounding position
- We “round” down by discarding the bits to the right of the bit at rounding position
- Round to “even number”
- When the value of the bits to the right of the bit at rounding position is exactly half the worth of bit at rounding position, i.e., when these bits are 100…02
- We “round” such that the bit at the rounding position becomes 0
- If the bit at rounding position is 1 => then we “round to even number” by “rounding up” i.e., by adding 1
- If the bit at rounding position is already 0 => then we “round to even number” by “rounding down” i.e., by discarding the bits to the right of the bit at rounding position
- Round up
Rounding (and error)
Example: rounding position -> round to nearest 1/4 (2 bits right of binary point)
Value | Binary | Rounded | Action | Rounded Value |
(<1/2—down) | 2 | |||
(>1/2—up) | ||||
(1/2—up to even) | 3 | |||
(1/2—down to even) |
- Explain value of a bit versus worth of a bit
- value of a bit: ____
- worth of a bit: ____
Back to IEEE floating point representation
In the process of converting fractional decimal numbers to IEEE floating point numbers (i.e., bit patterns in fixed-size memory), we apply these same rounding rules …
Using the same numbers in our example:
Imagine that the 4th bit in the binary column is our 23rd bit of the frac
=> rounding position.
And the 5th bit is the 24th bit.
Value | Binary |
Homework – Let’s practice converting and rounding!
How would 346.62 be encoded as IEEE floating point number (single precision) in memory?
Also, can you compute the minimum value of the error introduced by the rounding process since 346.62 can only be approximated when encoded as an IEEE floating point representation
Denormalized values
Equations:
Example:
- Condition: (single precision)
- Denormalized Values:
- Case 1: -> +0 and –0
- Case 2: -> numbers closest to 0.0 (equally spaced)
Smallest:
s | exp | frac |
---|---|---|
0 | 00000000 | 00000000000000000000001 |
Largest:
s | exp | frac |
---|---|---|
0 | 00000000 | 11111111111111111111111 |
Special values
Condition: exp = 111…1
- Case 1: frac = 000…0
- Represents value (infinity)
- Operation that overflows
- Both positive and negative
- E.g.:
- Case 2: frac ≠ 000…0
- Not-a-Number (NaN)
- Represents case when no numeric value can be determined
- e.g.:
- NaN propagates other NaN: e.g.,
Axis of all floating point values
- s = 1 bit
- exp = k bits
- if and => Normalized
- if (all 0’s) => Denormalized
- if (all 1’s) => Special Cases n bits
- frac = n bits
Starting with the lowest to the highest, what values can be produced:
- NaN
- -Normalized
- -Denormalized
- -0
- +0
- +Denormalized
- +Normalized
- NaN
What if floating point represented with 8 bits
Equations:
- Denormalized:
- Normalized:
Annotation: To get a feel for all possible values expressible using IEEE like conversion, we use a small w. Here, instead of w = 32, we use w = 8. This way, we can enumerate all values.
s | exp | frac | E | Value | Type | Notes |
---|---|---|---|---|---|---|
0 | 0000 | 000 | -6 | 0 | Denormalized | |
0 | 0000 | 001 | -6 | Denormalized | closest to zero | |
0 | 0000 | 010 | -6 | Denormalized | ||
… | … | … | … | … | … | … |
0 | 0000 | 111 | -6 | Denormalized | ||
0 | 0000 | 111 | -6 | Denormalized | largest denom. | |
0 | 0001 | 000 | -6 | Normalized | smallest nom. | |
0 | 0001 | 001 | -6 | Normalized | ||
… | … | … | … | … | … | … |
0 | 0110 | 110 | -1 | Normalized | ||
0 | 0110 | 111 | -1 | Normalized | closest to one below | |
0 | 0111 | 000 | 0 | Normalized | ||
0 | 0111 | 001 | 0 | Normalized | closest to one above | |
0 | 0111 | 010 | 0 | Normalized | ||
… | … | … | … | … | … | … |
0 | 1110 | 110 | 7 | Normalized | ||
0 | 1110 | 111 | 7 | Normalized | largest nom | |
0 | 1111 | 000 | n/a | NaN |
Conversion in C
- Casting between
int
,float
, anddouble
changes bit pattern double
/float
toint
- Truncates fractional part
int
tofloat
- Exact conversion, as long as
frac
(obtained when the int is normalized) fits in 23 bits - Will round according to rounding rules
- Exact conversion, as long as
int
todouble
- Exact conversion, as long as frac (obtained when the int is normalized) fits in 52 bits
- Will round according to rounding rules
Demo - C code
- Conversion – Observe the change in bit pattern
- int to float
- float to int
- Addition
- Associativity – For floating point numbers
f1
,f2
andf3
:- Is it always true that (f1 + f2) + f3 = f1 + (f2 + f3)?
- Is it always true that (f1 * f2) * f3 = f1 * (f2 * f3)?
- Rounding – Effect of errors caused by rounding
Floating point arithmetic
- x +f y = Round(x + y)
- x *f y = Round(x * y)
- Basic idea:
- First compute true result
- Make it fit into desired precision
- Possibly overflow if exponent too large
- Possibly round to fit into
frac
Summary
- Most fractional decimal numbers cannot be exactly encoded using IEEE floating point representation -> rounding
- Denormalized values
- Condition: exp = 0000…0
- 0 <= denormalized values < 1, equidistant because all have same
- Special values
- Condition: exp = 1111…1
- Case 1:
- Case 2:
- Condition: exp = 1111…1
- Impact on C
- Conversion/casting, rounding
- Arithmetic operators:
- Behaviour not the same as for real arithmetic => violates associativity
Next Lecture
- Introduction
- C program -> assembly code -> machine level code
- Assembly language basics: data, move operation
- Operation leaq and Arithmetic & logical operations
- Conditional Statement – Condition Code + cmovX
- Loops
- Function call – Stack
- Array
- Buffer Overflow
- Floating-point data & operations