layout | math |
---|---|
simple | true |
CMPT 295
Unit - Data Representation
Lecture 3 – Representing integral numbers in memory - unsigned and signed
Last Lecture
- Von Neumann architecture
- Architecture of most computers
- Its components: CPU, memory, input and output, bus
- One of its characteristics: Data and code (programs) both stored in memory
- A look at memory: defined byte-addressable memory, diagram of (compressed) memory
- Word size (w): size of a series of bits (or bit vector) we manipulate, also size of machine words (see Section 2.1.2)
- A look at bits in memory
- Why binary numeral system (0 and 1 -> two values) is used to represent information in memory
- Algorithm for converting binary to hexadecimal (hex)
- Partition bit vector into groups of 4 bits, starting from right, i.e., from the least significant bit (LSb)
- If the leftmost (most significant) group does not have 4 bits, pad it: add 0’s to its left
- Translate each group of 4 bits into its hex value
- What do bits represent? Encoding scheme gives meaning to bits
- Order of bytes in memory: little endian versus big endian
- Bit manipulation – regardless of what bit vectors represent
- Boolean algebra: bitwise operations => AND (&), OR (|), XOR (^), NOT (~)
- Shift operations: left shift, right logical shift and right arithmetic shift
- Logical shift: Fill x with y 0’s on left
- Arithmetic shift: Fill x with y copies of x‘s sign bit on left
- Sign bit: Most significant bit (MSb) before shifting occurred
NOTE: C logical operators and C bitwise (bit-level) operators behave differently! Watch out for && versus &, || versus |, …
Today’s Menu
- Representing data in memory – Most of this is review
- “Under the Hood” - Von Neumann architecture
- Bits and bytes in memory
- How to diagram memory -> Used in this course and other references
- How to represent series of bits -> In binary, in hexadecimal (conversion)
- What kind of information (data) do series of bits represent -> Encoding scheme
- Order of bytes in memory -> Endian
- Bit manipulation – bitwise operations
- Boolean algebra + Shifting
- Representing integral numbers in memory
- Unsigned and signed
- Converting, expanding and truncating
- Arithmetic operations
- Representing real numbers in memory
- IEEE floating point representation
- Floating point in C – casting, rounding, addition, …
Warm up exercise!
As a warm up exercise, fill in the blanks!
- If the context is C (on our target machine)
- char => _____ bits/ _____ byte
- short => _____ bits/ _____ bytes
- int => _____ bits/ _____ bytes
- long => _____ bits/ _____ bytes
- float => _____ bits/ _____ bytes
- double => _____ bits/ _____ bytes
- pointer (e.g. char *) => _____ bits/ _____ bytes
Unsigned integral numbers
Remember:
Address | M[] |
---|---|
size-1 | |
... | |
0x0003 | 01000010 |
0x0002 | 01101001 |
0x0001 | 01110100 |
0x0000 | 01110011 |
- What if the byte at M[0x0002] represented an unsigned integral number, what would be its value? (X = a series of bits = bit vector, w = width of bit vector)
- $X = 01101001_{2}$, $w = 8$
- Let’s apply the encoding scheme:
{% katex display %} \it \text{B2U}(X) = \sum_{i=0}^{w-1} X_{i} \times 2^i {% endkatex %}
Example: $0 \times 2^7 + 1 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 =$
- For w = 8, range of possible unsigned values: [ blank ]
- For any w, range of possible unsigned values: [ blank ]
- Conclusion: w bits can only represent a fixed # of possible values, but these w bits represent these values exactly
B2U(X) Conversion (Encoding scheme)
- Positional notation: expand and sum all terms
Decimal: $d_{i}\; d_{i-1}\; \dots\; d_{2}\; d_{1}\; d_{0}$
Binary: $b_{i}\; b_{i-1}\; \dots\; b_{2}\; b_{1}\; b_{0}$
Remember: {% katex display %} \it \text{B2U}(X) = \sum_{i=0}^{w-1} X_{i} \times 2^i {% endkatex %}
Example: $246_{10} = 2 \times 10^{2} + 4 \times 10^{1} + 6 \times 10^{0}$
Range of possible values?
- If the context is C (on our target machine)
- unsigned char?
- unsigned short?
- unsigned int?
- unsigned long?
Examples of “Show your work”
U2B(X) Conversion (into 8-bit binary # => w = 8)
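Method 1 - Using subtraction: subtract decreasing powers of 2 until reaching 0. A subtraction that succeeds gives a 1 bit; a “nop!” (the power of 2 does not fit) gives a 0 bit.
Starting number: 246
- $246 - 128 = 118$ -> 1
- $118 - 64 = 54$ -> 1
- $54 - 32 = 22$ -> 1
- $22 - 16 = 6$ -> 1
- $6 - 8 = \text{nop!}$ -> 0
- $6 - 4 = 2$ -> 1
- $2 - 2 = 0$ -> 1
- $0 - 1 = \text{nop!}$ -> 0
- $246_{10} = 11110110_{2}$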
Method 2 - Using division: divide by 2 until reaching 0; the remainders, read from last to first, give the bits
Start with 246
- $246 \div 2 = 123, R = 0$
- $123 \div 2 = 61, R = 1$
- $61 \div 2 = 30, R = 1$
- $30 \div 2 = 15, R = 0$
- $15 \div 2 = 7, R = 1$
- $7 \div 2 = 3, R = 1$
- $3 \div 2 = 1, R = 1$
- $1 \div 2 = 0, R = 1$
- $246_{10} = 11110110_{2}$
U2B(X) Conversion – A few tricks
- Decimal -> binary
- Trick: When decimal number is $2^{n}$, then its binary representation is 1 followed by n zero’s
- Let’s try: if X = 32 => $X = 2^{5}$, then n = 5 => $100000_{2}$ (w = 6)
- What if w = 8? Check: $1 \times 2^{5} = 32$
- Decimal -> hex
- Trick: When decimal number is $2^{n}$, then its hexadecimal representation is $2^{i}$ followed by j zero’s, where n = i + 4j and 0 <= i <= 3
- Let’s try: if X = 8192 => $X = 2^{13}$, then n = 13 and 13 = i + 4j => 1 + 4 x 3 => 0x2000
- Convert 0x2000 into a binary number:
- Check: $2 \times 16^{3} = 2 \times 4096 = 8192$
Signed integral numbers
Remember:
Address | M[] |
---|---|
size-1 | |
... | |
0x0003 | 01000010 |
0x0002 | 01101001 |
0x0001 | 11110100 |
0x0000 | 01110011 |
- What if the byte at M[0x0001] represented a signed integral number, what would be its value?
- $X = 11110100_{2}$, $w = 8$
- T => Two’s Complement, w => width of the bit vector (annotation: the first part of the equation [everything before the plus sign] is the "sign bit" term)
- Let’s apply the encoding scheme:
{% katex display %} \it \text{B2T}(X) = -X_{w-1} \times 2^{w-1} + \sum_{i=0}^{w-2} X_{i} \times 2^{i} {% endkatex %}
Example: $-1 \times 2^{7} + 1 \times 2^{6} + 1 \times 2^{5} + 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 0 \times 2^{0} = ?$
- What would be the bit pattern of the …
- Most negative value:
- Most positive value:
- For w = 8, range of possible signed values: [ blank ]
- For any w, range of possible signed values: [ blank ]
- Conclusion: same as for unsigned integral numbers
Examples of “Show your work”
T2B(X) Conversion -> Two’s Complement
Annotation: w = 8
Method 1: If X < 0, (~(U2B(|X|)))+1
If X = -14 (and 8-bit binary #s)
- $\lvert X \rvert$ => $\lvert -14 \rvert =$
- $\text{U2B}(14)$
- $\sim(00001110_{2})$ (the first symbol is a tilde, i.e., bitwise NOT) =>
- $(11110001_{2}) + 1$
Binary addition:
11110001 + 00000001 = ????????
Method 2: If X < 0, $\text{U2B}(X + 2^{w})$
If X = -14 (and 8-bit binary #s)
- $X + 2^{w}$ => $-14 +$
- $\text{U2B}(242)$
Using subtraction:
- $242 - 128 = 114$
- $114 - 64 = 50$
- $50 - 32 = 18$
- $18 - 16 = 2$
- $2 - 8 = \text{nop!}$
- $2 - 4 = \text{nop!}$
- $2 - 2 = 0$
- $0 - 1 = \text{nop!}$
Properties of unsigned & signed conversions
Annotation: w = 4
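X | B2U(X) | B2T(X) |
---|---|---|
0000 | 0 | 0 |
0001 | 1 | 1 |
0010 | 2 | 2 |
0011 | 3 | 3 |
0100 | 4 | 4 |
0101 | 5 | 5 |
0110 | 6 | 6 |
0111 | 7 | 7 |
1000 | 8 | -8 |
1001 | 9 | -7 |
1010 | 10 | -6 |
1011 | 11 | -5 |
1100 | 12 | -4 |
1101 | 13 | -3 |
1110 | 14 | -2 |
1111 | 15 | -1 |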
- Equivalence
- Both encoding schemes (B2U and B2T ) produce the same bit patterns for nonnegative values
- Uniqueness
- Every bit pattern produced by these encoding schemes (B2U and B2T) represents a unique (and exact) integer value
- Each representable integer has unique bit pattern
Converting between signed & unsigned of same size (same data type)
- Unsigned
- w = 8
- if unsigned $ux = 129_{10}$
- U2T(X) = B2T(U2B(X))
- then x = ???
- Maintain Same Bit Pattern
- Signed (Two’s Complement)
- w = 4
- if signed (2's C) $x = -5_{10}$
- T2U(X) = B2U(T2B(X))
- then unsigned ux = ???
- Maintain Same Bit Pattern
- Conclusion - Converting between unsigned and signed numbers: Both have same bit pattern, however, this bit pattern may be interpreted differently, i.e., producing a different value
Converting signed to unsigned (and back) with $w = 4$
Signed|Bits|Unsigned|Note
---|---|---|---
0|0000|0|Rows 0 to 7 inclusive: T2U(X) and U2T(X) leave the value unchanged.
1|0001|1|
2|0010|2|
3|0011|3|
4|0100|4|
5|0101|5|
6|0110|6|
7|0111|7|
-8|1000|8|Rows from here to 15 inclusive: signed + 16 ($2^{4}$) -> unsigned (T2U), unsigned - 16 ($2^{4}$) -> signed (U2T).
-7|1001|9|
-6|1010|10|
-5|1011|11|
-4|1100|12|
-3|1101|13|
-2|1110|14|
-1|1111|15|
Visualizing the relationship between signed & unsigned
If $w = 4$, $2^{4} = 16$
- Signed (2's Complement) Range: TMin to TMax (0 is the center)
- Unsigned range: 0 to UMax (TMax is the center)
Sign extension
- Converting unsigned (or signed) of different sizes (different data types)
1. Small data type -> larger
- Sign extension
- Unsigned: zero extension
- Signed: sign bit extension
- Conclusion: Value unchanged
- Let’s try:
- Going from a data type that has a width of 3 bits (w = 3) to a data type that has a width of 5 bits (w = 5)
- Unsigned: $X = 3 = 011_{2}, w = 3$ and $X = 4 = 100_{2}, w = 3$
- New: $X = ? = ?_{2}, w = 5$ and $X = ? = ?_{2}, w = 5$
- Signed: $X = 3 = 011_{2}, w = 3$ and $X = -3 = 101_{2}, w = 3$
- New: $X = ? = ?_{2}, w = 5$ and $X = ? = ?_{2}, w = 5$
Truncation
- Converting unsigned (or signed) of different sizes (different data types)
2. Large data type -> smaller
- Truncation
- Conclusion: Value may be altered
- A form of overflow
- Let’s try:
- Going from a data type that has a width of 5 bits (w = 5) to a data type that has a width of 3 bits (w = 3)
- Unsigned: $X = 27 = 11011_{2}, w = 5$
- New: $X = ? = ?_{2}, w = 3$
- Signed: $X = -15 = 10001_{2}, w = 5$ and $X = -1 = 11111_{2}, w = 5$
- New: $X = ? = ?_{2}, w = 3$ and $X = ? = ?_{2}, w = 3$
Summary
- Interpretation of bit pattern B into either unsigned value U or signed value T
- B2U(X) and U2B(X) encoding schemes (conversion)
- B2T(X) and T2B(X) encoding schemes (conversion)
- Signed value expressed as two’s complement => T
- Conversions from unsigned <-> signed values
- U2T(X) and T2U(X) => adding or subtracting $2^{w}$
- Implication in C: when converting (implicitly via promotion and explicitly via casting):
- Sign:
- Unsigned <-> signed (of same size) -> Both have same bit pattern, however, this bit pattern may be interpreted differently
- Can have unexpected effects -> producing a different value
- Size:
- Small -> large (for signed, e.g., short to int and for unsigned, e.g., unsigned short to unsigned int)
- sign extension: For unsigned -> zero extension, for signed -> sign bit extension
- Both yield expected result -> resulting value unchanged
- Large -> small (e.g., unsigned int to unsigned short)
- truncation: Unsigned/signed -> most significant bits are truncated (discarded)
- May not yield expected results -> original value may be altered
- Both (sign and size): 1) size conversion is done first, then 2) sign conversion is done
Next Lecture
- Representing data in memory – Most of this is review
- “Under the Hood” - Von Neumann architecture
- Bits and bytes in memory
- How to diagram memory -> Used in this course and other references
- How to represent series of bits -> In binary, in hexadecimal (conversion)
- What kind of information (data) do series of bits represent -> Encoding scheme
- Order of bytes in memory -> Endian
- Bit manipulation – bitwise operations
- Boolean algebra + Shifting
- Representing integral numbers in memory
- Unsigned and signed
- Converting, expanding and truncating
- Arithmetic operations
- Representing real numbers in memory
- IEEE floating point representation
- Floating point in C – casting, rounding, addition, …