layout math
simple true

CMPT 295

Unit - Data Representation

Lecture 3 Representing integral numbers in memory - unsigned and signed

Last Lecture

  • Von Neumann architecture
    • Architecture of most computers
    • Its components: CPU, memory, input and output, bus
    • One of its characteristics: Data and code (programs) both stored in memory
  • A look at memory: defined byte-addressable memory, diagram of (compressed) memory
    • Word size (w): size of a series of bits (or bit vector) we manipulate, also size of machine words (see Section 2.1.2)
  • A look at bits in memory
    • Why binary numeral system (0 and 1 -> two values) is used to represent information in memory
    • Algorithm for converting binary to hexadecimal (hex)
      1. Partition bit vector into groups of 4 bits, starting from right, i.e., least significant byte (LSB)
         • If most significant “byte” (MSB) does not have 8 bits, pad it: add 0s to its left
      2. Translate each group of 4 bits into its hex value
    • What do bits represent? Encoding scheme gives meaning to bits
    • Order of bytes in memory: little endian versus big endian
  • Bit manipulation regardless of what bit vectors represent
    • Boolean algebra: bitwise operations => AND (&), OR (|), XOR (^), NOT (~)
    • Shift operations: left shift, right logical shift and right arithmetic shift
      • Logical shift: Fill x with y 0s on left
      • Arithmetic shift: Fill x with y copies of x's sign bit on left
      • Sign bit: Most significant bit (MSb) before shifting occurred
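
The two right shifts recapped above can be sketched in Python, which is used here only as a calculator: Python integers have no fixed width, so the width `w` is imposed with a mask. The helper names `shift_right_logical` and `shift_right_arithmetic` are hypothetical, and the sketch assumes `0 <= y <= w`.

```python
def shift_right_logical(x, y, w=8):
    """x >> y on a w-bit vector: fill with y 0s on the left."""
    mask = (1 << w) - 1
    return (x & mask) >> y


def shift_right_arithmetic(x, y, w=8):
    """x >> y on a w-bit vector: fill with y copies of x's sign bit on the left."""
    mask = (1 << w) - 1
    x &= mask
    sign_bit = x >> (w - 1)            # MSb before shifting occurs
    result = x >> y
    if sign_bit:
        result |= mask & ~(mask >> y)  # set the y leftmost bits to 1
    return result


x = 0b10010110
print(format(shift_right_logical(x, 2), "08b"))     # 00100101
print(format(shift_right_arithmetic(x, 2), "08b"))  # 11100101
```

When the MSb is 0, both functions return the same result.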

NOTE: C logical operators and C bitwise (bit-level) operators behave differently! Watch out for && versus &, || versus |, …

Today's Menu

  • Representing data in memory - Most of this is review
    • “Under the Hood” - Von Neumann architecture
    • Bits and bytes in memory
      • How to diagram memory -> Used in this course and other references
      • How to represent series of bits -> In binary, in hexadecimal (conversion)
      • What kind of information (data) do series of bits represent -> Encoding scheme
      • Order of bytes in memory -> Endian
    • Bit manipulation - bitwise operations
      • Boolean algebra + Shifting
  • Representing integral numbers in memory
    • Unsigned and signed
    • Converting, expanding and truncating
    • Arithmetic operations
  • Representing real numbers in memory
    • IEEE floating point representation
    • Floating point in C - casting, rounding, addition, …

Warm up exercise!

As a warm up exercise, fill in the blanks!

  • If the context is C (on our target machine)
    • char => _____ bits/ _____ byte
    • short => _____ bits/ _____ bytes
    • int => _____ bits/ _____ bytes
    • long => _____ bits/ _____ bytes
    • float => _____ bits/ _____ bytes
    • double => _____ bits/ _____ bytes
    • pointer (e.g. char *) => _____ bits/ _____ bytes

Unsigned integral numbers

Remember:

Address|M[]
--|--
size-1|
...|
0x0003|01000010
0x0002|01101001
0x0001|01110100
0x0000|01110011

  • What if the byte at M[0x0002] represented an unsigned integral number, what would be its value?
    • X = a series of bits = bit vector; w = width of bit vector

  • $X = 01101001_{2}$, $w = 8$
  • Let's apply the encoding scheme:

{% katex display %} \it \text{B2U}(X) = \sum_{i=0}^{w-1} X_{i} \times 2^i {% endkatex %}

Example: $0 \times 2^7 + 1 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = $

  • For w = 8, range of possible unsigned values: [ blank ]
  • For any w, range of possible unsigned values: [ blank ]
  • Conclusion: w bits can only represent a fixed # of possible values, but these w bits represent these values exactly
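
To check your answers to the blanks above, the B2U sum can be sketched as a short Python function. The name `b2u` and the choice to pass the bit vector as a string are assumptions made for illustration; the rightmost character is taken to be $X_{0}$.

```python
def b2u(bits):
    """B2U(X): sum of X_i * 2^i for i = 0 .. w-1, with X_0 the rightmost bit."""
    w = len(bits)
    return sum(int(bits[w - 1 - i]) * 2**i for i in range(w))


print(b2u("01101001"))   # the byte at M[0x0002]
print(b2u("00000000"))   # smallest 8-bit pattern
print(b2u("11111111"))   # largest 8-bit pattern
```

Calling `b2u` on the all-zeros and all-ones patterns for other widths gives the two ends of the range for that `w`.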

B2U(X) Conversion (Encoding scheme)

  • Positional notation: expand and sum all terms

Decimal: $d_{i}\ d_{i-1}\ \dots\ d_{2}\ d_{1}\ d_{0}$

Binary: $b_{i}\ b_{i-1}\ \dots\ b_{2}\ b_{1}\ b_{0}$

Remember: {% katex display %} \it \text{B2U}(X) = \sum_{i=0}^{w-1} X_{i} \times 2^i {% endkatex %}

Example: $246_{10} = 2 \times 10^{2} + 4 \times 10^{1} + 6 \times 10^{0}$

Range of possible values?

  • If the context is C (on our target machine)
    • unsigned char?
    • unsigned short?
    • unsigned int?
    • unsigned long?

Examples of “Show your work”

U2B(X) Conversion (into 8-bit binary # => w = 8)

Method 1 - Using subtraction: subtracting decreasing power of 2 until reach 0:

Starting number: 246

  • $246 - 128 = 118$
  • $118 - 64 = 54$
  • $54 - 32 = 22$
  • $22 - 16 = 6$
  • $6 - 8 = \text{nop!}$
  • $6 - 4 = 2$
  • $2 - 2 = 0$
  • $0 - 1 = \text{nop!}$
  • $246_{10} = 11110110_{2}$

Method 2 - Using division: dividing by 2 until reach 0

Start with 246

  • $246 \div 2 = 123, R = 0$
  • $123 \div 2 = 61, R = 1$
  • $61 \div 2 = 30, R = 1$
  • $30 \div 2 = 15, R = 0$
  • $15 \div 2 = 7, R = 1$
  • $7 \div 2 = 3, R = 1$
  • $3 \div 2 = 1, R = 1$
  • $1 \div 2 = 0, R = 1$
  • $246_{10} = 11110110_{2}$
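
Both "show your work" methods can be written as small Python functions, as a sketch for checking hand conversions. The names `u2b_subtract` and `u2b_divide` are hypothetical. In Method 1 each successful subtraction contributes a 1 and each "nop!" a 0, read from the most significant bit down; in Method 2 the first remainder is the least significant bit.

```python
def u2b_subtract(x, w=8):
    """Method 1: subtract decreasing powers of 2 until reaching 0."""
    if not 0 <= x < 2**w:
        raise ValueError("x does not fit in w bits")
    bits = ""
    for i in range(w - 1, -1, -1):
        if x >= 2**i:
            x -= 2**i
            bits += "1"
        else:
            bits += "0"      # the "nop!" case
    return bits


def u2b_divide(x, w=8):
    """Method 2: divide by 2 until reaching 0, collecting the remainders."""
    if not 0 <= x < 2**w:
        raise ValueError("x does not fit in w bits")
    bits = ""
    while x > 0:
        x, r = divmod(x, 2)
        bits = str(r) + bits  # later remainders are more significant
    return bits.rjust(w, "0")


print(u2b_subtract(246))  # 11110110
print(u2b_divide(246))    # 11110110
```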

U2B(X) Conversion - A few tricks

  • Decimal -> binary

    • Trick: When decimal number is $2^{n}$, then its binary representation is 1 followed by n zeros
    • Let's try: if $X = 32$ => $X = 2^{5}$, then $n = 5$ => $100000_{2}$ ($w = 6$). What if $w = 8$? Check: $1 \times 2^{5} = 32$
  • Decimal -> hex

    • Trick: When decimal number is $2^{n}$, then its hexadecimal representation is $2^{i}$ followed by j zeros, where $n = i + 4j$ and $0 \le i \le 3$
    • Let's try: if $X = 8192$ => $X = 2^{13}$, then $n = 13$ and $13 = i + 4j$ => $1 + 4 \times 3$ => 0x2000
    • Convert 0x2000 into a binary number:
    • Check: $2 \times 16^{3} = 2 \times 4096 = 8192$
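
Both tricks are easy to test for many values of n at once. The sketch below builds the strings exactly as the tricks describe and lets Python's `int` parse them back; the function names are hypothetical.

```python
def power_of_two_binary(n):
    """Binary for 2^n: a 1 followed by n zeros."""
    return "1" + "0" * n


def power_of_two_hex(n):
    """Hex for 2^n: the digit 2^i followed by j zeros, where n = i + 4j and 0 <= i <= 3."""
    j, i = divmod(n, 4)
    return "0x" + str(2**i) + "0" * j


print(power_of_two_binary(5))   # 100000
print(power_of_two_hex(13))     # 0x2000

# Parse the strings back to confirm that the tricks hold for a range of n.
for n in range(32):
    assert int(power_of_two_binary(n), 2) == 2**n
    assert int(power_of_two_hex(n), 16) == 2**n
```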

Signed integral numbers

Remember:

Address|M[]
--|--
size-1|
...|
0x0003|01000010
0x0002|01101001
0x0001|01110100
0x0000|01110011

  • What if the byte at M[0x0001] represented a signed integral number, what would be its value?
  • $X = 11110100_{2}$, $w = 8$
  • T => Two's Complement, w => width of the bit vector (annotation: first part of equation [everything before the plus sign] is the "sign bit")
  • Let's apply the encoding scheme:

{% katex display %} \it \text{B2T}(X) = -X_{w-1} \times 2^{w-1} + \sum_{i=0}^{w-2} X_{i} \times 2^{i} {% endkatex %}

Example: $-1 \times 2^{7} + 1 \times 2^{6} + 1 \times 2^{5} + 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 0 \times 2^{0} = ?$

  • What would be the bit pattern of the …

    • Most negative value:
    • Most positive value:
  • For w = 8, range of possible signed values: [ blank ]

  • For any w, range of possible signed values: [ blank ]

  • Conclusion: same as for unsigned integral numbers
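
As with the unsigned case, the B2T formula can be sketched in Python to check hand calculations. The name `b2t` and the string representation of the bit vector are assumptions; the leftmost character is the sign bit and carries the negative weight.

```python
def b2t(bits):
    """B2T(X): -X_{w-1} * 2^(w-1) plus the sum of X_i * 2^i for i = 0 .. w-2."""
    w = len(bits)
    sign_term = -int(bits[0]) * 2**(w - 1)
    rest = sum(int(bits[w - 1 - i]) * 2**i for i in range(w - 1))
    return sign_term + rest


print(b2t("11110100"))   # the worked example above
```

Trying `b2t` on patterns such as a 1 followed by all 0s, or a 0 followed by all 1s, is one way to explore the most negative and most positive values.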

Examples of “Show your work”

T2B(X) Conversion -> Two's Complement

Annotation: w = 8

Method 1: If X < 0, (~(U2B(|X|)))+1

If X = -14 (and 8 bit binary #s)

  1. $\lvert X \rvert$ => $\lvert -14 \rvert =$
  2. $\text{U2B}(14)$
  3. (first symbol is a tilde) $\sim(00001110_{2})$ =>
  4. $(11110001_{2}) + 1$

Binary addition:

  11110001
+ 00000001
= ????????

Method 2: If X = -14 (and 8 bit binary #s)

  1. $X + 2^{w}$ => $-14 +$
  2. $\text{U2B}(242)$

Using subtraction:

  1. $242 - 128 = 114$
  2. $114 - 64 = 50$
  3. $50 - 32 = 18$
  4. $18 - 16 = 2$
  5. $2 - 8 = \text{nop!}$
  6. $2 - 4 = \text{nop!}$
  7. $2 - 2 = 0$
  8. $0 - 1 = \text{nop!}$
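
The two T2B methods should always agree. The Python sketch below implements both for any w so that hand results can be compared; the function names are hypothetical, and Python's built-in `format` stands in for U2B.

```python
def u2b(x, w):
    """Unsigned value to a w-character bit string."""
    return format(x, "0{}b".format(w))


def t2b_invert_add_one(x, w=8):
    """Method 1: for X < 0, (~(U2B(|X|))) + 1, kept to w bits."""
    if not -2**(w - 1) <= x < 2**(w - 1):
        raise ValueError("x does not fit in w bits")
    if x >= 0:
        return u2b(x, w)
    mask = (1 << w) - 1
    inverted = ~abs(x) & mask          # flip every bit of U2B(|X|)
    return u2b((inverted + 1) & mask, w)


def t2b_add_2w(x, w=8):
    """Method 2: for X < 0, U2B(X + 2^w)."""
    if not -2**(w - 1) <= x < 2**(w - 1):
        raise ValueError("x does not fit in w bits")
    return u2b(x + 2**w if x < 0 else x, w)


print(t2b_invert_add_one(-14) == t2b_add_2w(-14))  # True
```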

Properties of unsigned & signed conversions

Annotation: w = 4

X|B2U(X)|B2T(X)
--|--|--
0000|0|0
0001|1|1
0010|2|2
0011|3|3
0100|4|4
0101|5|5
0110|6|6
0111|7|7
1000|8|-8
1001|9|-7
1010|10|-6
1011|11|-5
1100|12|-4
1101|13|-3
1110|14|-2
1111|15|-1

  • Equivalence
    • Both encoding schemes (B2U and B2T) produce the same bit patterns for nonnegative values that both can represent (0 to TMax)
  • Uniqueness
    • Every bit pattern produced by these encoding schemes (B2U and B2T) represents a unique (and exact) integer value
    • Each representable integer has unique bit pattern

Converting between signed & unsigned of same size (same data type)

  • Unsigned

    • w=8
    • if unsigned $ux = 129_{10}$
    • U2T(X) = B2T(U2B(X))
    • then x = ???
    • Maintain Same Bit Pattern
  • Signed (Two's Complement)

    • w=4
    • if signed (2's C) $x = -5_{10}$
    • T2U(X) = B2U(T2B(X))
    • then unsigned ux = ???
    • Maintain Same Bit Pattern
  • Conclusion - Converting between unsigned and signed numbers: Both have same bit pattern, however, this bit pattern may be interpreted differently, i.e., producing a different value

Converting signed to unsigned (and back) with $w = 4$

Signed|Bits|Unsigned|Note
--|--|--|--
0|0000|0|All rows from 0 to 7 inclusive convert from signed to unsigned with T2U(X), and from unsigned to signed with U2T(X), without changing the value.
1|0001|1|
2|0010|2|
3|0011|3|
4|0100|4|
5|0101|5|
6|0110|6|
7|0111|7|
-8|1000|8|All rows from here to 15 inclusive convert like so: T2U: signed + 16 ($2^{4}$) -> unsigned; U2T: unsigned - 16 ($2^{4}$) -> signed.
-7|1001|9|
-6|1010|10|
-5|1011|11|
-4|1100|12|
-3|1101|13|
-2|1110|14|
-1|1111|15|
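
The add-or-subtract $2^{w}$ rule from the table can be sketched in Python and used to regenerate all sixteen rows for $w = 4$. The names `t2u`, `u2t` and `W` are chosen for illustration.

```python
W = 4


def t2u(x, w=W):
    """Signed -> unsigned, same bit pattern: add 2^w when x is negative."""
    return x + 2**w if x < 0 else x


def u2t(ux, w=W):
    """Unsigned -> signed, same bit pattern: subtract 2^w when the MSb is set."""
    return ux - 2**w if ux >= 2**(w - 1) else ux


for ux in range(2**W):
    bits = format(ux, "0{}b".format(W))
    x = u2t(ux)
    assert t2u(x) == ux        # converting back recovers the original value
    print(x, bits, ux)
```

Passing a different width, for example `u2t(ux, w=8)`, applies the same rule to 8-bit values.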

Visualizing the relationship between signed & unsigned

If $w = 4, 2^{4} = 16$

  • Signed (2's Complement) Range: TMin to TMax (0 is the center)
  • Unsigned range: 0 to UMax (TMax is the center)

Sign extension

  • Converting unsigned (or signed) of different sizes (different data types)
    1. Small data type -> larger
    • Sign extension
      • Unsigned: zero extension
      • Signed: sign bit extension
  • Conclusion: Value unchanged
  • Let's try:
    • Going from a data type that has a width of 3 bits (w = 3) to a data type that has a width of 5 bits (w = 5)
  • Unsigned: $X = 3 = 011_{2},w=3, X = 4 = 100_{2},w = 3$
    • New: $X = ? = ?_{2},w=5, X = ? = ?_{2},w=5$
  • Signed: $X = 3 = 011_{2},w=3, X=-3 = 101_{2},w=3$
    • New: $X = ? = ?_{2},w=5, X = ? = ?_{2}, w=5$
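
To check the exercise, both kinds of extension can be sketched on bit strings in Python. The helper names are hypothetical; `b2u` and `b2t` are compact stand-ins for the two encoding schemes, used only to confirm that the value does not change.

```python
def zero_extend(bits, new_w):
    """Unsigned: pad with 0s on the left."""
    return bits.rjust(new_w, "0")


def sign_extend(bits, new_w):
    """Signed: pad with copies of the sign bit on the left."""
    return bits.rjust(new_w, bits[0])


def b2u(bits):
    return int(bits, 2)


def b2t(bits):
    return int(bits, 2) - (2**len(bits) if bits[0] == "1" else 0)


for bits in ("011", "100"):
    wide = zero_extend(bits, 5)
    print("unsigned", bits, "->", wide, "value unchanged:", b2u(bits) == b2u(wide))

for bits in ("011", "101"):
    wide = sign_extend(bits, 5)
    print("signed  ", bits, "->", wide, "value unchanged:", b2t(bits) == b2t(wide))
```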

Truncation

  • Converting unsigned (or signed) of different sizes (different data types)
    2. Large data type -> smaller
    • Truncation
  • Conclusion: Value may be altered
    • A form of overflow
  • Let's try:
    • Going from a data type that has a width of 5 bits (w = 5) to a data type that has a width of 3 bits (w = 3)
  • Unsigned: $X = 27 = 11011_{2},w = 5$
    • New: $X = ? = ?_{2},w=3$
  • Signed: $X = -15 = 10001_{2},w=5, X = -1 = 11111_{2}, w=5$
    • New: $X = ? = ?_{2}, w=3, X = ? = ?_{2}, w=3$
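
Truncation simply keeps the rightmost bits. The Python sketch below can be used to check the exercise and to see when the value is altered; the helper names are hypothetical, and `b2u` and `b2t` are compact stand-ins for the two encoding schemes.

```python
def truncate(bits, new_w):
    """Keep the new_w least significant bits; the most significant bits are discarded."""
    if not 1 <= new_w <= len(bits):
        raise ValueError("new_w must be between 1 and the current width")
    return bits[-new_w:]


def b2u(bits):
    return int(bits, 2)


def b2t(bits):
    return int(bits, 2) - (2**len(bits) if bits[0] == "1" else 0)


u = "11011"                       # unsigned 27, w = 5
print(truncate(u, 3), b2u(u), "->", b2u(truncate(u, 3)))

for s in ("10001", "11111"):      # signed -15 and -1, w = 5
    print(truncate(s, 3), b2t(s), "->", b2t(truncate(s, 3)))
```

For unsigned values the result equals the original value modulo $2^{k}$, where k is the new width.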

Summary

  • Interpretation of bit pattern B into either unsigned value U or signed value T
    • B2U(X) and U2B(X) encoding schemes (conversion)
    • B2T(X) and T2B(X) encoding schemes (conversion)
      • Signed value expressed as two's complement => T
  • Conversions from unsigned <-> signed values
    • U2T(X) and T2U(X) => adding or subtracting $2^{w}$
  • Implication in C: when converting (implicitly via promotion and explicitly via casting):
    • Sign:
      • Unsigned <-> signed (of same size) -> Both have same bit pattern, however, this bit pattern may be interpreted differently
        • Can have unexpected effects -> producing a different value
    • Size:
      • Small -> large (for signed, e.g., short to int and for unsigned, e.g., unsigned short to unsigned int)
        • sign extension: For unsigned -> zero extension, for signed -> sign bit extension
        • Both yield expected result -> resulting value unchanged
      • Large -> small (e.g., unsigned int to unsigned short)
        • truncation: Unsigned/signed -> most significant bits are truncated (discarded)
        • May not yield expected results -> original value may be altered
    • Both (sign and size): 1) size conversion is done first, then 2) sign conversion is done
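
The order in the last bullet matters. The Python sketch below emulates converting a C `short` to an `unsigned int`, assuming a 16-bit `short` and a 32-bit `unsigned int`, and contrasts the stated order with the reverse order. Both function names are hypothetical.

```python
def size_then_sign(x):
    """Order stated above: 1) size conversion, then 2) sign conversion."""
    bits16 = format(x & 0xFFFF, "016b")    # two's complement pattern of the short
    bits32 = bits16.rjust(32, bits16[0])   # 1) size: sign bit extension to 32 bits
    return int(bits32, 2)                  # 2) sign: same bits read as unsigned


def sign_then_size(x):
    """Reverse order, shown only for contrast."""
    bits16 = format(x & 0xFFFF, "016b")
    unsigned16 = int(bits16, 2)            # sign first: T2U at w = 16
    return unsigned16                      # zero extension then leaves the value unchanged


print(size_then_sign(-1))   # 4294967295
print(sign_then_size(-1))   # 65535
```

For nonnegative inputs the two orders agree; they differ only when the sign bit is set.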

Next Lecture

  • Representing data in memory - Most of this is review
    • “Under the Hood” - Von Neumann architecture
    • Bits and bytes in memory
      • How to diagram memory -> Used in this course and other references
      • How to represent series of bits -> In binary, in hexadecimal (conversion)
      • What kind of information (data) do series of bits represent -> Encoding scheme
      • Order of bytes in memory -> Endian
    • Bit manipulation - bitwise operations
      • Boolean algebra + Shifting
  • Representing integral numbers in memory
    • Unsigned and signed
    • Converting, expanding and truncating
    • Arithmetic operations
  • Representing real numbers in memory
    • IEEE floating point representation
    • Floating point in C - casting, rounding, addition, …