one-zero-six

Intro to 106

Signed

  • Range (8-bit): -128 to 127

  • Leftmost bit is used to denote positive or negative

  • General range: -2^(n-1) to 2^(n-1) - 1

Unsigned

  • Range (8-bit): 0 to 255

  • All bits are used for the number

  • General range: 0 to (2^n) - 1

funny expressions

  • -2 < 1U etc etc

    • -2 is evaluated as signed.

    • 1U is evaluated as unsigned.

    • When the two are mixed, the signed -2 is converted to unsigned (a huge value), so -2 < 1U is false.

Funny expression -> To do

Signed with Unsigned ⇒ Treat every number as unsigned, then evaluate

Signed with Signed ⇒ Evaluate as is

Unsigned with Unsigned ⇒ Evaluate as is
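A minimal C sketch of these rules (the printed values follow standard C conversion behavior):

#include <stdio.h>

int main(void) {
    // Signed with unsigned: -2 is converted to unsigned and becomes
    // a huge value, so the comparison is false (prints 0).
    printf("%d\n", -2 < 1U);

    // Signed with signed: evaluated as is, so this is true (prints 1).
    printf("%d\n", -2 < 1);
    return 0;
}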

In the cheat sheet:

char is not signed by default; the rest are.

char  -> 0xAA (1 byte)
short -> 0xAAAA (2 bytes)
int   -> 0xAAAA AAAA (4 bytes)
long  -> 0xAAAA AAAA AAAA AAAA (8 bytes)

Sorting questions:

1) Find which values are negative when read as signed (a leading hex digit of 8 to F means the sign bit is set)

2) Sort
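A minimal C sketch of step 1, reading the same byte as signed vs unsigned (0xB4 is a made-up value):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    // Leading hex digit is B (in the 8 to F range), so the sign bit is set.
    uint8_t u = 0xB4;          // unsigned: 180
    int8_t  s = (int8_t)0xB4;  // signed: -76 in two's complement

    printf("unsigned: %d\n", u);
    printf("signed:   %d\n", s);
    return 0;
}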

Bits and Bytes

  • 2's complement allows us to convert subtraction into addition, which simplifies hardware design.

    • Ie. Addition: A + B

    • Ie. Subtraction: A + (~B + 1)

      • A adds the negation of B, which can be done simply by flipping the bits of B (⇒ ~B) and adding 1

1 hex digit = 4 bits, so 2 hex digits = 1 byte. Type sizes grow char -> short -> int -> long.
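A small C sketch of subtraction via two's complement (the operands are made up):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 7, b = 3;

    // -b is ~b + 1, so a - b == a + ~b + 1 (mod 2^8).
    uint8_t diff = a + (uint8_t)(~b + 1);

    printf("%d\n", diff);             // 4
    printf("%d\n", (uint8_t)(a - b)); // same result: 4
    return 0;
}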

  • Shifting

    • Denoted by >> and <<

    • Always shift in bits. << 8 means shift 8 bits, which is 2 hex digits (1 byte).

    • If there is a shift of << then >>, it is extracting the rightmost byte of a field: the left shift discards the upper bits, and the right shift moves the byte back.

    • Logical vs Arithmetic Shift

Logical

  • For unsigned integers

  • Literal shifting: vacated bits are filled with 0, no extension or anything

Arithmetic

  • For signed integers

  • For dividing (right shift >>) or multiplying (left shift <<) by powers of 2

  • Right shift sign-extends the MSBit
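A small C sketch contrasting the two shifts and the <<-then->> idiom (0xAA and 0xD3C2 are made-up values; arithmetic right shift of a negative signed value is technically implementation-defined in C, but sign extension is the common behavior):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t u = 0xAA;          // 1010 1010
    int8_t  s = (int8_t)0xAA;  // -86 in two's complement

    // Logical right shift on unsigned: vacated bits filled with 0.
    printf("0x%02X\n", (unsigned)(u >> 2));  // 0x2A

    // Arithmetic right shift on signed: sign-extends the MSBit.
    printf("%d\n", s >> 2);                  // typically -22

    // << then >> extracts the rightmost byte of a 16-bit value
    // (the cast keeps only 16 bits, mimicking a 16-bit register).
    uint16_t x = 0xD3C2;
    uint16_t low = (uint16_t)(x << 8) >> 8;
    printf("0x%02X\n", (unsigned)low);       // 0xC2
    return 0;
}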

  • Overflow

    • Does it leave the range of the given datatype?

  • Bit Masks

    • Checking for MSB/LSB

      • B can stand for bit or byte here
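A small C sketch of both cases, masking for the MSBit and for the least significant byte (the values are made up):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t x = 0xB4;  // 1011 0100

    // Check the most significant bit with a mask.
    int msb_set = (x & 0x80) != 0;  // 1: the MSBit is set

    // Extract the least significant byte of a wider value.
    uint32_t word = 0x12345678;
    unsigned lsb = word & 0xFF;     // 0x78

    printf("%d 0x%02X\n", msb_set, lsb);
    return 0;
}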

Assembly

  • Most instructions update the condition flags, and some will NOT store their results in the destination register/memory address

    • leaq does not update flags; it does store its result (the computed address) in the destination register

    • cmpq is similar to subq; not addq

      • Updates flags, but does not store the result

    • testq is similar to a bitwise AND

      • Also updates flags, and does not store the result

  • Little Endian

    • Smallest byte ('Least Significant Byte') ends up in the lowest address

    • Ie. [Byte 2] [Byte 3] [Byte 4]

    • >> [Byte 4] [Byte 3] [Byte 2]

  • Big Endian

    • First byte ('Most significant byte') ends up in the lowest address

    • Ie. [Byte 2] [Byte 3] [Byte 4]

    • >> [Byte 2] [Byte 3] [Byte 4]
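A small C sketch that reveals the byte order by inspecting memory (0x04030201 is a made-up value; the little-endian line is what an x86 machine prints):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t x = 0x04030201;
    uint8_t *p = (uint8_t *)&x;  // view the value byte by byte

    for (int i = 0; i < 4; i++)
        printf("address+%d: 0x%02X\n", i, (unsigned)p[i]);

    // Little endian: 0x01 0x02 0x03 0x04 (LSByte at lowest address)
    // Big endian:    0x04 0x03 0x02 0x01 (MSByte at lowest address)
    return 0;
}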

When doing questions: a > in the C code converts to a jump on ≤ in the assembly (the jump is taken, skipping the body, when the condition fails)

Caching

Only a program can have temporal locality and spatial locality (locality is a property of the program's access pattern, not of the data itself).

  • Good temporal locality + Spatial locality:

int sum = 0;
int array[5] = {1, 2, 3, 4, 5};

// Spatial locality: the elements are accessed one after another (stride-1).
for (int i = 0; i < 5; i++) {
    sum += array[i];
}

// maybe later
sum += array[2];

// Temporal locality: array[2] (and sum) were accessed recently,
// so they are probably still in cache and this second access is very fast.

  • Bad temporal locality:

int sum = 0;
int array[10000];

for (int i = 0; i < 10000; i++) {
    sum += array[i];
}

// Much later, after a lot of other work...
sum += array[2];

// A lot of time passes before accessing array[2],
// so array[2] might no longer be in the cache.

  • Program reuses data too late ⇒ Bad temporal locality

    • When checking temporal locality -> Are there repeated accesses to the same data?

Processor-Memory Bottleneck

  • Core 2 Duo (CPU + Reg)

    • Process 256 bytes/cycle

    • Bus bandwidth is only 2 bytes/cycle from main memory

⇒ So, the CPU spends a lot of time waiting for the bus to transport data from main memory to the Core 2 Duo for processing (also called slow memory access)

Cache

  • Small and fast storage device (compared to Main Memory)

  • Acts as a staging area for a subset of frequently or recently used instructions or data

    • Instructions are in i-cache

    • Data are in d-cache

Main Memory-Cache Interaction

  • Data is copied in blocks from Main Memory to Cache

Cache Hit

  • When CPU calls for block n, and block n is in cache.

Cache Miss

  • When CPU calls for block n, and block n is not in cache.

  • So, a request is then made to main memory to fetch block n

  • Here, based on the cache's Mapping Policy, the cache decides where block n will be placed

    • The Replacement Policy of the cache adds onto the mapping policy: if there is no free space left, this policy decides which block to evict from the cache so that block n can be placed there for CPU access.

Cache Storage

  • Is faster than main memory; when a program runs, it accesses memory in patterns, which let the cache predict what data the program will need next and store it in advance

    • Temporal pattern/locality (Time-based)

      • If the program uses data now, it will likely use it again soon

      • Recently referenced items are likely to be referenced again

    • Spatial pattern/locality (Nearby)

      • If the program accesses one piece of memory, it will likely access nearby memory

      • Items with nearby addresses tend to be referenced close together in time

        • Stride-1 access is good spatial locality

Cache Performance Metrics

Cache Hit Access Time = HT
Cache Miss Access Time = HT + MP

  • Hit Time (HT)

    • Time to deliver block in cache to CPU

  • Miss Penalty (MP)

    • Additional time required from querying the main memory, fetching the block, and returning to cache because of a cache miss

Miss Rate = Misses / Total Accesses = 1 - Hit Rate
  • Miss Rate (MR)

    • Percentage of times the CPU does not find the needed data in cache

For questions that require you to find the MR:

1) Find the number of sets and blocks per set of the cache

2) Find the datatype and how many of that datatype can fit in one block

3) If there is a loop, how many iterations of the loop hit the same block, based on the number from step 2?

4) In each iteration, count the memory operations (reads + writes) and the cache misses

5) MR = total misses / total memory operations (misses per repeat × repeats, over memory ops per iteration × iterations)
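A hypothetical worked example of this recipe, assuming a cold direct-mapped cache with 16-byte blocks and 4-byte ints (so 4 ints per block):

#include <stdio.h>

int a[16];  // 16 ints = 64 bytes = 4 cache blocks

int main(void) {
    int sum = 0;

    // Stride-1 loop: 16 reads of a[i] (sum stays in a register).
    for (int i = 0; i < 16; i++)
        sum += a[i];

    // Each block holds 4 ints: the first access to each block is a
    // cold miss, the next 3 are hits.
    // Misses = 16 / 4 = 4; memory ops = 16; MR = 4/16 = 25%.
    printf("%d\n", sum);
    return 0;
}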

  • AMAT

    • Average time to access memory considering both hits and misses

AMAT = HT + MR × MP

AMAT can be expressed in clock cycles, or picoseconds.

*Refer to Slide 31 of slides Caching Part 1.

  • To have the most significant decrease in AMAT, first reduce MR, then MP, then clock cycle.

  • Calculating Multi Cache AMAT

    • AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × (… Ln HT + Ln MR × MP))
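A small C sketch of the multi-level formula (all numbers are made up):

#include <stdio.h>

int main(void) {
    // Hypothetical two-level cache:
    double l1_ht = 1.0, l1_mr = 0.10;  // L1 hit time (cycles), miss rate
    double l2_ht = 10.0, l2_mr = 0.05; // L2 hit time (cycles), miss rate
    double mp = 100.0;                 // main-memory miss penalty (cycles)

    // AMAT = L1 HT + L1 MR x (L2 HT + L2 MR x MP)
    double amat = l1_ht + l1_mr * (l2_ht + l2_mr * mp);
    printf("AMAT = %.2f cycles\n", amat);  // 1 + 0.10 x 15 = 2.50
    return 0;
}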

L1 and L2 Cache

  • L1 is closest to CPU, followed by L2

  • Adding an extra layer of cache improves performance by reducing AMAT

Mapping Policy

  • Block Size (K) -> allows one to find the Offset bits

    • Unit of transfer from MM to cache

  • Cache Size -> allows one to find out the Index

    • Amount of data in bytes that the cache can store

  • Tag

    • Identifies where the block came from in MM.

Find the nearest block-aligned starting address to determine which other addresses are loaded along with the requested address (see the address-breakdown sketch after this section).

  • Index

    • May map to one or multiple blocks

      • Direct mapped cache (E=1)

        • Each memory address maps to exactly one index

        • Conflicts in the cache happen often

      • Fully associative cache

        • Any block can be placed in any line (the whole cache is one set)

      • Set associative cache (or E-way set)/ Associativity

        • E is defined to be either 1, 2, 4 or 8, where the number dictates how many blocks there are in each set.

          • 8-way: meaning that there are 8 blocks in 1 set.

        • Increasing E decreases conflict misses and therefore tends to decrease the miss rate.

          • Think -> increasing E increases depth, so DFS vs BFS

          • *However, you need to consider the access patterns

            • If all values can be accessed in the same block, then E-way does not impact.

Question on largest/smallest memory range that starts at (address):

1) Calculate which set it is in

2) Find how many ways (E) the set has, and depending on E, calculate the largest memory range.
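A small C sketch of splitting an address into Tag/Index/Offset, for a hypothetical cache with 16-byte blocks (4 offset bits) and 8 sets (3 index bits):

#include <stdio.h>

int main(void) {
    unsigned addr = 0x1A2B;  // made-up address

    unsigned offset = addr & 0xF;        // low 4 bits select the byte
    unsigned index  = (addr >> 4) & 0x7; // next 3 bits select the set
    unsigned tag    = addr >> 7;         // remaining bits identify the block

    printf("tag=0x%X index=%u offset=%u\n", tag, index, offset);
    return 0;
}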

Collision

  • When different addresses map to the same cache index

Reading Tips

  • Read Little Endian

    • ie. a 2-byte read starting at byte 2 gives D3C2 (byte 3, byte 2), not C2D3 (byte 2, byte 3), because the byte at the higher address is more significant.

  • If no line has a matching valid tag, it is a cache miss: no bytes are returned, and one line is evicted according to the replacement policy

Setup of the Table

Set | Recently Used (RU) | Tag | Byte 0 | Byte 1
 0  |                    |     |        |
 1  |                    |     |        |

(one row per line in the set; shown here for Line 0)

Replacement Policy

  • Replacement is needed when a fully-associative set or E-way set is full, requiring eviction of one block so that there is space for the newly requested block.

  • FIFO

    • Oldest block is replaced

  • LRU

    • Least recently used

    • Best but hard to implement

  • Random

    • A random block is replaced

Cache Write-Hit

1) Write-Through

  • Immediately updates MM with the change when the cache is updated

  • Slow

2) Write-back

  • Cache keeps new data, does not update MM immediately

    • Ie. When adding new data to a block, the block is marked as "dirty" with the dirty bit in the cache.

  • MM is updated only when the cache block is replaced/evicted to make room for other blocks

    • Uses dirty bit to track which have been modified

  • Faster, but they may be out of sync for a while

Cache when Write-Miss

  • When the data you want to change is not in the cache

1) Write-Allocate

  • Loads data from MM to cache

  • Write new data to cache

  • Pro: Good when the same data is read and written many times afterwards

2) No-Write-Allocate

  • Directly writes to MM

  • Cache not updated

  • Pro: Good for data that will not be accessed again soon


Virtual Memory

  • Memory management technique

  • Allows programs to act as if there is more physical memory than there actually is in the computer

    • Does this via paging, which organises data into pages (large, fixed-size blocks of memory)

      • A page is larger than a cache block size

      • Fully associative and Write-back.

      • Complicated replacement policy

  • Each process has its own page table, which translates virtual address to physical address.

  • Not every virtual page is mapped to a physical page at any given time

Address Translation for VM

  • Each memory access requires translating a virtual address (VA) into a physical address (PA).

  • Processor allocates the virtual address and passes it onto the MMU, where it does a translation to provide a physical address in main memory.

    • For translation, it uses a lookup table called the Page Table (PT)

      • Indexed by VPN, where there is an entry for every virtual page

      • Each PT entry stores the PPN and management bits (Valid, Dirty)

    • MMU then sends the physical address to the cache/main memory

  • In the processor, the virtual address can be split into

    • Virtual Page Number (VPN) + Virtual Page Offset (VPO)

    • Because the physical page size equals the virtual page size, the Physical Page Offset equals the Virtual Page Offset
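A small C sketch of the VPN/VPO split and translation, assuming 4 KiB pages (12 offset bits) and a made-up page-table mapping:

#include <stdio.h>

int main(void) {
    unsigned va = 0x0003A7C4;  // made-up virtual address

    unsigned vpn = va >> 12;    // Virtual Page Number
    unsigned vpo = va & 0xFFF;  // Virtual Page Offset

    // Suppose the page table maps this VPN to PPN 0x15 (made up).
    unsigned ppn = 0x15;
    unsigned pa  = (ppn << 12) | vpo;  // PPO == VPO

    printf("VPN=0x%X VPO=0x%X PA=0x%X\n", vpn, vpo, pa);
    return 0;
}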

Page Hit

  • VM reference is in physical memory

Page Fault

  • VM reference is NOT in physical memory, which causes an exception

  • Page Fault handler then selects a victim page in physical memory to be evicted.

  • The faulting instruction is then restarted, and this time it results in a page hit

Translation Lookaside Buffer (TLB)

  • A cache, stored in the MMU

    • It caches page table entries.

  • Stores frequently used virtual-to-physical address mappings

  • If TLB miss, CPU must access page table in memory
