one-zero-six
Intro to 106
Signed (8-bit): -128 to 127
Leftmost bit is used to denote positive or negative
Range: -2^(n-1) to (2^(n-1)) - 1
Unsigned (8-bit): 0 to 255
All bits are used for the number
Range: 0 to (2^n) - 1
Always consider signedness, especially when comparing two numbers; check whether one is signed and the other is not. To negate a number using 2's complement: 1) Invert the bits 2) Add 1 to the binary
funny expressions
-2 < 1U, etc.
-2 is evaluated as signed.
1U is evaluated as unsigned.
Signed with Unsigned
⇒ Treat every number as unsigned, then evaluate
Signed with Signed
Evaluate as it is
Unsigned with unsigned
Evaluate as it is
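A minimal C sketch of these rules (the printed values assume 32-bit int):

```c
#include <stdio.h>

int main(void) {
    // Mixed signed/unsigned: -2 is converted to unsigned,
    // becoming a huge value (4294967294 for 32-bit int),
    // so the comparison is false.
    printf("%d\n", -2 < 1U);   // prints 0

    // Signed with signed: evaluated as is.
    printf("%d\n", -2 < 1);    // prints 1

    // Unsigned with unsigned: evaluated as is.
    printf("%d\n", 2U < 3U);   // prints 1
    return 0;
}
```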
In the cheat sheet (one hex digit = 4 bits):
1 byte (char): 0xAA
2 bytes (short): 0xAAAA
4 bytes (int): 0xAAAA AAAA
8 bytes (long): 0xAAAA AAAA AAAA AAAA
Sorting questions:
1) Find which values are negative when read as signed (a leading hex digit of 8 to F means the sign bit is set)
2) Sort
Bits and Bytes
2's complement allows us to convert subtraction into addition, which simplifies hardware design.
Ie. Addition (A + B)
Ie. Subtraction (A - B) becomes A + ~B + 1
A adds the negation of B, which is done by flipping the bits of B (⇒ ~B) and adding 1
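A minimal C sketch of negation via two's complement (variable names are just for illustration):

```c
#include <stdio.h>

int main(void) {
    int a = 7, b = 3;
    // Negation by two's complement: flip the bits, add 1.
    int neg_b = ~b + 1;            // -3
    printf("%d\n", a + neg_b);     // prints 4, same as a - b
    printf("%d\n", a - b);         // prints 4
    return 0;
}
```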
Shifting
Denoted by >> and <<
Always shift in bits. << 8 means shift 8 bits, which is 1 byte (2 hex digits).
If there is a << shift followed by a >> shift, it is usually extracting a field (such as a byte): the left shift discards the higher bits, and the right shift moves the field down to the least significant position.
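For example, a small C sketch extracting byte 1 of a 32-bit value this way (the value and shift amounts are made up for illustration):

```c
#include <stdio.h>

int main(void) {
    unsigned int x = 0xAABBCCDD;
    // Left shift discards the bytes above byte 1,
    // right shift moves byte 1 down to the bottom.
    unsigned int byte1 = (x << 16) >> 24;   // 0xCC
    printf("0x%X\n", byte1);
    return 0;
}
```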
Logical vs Arithmetic Shift
Logical shift: for unsigned integers; literal shifting, no extension or anything (zeros are shifted in)
Arithmetic shift: for signed integers; used for dividing (right shift >>) and multiplying (left shift <<) by 2; the right shift sign-extends the MSBit
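A small C sketch contrasting the two. Note that in C, right-shifting a negative signed value is implementation-defined, but x86-64 compilers implement it as an arithmetic shift:

```c
#include <stdio.h>

int main(void) {
    unsigned char u = 0xF0;   // 1111 0000
    signed char   s = -16;    // also 1111 0000 in two's complement

    // Logical shift: zeros come in from the left.
    printf("0x%X\n", (unsigned char)(u >> 2));  // 0x3C

    // Arithmetic shift: the sign bit is copied in from the left,
    // so the result stays negative (-16 / 4 = -4).
    printf("%d\n", (signed char)(s >> 2));      // -4
    return 0;
}
```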
Overflow
Does the result leave the range of the given datatype?
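A hedged C sketch of a pre-add overflow check (signed overflow is undefined behavior in C, so the check happens before the addition; the helper name is made up):

```c
#include <limits.h>
#include <stdio.h>

// Returns 1 if a + b would leave the range of int.
int add_overflows(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return 1;  // would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return 1;  // would go below INT_MIN
    return 0;
}

int main(void) {
    printf("%d\n", add_overflows(INT_MAX, 1));  // prints 1
    printf("%d\n", add_overflows(5, 10));       // prints 0
    return 0;
}
```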
Bit Masks
Checking for MSB/LSB
B can stand for bit or byte here
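A small C sketch of typical masks, assuming 32-bit unsigned int (the example value is arbitrary):

```c
#include <stdio.h>

int main(void) {
    unsigned int x = 0x80000001;

    // LSBit: AND with 1.
    printf("%d\n", (x & 0x1) != 0);          // prints 1

    // MSBit: AND with a mask that has only the top bit set.
    printf("%d\n", (x & 0x80000000) != 0);   // prints 1

    // LSByte: AND with 0xFF.
    printf("0x%X\n", x & 0xFF);              // prints 0x1
    return 0;
}
```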
Assembly
Most instructions will update the condition flags, and some will NOT store the results in the destination register/memory address
leaq does not update flags, but it DOES store its result in the destination register
cmpq is similar to subq; not addq
Does update flags, but not store results
testq is similar to a bitwise AND (andq)
Also updates flags, and does not store results
Little Endian
Smallest byte ('Least Significant Byte') ends up in the lowest address
Ie. [Byte 2] [Byte 3] [Byte 4]
>> [Byte 4] [Byte 3] [Byte 2]
Big Endian
First byte ('Most significant byte') ends up in the lowest address
Ie. [Byte 2] [Byte 3] [Byte 4]
>> [Byte 2] [Byte 3] [Byte 4]
Always assume little endian on x86-64 ISA
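A common C sketch to check endianness at runtime by looking at the first byte of a multi-byte value:

```c
#include <stdio.h>

int main(void) {
    unsigned int x = 0x11223344;
    unsigned char *p = (unsigned char *)&x;
    // On x86-64 (little endian) the lowest address holds
    // the least significant byte, 0x44.
    printf("first byte: 0x%X\n", p[0]);   // 0x44 on x86-64
    return 0;
}
```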
Caching
Locality is a property of a program: a program can have temporal locality and spatial locality.
Good temporal + spatial locality: program reuses data soon and accesses nearby addresses
Bad temporal locality: program reuses data too late
When checking temporal locality -> Are there repeated accesses to the same data?
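A C sketch contrasting good and bad spatial locality on a 2-D array (C stores arrays row-major, so row-by-row traversal is stride-1; N and the function names are made up):

```c
#define N 1024
double a[N][N];

// Good spatial locality: stride-1, walks memory in order.
double sum_row_major(void) {
    double sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

// Bad spatial locality: stride-N, jumps N*8 bytes per access.
double sum_col_major(void) {
    double sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}
```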
Processor-Memory Bottleneck
Core 2 Duo (CPU + Reg)
Process 256 bytes/cycle
Bus bandwidth is only 2 bytes/cycle from main memory
⇒ So the CPU spends many cycles waiting for the bus to transport data from main memory to the Core 2 Duo for processing (also called slow memory access)
Cache
Small and fast storage device (compared to Main Memory)
Acts as a staging area for a subset of frequently or recently used instructions or data
Instructions are in i-cache
Data are in d-cache
Main Memory-Cache Interaction
Data is copied in blocks from Main Memory to Cache
Cache Hit
When CPU calls for block n, and block n is in cache.
Cache Miss
When CPU calls for block n, and block n is not in cache.
So, a request is then made to main memory to fetch block n
The cache's Mapping Policy then decides where block n will be placed in the cache
The Replacement Policy complements the mapping policy: if there is no free space left, it decides which block to evict from the cache so that block n can be placed there for CPU access.
Cache Storage
Is faster than main memory; and because a running program accesses memory in predictable patterns, the cache can hold the data the program will likely need next, in advance
Temporal pattern/locality (time-based)
If the program uses data now, the same data will likely be referenced again soon
Spatial pattern/locality (nearby)
If the program accesses one piece of memory, nearby memory will tend to be referenced too
Stride-1 access is good spatial locality
Cache Performance Metrics
Hit Time (HT)
Time to deliver block in cache to CPU
Miss Penalty (MP)
Additional time required because of a cache miss: querying main memory, fetching the block, and returning it to the cache
Miss Rate (MR)
Percentage of times the CPU does not find the needed data in cache
AMAT
Average time to access memory considering both hits and misses
To have the most significant decrease in AMAT, first reduce MR, then MP, then clock cycle.
Calculating Multi Cache AMAT
AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × (… (Ln hit time + Ln miss rate × MP)))
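A worked example with made-up numbers: if L1 hit time = 1 cycle, L1 miss rate = 10%, L2 hit time = 10 cycles, L2 miss rate = 2%, and MP = 100 cycles, then AMAT = 1 + 0.10 × (10 + 0.02 × 100) = 1 + 0.10 × 12 = 2.2 cycles.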
L1 and L2 Cache
L1 is closest to CPU, followed by L2
Adding an extra layer of cache improves performance by reducing AMAT
Mapping Policy
Block Size (K) -> allows one to find the Offset bits
Unit of transfer from MM to cache
Cache Size -> allows one to find out the Index
Amount of data in bytes that the cache can store
Tag
Identifies where in MM the block came from.
Index
Selects the set, which may hold one or multiple blocks
Direct mapped cache (E=1)
when each memory address maps to exactly one index
conflict misses often happen in the cache
Fully associative cache
Any block can go in any line (the whole cache is one set); e.g. 8-way here means 8 blocks in that 1 set.
Set associative cache (or E-way set) / Associativity
E is defined to be either 1, 2, 4 or 8, where the number dictates how many blocks there are in each set.
Increasing E decreases conflict misses and thus typically decreases the miss rate.
Think -> increasing E increases depth, so DFS vs BFS
*However, you need to consider the access patterns
If all accessed values fit in the same block, then increasing E has no impact.
Collision
When different addresses map to the same cache index
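A hedged C sketch of splitting an address into tag/index/offset for a hypothetical cache with 16-byte blocks (4 offset bits) and 4 sets (2 index bits):

```c
#include <stdio.h>

#define OFFSET_BITS 4   // block size K = 16 bytes
#define INDEX_BITS  2   // 4 sets

int main(void) {
    unsigned int addr = 0xD3;   // example address

    unsigned int offset = addr & 0xF;                   // low 4 bits
    unsigned int index  = (addr >> OFFSET_BITS) & 0x3;  // next 2 bits
    unsigned int tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("tag=0x%X index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```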
Reading Tips
Read Little Endian
Ie. when the address request starts at byte 2, read D3C2 (byte 3, byte 2), not C2D3 (byte 2, byte 3).
If no valid tag matches, it is a cache miss: no bytes are returned, and one line is evicted according to the replacement policy
Setup of the Table
Columns: Recently Used (RU, holds 0 or 1), Tag, Byte 0, Byte 1
Replacement Policy
Replacement is needed when a fully-associative set / E-way set is full and one block must be evicted so that there is space for the newly requested block.
FIFO
Oldest block is replaced
LRU
Least recently used
Best but hard to implement
Random
A random block is replaced
Cache Write-Hit
1) Write-Through
Immediately updates MM with the change when the cache is updated
Slow
2) Write-back
Cache keeps new data, does not update MM immediately
Ie. when new data is written to a block, the block is marked as "dirty" via the dirty bit in the cache.
MM is updated only when the dirty cache block is replaced/evicted
Uses dirty bit to track which have been modified
Faster, but cache and MM may be out of sync for a while
Cache Write-Miss
When the data you want to change is not in the cache
1) Write-Allocate
Loads data from MM to cache
Write new data to cache
Pro: Good when there will be more reads and writes to the same block
2) No-Write-Allocate
Directly writes to MM
Cache not updated
Pro: Good for data that will not be accessed again soon
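A hedged, pseudocode-style C sketch of the write-back + write-allocate path for a single cache line (the struct and function are made up for illustration; the MM transfer itself is omitted):

```c
typedef struct {
    int valid;
    int dirty;
    unsigned int tag;
    unsigned char data[16];   // one 16-byte block
} CacheLine;

// Write-back + write-allocate, sketched for one line.
void cache_write(CacheLine *line, unsigned int tag,
                 int offset, unsigned char value) {
    if (!line->valid || line->tag != tag) {   // write miss
        // Write-allocate: evict the old block to MM if dirty,
        // then load the requested block from MM (omitted here).
        line->valid = 1;
        line->tag = tag;
        line->dirty = 0;
    }
    line->data[offset] = value;  // write only to the cache
    line->dirty = 1;             // MM is updated later, on eviction
}
```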
Virtual Memory
Memory management technique
Allows programs to act as if there is more physical memory than there actually is in the computer
Does this via paging, where data is organised into pages (large, fixed-size blocks of memory)
A page is larger than a cache block size
Fully associative and Write-back.
Complicated replacement policy
Each process has its own page table, which translates virtual address to physical address.
Not every virtual page is mapped to a physical page
Address Translation for VM
Each memory access requires translating a virtual address (VA) into a physical address (PA).
The processor generates the virtual address and passes it to the MMU, which translates it into a physical address in main memory.
For translation, it uses a lookup table called the page table (PT)
Indexed by VPN, where there is an entry for every virtual page
Each PT entry stores the PPN and management bits (Valid, Dirty)
MMU then sends the physical address to the cache/main memory
In the processor, a virtual address is split into Virtual Page Number (VPN) + Virtual Page Offset (VPO)
The physical address is Physical Page Number (PPN) + Physical Page Offset (PPO), and PPO = VPO since the page size is the same
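A small C sketch splitting a virtual address into VPN and VPO, assuming hypothetical 4 KiB pages (12 offset bits):

```c
#include <stdio.h>

#define PAGE_OFFSET_BITS 12   // 4 KiB pages (assumption)

int main(void) {
    unsigned long va = 0x12345ABC;
    unsigned long vpo = va & ((1UL << PAGE_OFFSET_BITS) - 1);
    unsigned long vpn = va >> PAGE_OFFSET_BITS;
    // The VPO is reused unchanged as the PPO; only the VPN
    // is translated (via the page table) into a PPN.
    printf("vpn=0x%lX vpo=0x%lX\n", vpn, vpo);
    return 0;
}
```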
Page Hit
VM reference is in physical memory
Page Fault
VM reference is NOT in physical memory, which causes an exception
Page Fault handler then selects a victim to be evicted.
The faulting instruction is then restarted, and this time it results in a page hit
Translation Lookaside Buffer (TLB)
A cache, stored in the MMU
Caches page table entries.
Stores frequently used virtual-to-physical address mappings
If TLB miss, CPU must access page table in memory