### Lecture 15: Improving Cache Performance

- Last time:
  - Cache introduction
    - Average Memory Access Time (AMAT)
    - Set associativity
- Today:
  - Replacement and write policies
  - Cache performance optimizations

UTCS CS352, S07 Lecture 15



# Bookshelf analogy

- Lots of books on shelves
- A few books on my desk
- One book that I'm reading




# Direct Mapped

- Each block maps to exactly one cache location

Cache location = (block address) MOD (# blocks in cache)



# Fully Associative

- Each block can map to any cache location

Cache location = any



### Set Associative

- Each block maps to a subset of cache locations (one set)

Set selection = (block address) MOD (# sets in cache)
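
The three placement rules above can be treated as one rule with a different number of ways per set. The sketch below is only an illustration: the function name `candidate_locations` and the slot numbering (set × ways + way) are assumptions, not something the slides define.

```python
def candidate_locations(block_address, num_blocks, ways):
    """Return the cache slots that may hold the given memory block.

    ways == 1 is direct mapped; ways == num_blocks is fully associative.
    Slot numbering (set * ways + way) is an assumption made for this sketch.
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets   # (block address) MOD (# sets)
    first_slot = set_index * ways
    return list(range(first_slot, first_slot + ways))


print(candidate_locations(12, num_blocks=8, ways=1))  # [4]
print(candidate_locations(12, num_blocks=8, ways=2))  # [0, 1]
print(candidate_locations(12, num_blocks=8, ways=8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With one way per set the set-selection formula reduces to the direct-mapped formula, and with a single set every slot is a candidate.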




# How do we use a memory address to find a block in the cache?


### How Do We Find a Block in The Cache?

- Our Example:
  - Main memory address space = 32 bits (= 4 GBytes)
  - Block size = 4 words = 16 bytes
  - Cache capacity = 8 blocks = 128 bytes



- index ⇒ which set
- tag ⇒ which data/instruction block is stored there
- block offset ⇒ which word in block
- The number of tag/index bits is determined by the associativity
- tag/index bits can come from anywhere in block address
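
For the example cache (16-byte blocks, 8 blocks), the address fields can be extracted with shifts and masks. This is a minimal sketch that assumes the common layout where the index is taken from the low bits of the block address; the function name `split_address` is hypothetical, and the offset returned is a byte offset (divide by 4 to get the word within the block).

```python
def split_address(addr, block_bytes=16, num_sets=8):
    """Split a byte address into (tag, index, byte offset within the block).

    Assumes block_bytes and num_sets are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1   # 16 bytes -> 4 bits
    index_bits = num_sets.bit_length() - 1       # 8 sets   -> 3 bits
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Direct mapped, 8 blocks: 25-bit tag, 3-bit index, 4-bit offset.
print(split_address(0xB4))               # (1, 3, 4)
# 2-way set associative, 4 sets: 26-bit tag, 2-bit index, 4-bit offset.
print(split_address(0xB4, num_sets=4))   # (2, 3, 4)
```

Halving the number of sets moves one bit from the index into the tag, which is why the tag/index split depends on the associativity.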


# Finding a Block: Direct-Mapped

With cache capacity = 8 blocks

[Figure: address lookup in a direct-mapped cache with S entries]

# Finding A Block: 2-Way Set-Associative



# Set Associative Cache

- S sets
- A elements in each set
  - A-way associative
- In the example, S=4, A=2
  - 2-way associative 8-entry cache


### Set Associative Cache - cont'd

- All of main memory is divided into S sets
  - All addresses in set N map to same set of the cache
    - N = (block address) mod S
    - A locations available
- Shares costly comparators across sets
- Low bits of the block address select the set
  - 2 bits in the example (S=4)
- High address bits are tag, used to associatively search the selected set
- Extreme cases
  - A=1: Direct mapped cache
  - S=1: Fully associative
- A need not be a power of 2
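
The lookup described above (low bits select the set, the tag is compared against every way of that set) can be sketched as follows. The class name, the `fill` helper, and the `(valid, tag)` entry layout are assumptions for illustration; data storage and replacement are left out.

```python
class SetAssociativeCache:
    """Tag store only: S sets of A ways, each way holding (valid, tag)."""

    def __init__(self, num_sets=4, ways=2, block_bytes=16):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        self.sets = [[(False, 0)] * ways for _ in range(num_sets)]

    def _index_and_tag(self, addr):
        block_address = addr // self.block_bytes
        return block_address % self.num_sets, block_address // self.num_sets

    def lookup(self, addr):
        """Return (set index, way) on a hit, or None on a miss."""
        index, tag = self._index_and_tag(addr)
        # Hardware compares all A tags in parallel; here we loop over them.
        for way, (valid, stored_tag) in enumerate(self.sets[index]):
            if valid and stored_tag == tag:
                return index, way
        return None

    def fill(self, addr, way):
        """Install the block containing addr into the chosen way of its set."""
        index, tag = self._index_and_tag(addr)
        self.sets[index][way] = (True, tag)


cache = SetAssociativeCache()      # S=4, A=2, as in the example
cache.fill(0x40, way=0)
print(cache.lookup(0x4C))          # (0, 0): same block as 0x40
print(cache.lookup(0xC0))          # None: same set, different tag
```

Setting `ways=1` gives the direct-mapped extreme and `num_sets=1` the fully associative one.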




# What is the purpose of the valid bit?



- Hint: what happens when the machine boots up?


# Questions to think about

- As the block size goes up, what happens to the miss rate?
- ... what happens to the miss penalty?
- ... what happens to the hit time?
- As the associativity goes up, what happens to the miss rate?
- ... what happens to the hit time?






- How do we find it? DONE
- Which one do we replace when a new one is brought in?
- What happens on a write?


### Which Block Should Be Replaced on Miss?

- Direct Mapped
  - Choice is easy: only one option
- Associative
  - Randomly select block in set to replace
  - Least recently used (LRU)
- Implementing LRU
  - 2-way set-associative
  - >2 way set-associative
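
One simple way to model LRU for a single set is to keep its tags ordered by recency. This is a software sketch, not a hardware design, and the class name `LRUSet` is hypothetical. For a 2-way set a single bit per set naming the least recently used way is enough; for more ways the ordering below is what the hardware has to track or approximate.

```python
class LRUSet:
    """One cache set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = []   # ordered from least to most recently used

    def access(self, tag):
        """Return True on a hit. On a miss, insert tag, evicting the LRU tag if full."""
        if tag in self.tags:
            self.tags.remove(tag)      # hit: move to most-recently-used position
            self.tags.append(tag)
            return True
        if len(self.tags) == self.ways:
            self.tags.pop(0)           # miss in a full set: evict the LRU entry
        self.tags.append(tag)
        return False


lru_set = LRUSet(ways=2)
print([lru_set.access(tag) for tag in [1, 2, 1, 3, 2]])
# [False, False, True, False, False]
```

In the trace above, the access to tag 3 evicts tag 2 (tag 1 was used more recently), so the final access to tag 2 misses again.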


### What Happens on a Store?

- Need to keep cache consistent with main memory
  - Reads are easy: no modifications
  - Writes are harder: when do we update main memory?
- Write-Through
  - On cache write: always update main memory as well
  - Use a write buffer to queue up writes to main memory for speed
- Write-Back
  - On cache write: remember that block is modified (dirty bit)
  - Update main memory when dirty block is replaced
  - Sometimes need to flush cache (I/O, multiprocessing)
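
To see how the two policies differ in memory traffic, the sketch below counts writes to main memory for a sequence of store addresses. It assumes a direct-mapped cache that allocates a block on a store miss, and the function name and policy strings are made up for this example. Note that write-through counts word writes while write-back counts whole-block write-backs, and dirty blocks still resident at the end are not counted.

```python
def count_memory_writes(stores, policy, num_blocks=8, block_bytes=16):
    """Count main-memory writes for a list of store addresses.

    policy is "write-through" or "write-back".
    """
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    memory_writes = 0
    for addr in stores:
        block = addr // block_bytes
        index = block % num_blocks
        if tags[index] != block:               # store miss: replace resident block
            if policy == "write-back" and dirty[index]:
                memory_writes += 1             # write the dirty victim back
            tags[index] = block
            dirty[index] = False
        if policy == "write-through":
            memory_writes += 1                 # every store also goes to memory
        else:
            dirty[index] = True                # just mark the block as modified
    return memory_writes


stores = [0, 4, 8, 12, 128, 0]   # addresses 0 and 128 map to the same cache block
print(count_memory_writes(stores, "write-through"))  # 6
print(count_memory_writes(stores, "write-back"))     # 2
```

Repeated stores to the same block cost nothing extra under write-back until the block is replaced.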


### BUT: What if Store Causes Miss!

- Write-Allocate
  - Bring written block into cache
  - Update word in block
  - Anticipate further use of block
- No-Write-Allocate
  - Main memory is updated
  - Cache contents unmodified
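
The effect of the allocation policy shows up in the accesses that follow a store miss. A rough sketch, assuming a direct-mapped cache and a trace of hypothetical `(operation, address)` pairs:

```python
def count_hits(trace, write_allocate, num_blocks=8, block_bytes=16):
    """Count cache hits for a trace of ("load" | "store", address) pairs."""
    tags = [None] * num_blocks
    hits = 0
    for op, addr in trace:
        block = addr // block_bytes
        index = block % num_blocks
        if tags[index] == block:
            hits += 1
        elif op == "load" or write_allocate:
            tags[index] = block    # load misses always allocate; store misses
                                   # allocate only under write-allocate
    return hits


trace = [("store", 0), ("load", 4), ("store", 8)]   # all in the same block
print(count_hits(trace, write_allocate=True))    # 2
print(count_hits(trace, write_allocate=False))   # 1
```

With write-allocate the first store brings the block in, so the load that follows already hits; without it, the load is the access that pays for the miss.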


# Improving cache performance


# Three kinds of cache misses

- Compulsory misses
  - First time data is accessed
- Capacity misses
  - Working set larger than cache size
- Conflict misses
  - One set fills up while there is still room in other sets
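
One common operational way to separate the three kinds of misses is to run a fully associative LRU cache of the same capacity alongside the real one: a miss on a never-seen block is compulsory, a miss that the fully associative cache also suffers is a capacity miss, and anything else is a conflict miss. The sketch below follows that convention; the function names and the use of LRU in both caches are assumptions.

```python
def touch(lru, block, capacity):
    """Access block in an LRU-ordered list; return True on a hit."""
    if block in lru:
        lru.remove(block)
        lru.append(block)
        return True
    if len(lru) == capacity:
        lru.pop(0)
    lru.append(block)
    return False


def classify_misses(block_trace, num_blocks, ways):
    """Count compulsory, capacity, and conflict misses for a block-address trace."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # the cache being studied
    fully_associative = []                 # reference cache, same capacity
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_trace:
        reference_hit = touch(fully_associative, block, num_blocks)
        hit = touch(sets[block % num_sets], block, ways)
        if not hit:
            if block not in seen:
                counts["compulsory"] += 1
            elif not reference_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
        seen.add(block)
    return counts


# Blocks 0 and 8 collide in a direct-mapped 8-block cache.
print(classify_misses([0, 8, 0, 8], num_blocks=8, ways=1))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

Running the same trace with `ways=2` removes the two conflict misses, since both blocks then fit in one set.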



# How Do We Improve Cache Performance?

$$\text{AMAT} = t_{\text{hit}} + p_{\text{miss}} \cdot \text{penalty}_{\text{miss}}$$

- Reduce miss rate
- Reduce miss penalty
- Reduce hit time
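
The AMAT formula is easy to evaluate directly, which helps when comparing the three levers. The numbers below (1-cycle hit, 5% miss rate, 100-cycle penalty) are illustrative assumptions, not measurements from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
half_miss_rate = amat(hit_time=1, miss_rate=0.025, miss_penalty=100)
half_penalty = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)
print(baseline, half_miss_rate, half_penalty)
```

With these assumed numbers the baseline is 1 + 0.05 × 100 = 6 cycles, and halving either the miss rate or the miss penalty brings it to 3.5 cycles, because the two factors enter the formula as a product.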


# Reducing Miss Rate: Increase Block Size

- Fetch more data with each cache miss
  - 16 bytes $\Rightarrow$ 64, 128, 256 bytes!
  - Works because of spatial locality
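
A small simulation makes the trade-off visible. The sketch assumes a direct-mapped cache with the 128-byte capacity of the earlier example, and both access patterns are invented for illustration. Because capacity is fixed, larger blocks mean fewer blocks in the cache.

```python
def miss_rate(addresses, block_bytes, cache_bytes=128):
    """Miss rate of a direct-mapped cache for a sequence of byte addresses."""
    num_blocks = cache_bytes // block_bytes
    tags = [None] * num_blocks
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_blocks
        if tags[index] != block:
            misses += 1
            tags[index] = block
    return misses / len(addresses)


sequential = range(0, 1024, 4)   # word-by-word scan: strong spatial locality
ping_pong = [0, 160] * 8         # two far-apart addresses, used alternately

print(miss_rate(sequential, 16), miss_rate(sequential, 64))  # 0.25 0.0625
print(miss_rate(ping_pong, 16), miss_rate(ping_pong, 64))    # 0.125 1.0
```

The sequential scan benefits from every extra word fetched, while the alternating pattern gets worse: with only two 64-byte blocks in the cache, the two addresses now collide on every access.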



# Reducing Miss Rate: Increase Associativity

- Reduce conflict misses
- Rules of thumb
  - 8-way ≈ fully associative (in miss rate)
  - Direct mapped of size N ≈ 2-way set associative of size N/2 (in miss rate)
- But!
  - A size-N associative cache is physically larger than a size-N direct-mapped cache
  - Associative is typically slower than direct mapped (larger $t_{\text{hit}}$)


# Reduce Miss Penalty: More Cache Levels

- Average access time = HitTime<sub>L1</sub> + MissRate<sub>L1</sub> \* MissPenalty<sub>L1</sub>
- MissPenalty<sub>L1</sub> = HitTime<sub>L2</sub> + MissRate<sub>L2</sub> \* MissPenalty<sub>L2</sub>
- etc.
- Size/associativity of higher-level caches?
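
The nested formula can be evaluated by computing the L1 miss penalty first. In this sketch MissRate<sub>L2</sub> is taken to be the local L2 miss rate (the fraction of L1 misses that also miss in L2), and all numbers are illustrative assumptions.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, memory_penalty):
    """AMAT with an L2 cache; miss_rate_l2 is the local L2 miss rate."""
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * memory_penalty
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


with_l2 = amat_two_level(hit_l1=1, miss_rate_l1=0.05,
                         hit_l2=10, miss_rate_l2=0.2, memory_penalty=100)
without_l2 = 1 + 0.05 * 100
print(with_l2, without_l2)
```

With these assumed numbers the L1 miss penalty drops from 100 cycles to 10 + 0.2 × 100 = 30 cycles, so AMAT falls from 6 cycles to 1 + 0.05 × 30 = 2.5 cycles.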



### Reduce Miss Penalty: Transfer Time

- Wider path to memory
  - Transfer more bytes/cycle
  - Reduces total time to transfer block
- Two ways to do this:
  - Wider path to each memory
  - Separate paths to multiple memories ("multiple memory banks")
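
The benefit of each option can be estimated with a simple timing model. The cycle counts below (1 cycle to send the address, 15 cycles per memory access, 1 cycle per bus transfer) are assumed for illustration, and the interleaved model is simplified: banks are accessed in parallel but share a one-word bus.

```python
import math

ADDRESS_CYCLES = 1    # assumed: send the address to memory
ACCESS_CYCLES = 15    # assumed: one memory access
TRANSFER_CYCLES = 1   # assumed: one transfer over the bus


def wide_path_penalty(block_words, width_words=1):
    """Miss penalty when memory and bus are width_words wide."""
    trips = math.ceil(block_words / width_words)
    return ADDRESS_CYCLES + trips * (ACCESS_CYCLES + TRANSFER_CYCLES)


def interleaved_penalty(block_words, banks):
    """Miss penalty with one-word-wide banks accessed in parallel."""
    rounds = math.ceil(block_words / banks)
    return ADDRESS_CYCLES + rounds * ACCESS_CYCLES + block_words * TRANSFER_CYCLES


print(wide_path_penalty(4, width_words=1))   # 65
print(wide_path_penalty(4, width_words=4))   # 17
print(interleaved_penalty(4, banks=4))       # 20
```

Under these assumptions, four banks recover most of the benefit of a four-word-wide path without widening the bus itself.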


# Reducing Hit Time

- Make caches small and simple
  - Hit time = 1 cycle is good (3.3 ns!)
  - L1: low associativity, relatively small
- Even L2 caches can be broken into sub-banks
  - Can exploit this for faster hit time in L2




# Reducing Miss Rate: Prefetching

- Fetching Data that you will probably need
- Instructions
  - Alpha 21064 on cache miss
    - Fetches requested block into the cache
    - Fetches next sequential block into the instruction stream buffer
- Data
  - Automatically fetch data into cache (spatial locality)
  - Issues?
- Compiler controlled prefetching
  - Inserts prefetching instructions to fetch data for later use
  - Registers or cache
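
The effect of fetching the next sequential block on a miss can be sketched with a toy model. It assumes a cache large enough that nothing is evicted and places the prefetched block directly in the cache (no separate stream buffer), so it shows only the best case and ignores the cost of useless prefetches.

```python
def count_misses(block_trace, prefetch_next):
    """Count misses for a trace of block addresses, with optional next-block prefetch."""
    cached = set()    # assumption: the cache never has to evict anything
    misses = 0
    for block in block_trace:
        if block not in cached:
            misses += 1
            cached.add(block)
            if prefetch_next:
                cached.add(block + 1)   # also fetch the next sequential block
    return misses


sequential_blocks = list(range(8))
print(count_misses(sequential_blocks, prefetch_next=False))  # 8
print(count_misses(sequential_blocks, prefetch_next=True))   # 4
```

For a purely sequential stream every other miss disappears; for a pattern with no sequential reuse the prefetches would only consume bandwidth and cache space.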


# Reduce Miss Penalty: Deliver Critical word first

- Only need one word from the block immediately



- Don't write the entire block into the cache first
  - Fetch word 2 first (deliver to CPU)
  - Fetch order: 2 3 0 1
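
The wrap-around fetch order follows from starting at the requested word and counting modulo the block size. A minimal sketch, with a hypothetical function name:

```python
def fetch_order(critical_word, block_words=4):
    """Order in which the words of a block are fetched, critical word first."""
    return [(critical_word + i) % block_words for i in range(block_words)]


print(fetch_order(2))   # [2, 3, 0, 1]
```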


# Reduce Miss Penalty: Read Misses First

- Let reads pass writes in the write buffer
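
Letting a read go ahead of buffered writes is only safe if the read does not need a value that is still sitting in the buffer. The sketch below handles that case by forwarding the newest matching value from the buffer; this is one possible approach (another is to drain the buffer on a conflict), and the class, the method names, and the dictionary used as main memory are all assumptions.

```python
from collections import deque


class WriteBuffer:
    """FIFO of pending stores that have not reached main memory yet."""

    def __init__(self):
        self.pending = deque()          # (address, value), oldest first

    def store(self, addr, value):
        self.pending.append((addr, value))

    def load(self, addr, memory):
        # A read may pass the buffered writes, but it must not miss a newer
        # value for the same address, so search the buffer youngest-first.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value
        return memory.get(addr, 0)      # no conflict: read memory directly

    def drain_one(self, memory):
        """Retire the oldest pending store to main memory."""
        if self.pending:
            addr, value = self.pending.popleft()
            memory[addr] = value


memory = {0x10: 1, 0x20: 7}
buffer = WriteBuffer()
buffer.store(0x10, 2)
print(buffer.load(0x20, memory))   # 7: no conflict, the read passes the write
print(buffer.load(0x10, memory))   # 2: conflict, value comes from the buffer
```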



# Reduce Miss Penalty: Lockup Free Cache

Let cache continue to function while miss is being serviced



# Summary

- Recap
  - Using a memory address to find location in cache
  - Deciding what to evict from the cache
  - Improving cache performance
- Next time:
  - Memory system and Virtual Memory
  - Read P&H 7.4-7.9
