Csci 211 Computer System Architecture

Assignment 6

Due: Before class, April 21

Note: Make reasonable assumptions where necessary and clearly state them.

- A 64 KB direct-mapped cache has 64-byte blocks. If addresses are 32 bits, how many bits are used for the tag, index, and offset in this cache?
- How would the address be divided if the cache were 4-way set associative instead?
- How many bits is the index for a fully associative cache? Explain your answer.
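
The address breakdown asked for above can be checked with a short sketch like the following. The function name `address_fields` and its parameters are hypothetical, and the sketch assumes byte addressing and power-of-two sizes. The example call uses illustrative values, not the ones from the questions.

```python
def address_fields(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (tag, index, offset) bit widths for a set-associative cache.

    ways=1 models a direct-mapped cache; ways equal to the number of
    blocks models a fully associative cache (a single set).
    Assumption: byte addressing and all sizes are powers of two.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    # For a power of two n, n.bit_length() - 1 equals log2(n).
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Illustrative values only: a 16 KB, 2-way cache with 32-byte blocks.
print(address_fields(16 * 1024, 32, 2))
```

Raising `ways` while keeping the cache and block sizes fixed reduces the number of sets, so bits move from the index field into the tag field.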
An 8-byte, 2-way set associative cache (using LRU replacement) with 2-byte blocks receives requests for the following addresses (represented in binary): 0110, 0000, 0010, 0001, 0011, 0100, 1001, 0000, 1010, 1111, 0111. For each access, determine the address in the cache (after the access), whether the access hits or misses, and the categorization of each miss under the “3 C” model. Fill in the worksheet in the format shown below with your answer to this question. You should fill in the cache lines with the tags that reside there. (A cache line is another name for a cache block.)

Address	Line 0	Line 1	Hit or Miss Type
0110
0000
0010
....
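
One way to check a hand-filled worksheet is a small simulator. The sketch below is a minimal illustration, not a reference solution: the function name `simulate` is hypothetical, and it assumes the common convention that a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The example trace and cache size are illustrative, not the ones from the question.

```python
from collections import OrderedDict


def simulate(addresses, cache_bytes, block_bytes, ways):
    """Simulate an LRU set-associative cache and classify each access.

    Returns a list of (address, outcome) pairs, where outcome is
    "hit", "compulsory", "capacity", or "conflict".
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set tags, oldest first
    full = OrderedDict()  # shadow fully associative LRU cache, same capacity
    seen = set()          # blocks referenced at least once
    results = []
    for addr in addresses:
        block = addr // block_bytes
        cache_set = sets[block % num_sets]
        tag = block // num_sets

        # Update the shadow fully associative cache (used for the 3 C's).
        full_hit = block in full
        if full_hit:
            full.move_to_end(block)
        else:
            if len(full) == num_blocks:
                full.popitem(last=False)  # evict least recently used
            full[block] = True

        # Update the real cache.
        if tag in cache_set:
            cache_set.move_to_end(tag)
            outcome = "hit"
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True
            if block not in seen:
                outcome = "compulsory"
            elif not full_hit:
                outcome = "capacity"
            else:
                outcome = "conflict"
        seen.add(block)
        results.append((addr, outcome))
    return results


# Illustrative trace on a 4-byte direct-mapped cache with 1-byte blocks.
for addr, outcome in simulate([0b0000, 0b0100, 0b0000, 0b0001], 4, 1, 1):
    print(format(addr, "04b"), outcome)
```

The shadow cache is only there to separate capacity misses from conflict misses; hit or miss decisions come from the per-set structures alone.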
Smith and Goodman found in their research that, for a given small size, a direct mapped instruction cache consistently outperformed a fully associative cache using LRU replacement. Explain how this would be possible (note that you cannot explain it using the 3 C's model, because the model ignores replacement policy).
Assume you have a processor with an ideal CPI without memory stalls for each instruction type as follows: ALU=1, Load/Store=1.5, Branch=1.5, Jump=1. Consider an application which has an instruction mix of 40% ALU and logical operations, 30% load and store, 20% branch and 10% jump instructions.
- Assume a 4-way set associative, 1-level cache with separate data and instruction caches, a miss rate of 20% (0.20) for data accesses, a miss rate of 10% (0.10) for instructions, and a miss penalty of 50 cycles for both instruction and data caches (and assume a cache hit takes 1 cycle). What is the effective CPU time (or effective CPI with memory stalls) and the average memory access time for this application with this cache organization?
- Now consider a 2-level, 4-way unified cache with a level 1 (L1) miss rate of 20% (0.20) and a level 2 (L2) local miss rate of 30% (0.30). Assume hit time in L1 is 1 cycle, assume miss penalty is 10 cycles if you miss in L1 and hit in L2 (i.e., hit time in L2 is 10 cycles), and assume miss penalty is 50 cycles if you miss in L2 (i.e., miss penalty in L2 is 50 cycles). Derive the equation for the effective CPU time (or effective CPI) and the average memory access time for the same instruction mix as part (a) for this cache organization.
Which of the two designs (part a or part b) gives better performance? Explain your answer.
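
The quantities in this question can be sketched in code as follows. This is one possible formulation under stated assumptions, not the required derivation: it assumes every instruction makes exactly one instruction fetch, only loads and stores access data (once each), and the 1-cycle hit time is already covered by the ideal CPI. All function names are hypothetical, and the example numbers are illustrative rather than taken from the question.

```python
def base_cpi(mix):
    """mix maps instruction class -> (fraction of instructions, ideal CPI)."""
    return sum(frac * cpi for frac, cpi in mix.values())


def amat_one_level(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, l2_miss_penalty):
    """AMAT when an L1 miss costs the L2 hit time plus any L2 miss penalty."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * l2_miss_penalty)


def effective_cpi_split(mix, data_frac, i_miss_rate, d_miss_rate, miss_penalty):
    """Effective CPI with split instruction and data caches.

    Assumption: one instruction fetch per instruction, and data_frac of
    instructions (loads and stores) make one data access each.
    """
    stalls = i_miss_rate * miss_penalty + data_frac * d_miss_rate * miss_penalty
    return base_cpi(mix) + stalls


# Illustrative mix and cache parameters (not the assignment's values).
mix = {"alu": (0.5, 1.0), "mem": (0.25, 2.0), "branch": (0.25, 2.0)}
print(base_cpi(mix))
print(effective_cpi_split(mix, data_frac=0.25, i_miss_rate=0.02,
                          d_miss_rate=0.04, miss_penalty=100))
print(amat_two_level(1, 0.1, 8, 0.5, 100))
```

For a unified cache, the same stall term can be written as memory accesses per instruction (one fetch plus the load/store fraction) multiplied by the stall cycles per access.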
Hardware prefetching is one of the techniques used to improve cache performance. Hardware prefetching of cache lines can be viewed as a sort of prediction of which lines ought to be in the cache in the future. Sometimes the prefetching will be correct and will improve performance, while other times the prefetching will be incorrect and will hurt performance. Consider the following system with one level of cache. With no prefetching it exhibits (for some application) a hit time of 1 cycle, a hit rate of 95%, and a miss penalty of 100 cycles. With prefetching (on this application, and with no change in hit times and miss penalties), we observe the following:
- 50% of memory accesses are better: hit rate = 97%
- 10% of memory accesses are worse: hit rate = 80%
- the rest of the memory accesses are unchanged.
With the parameters above, is it worth prefetching for this application?
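
A comparison of this kind can be set up as a weighted average of per-group access times. The sketch below assumes that is the intended metric; the names `amat` and `weighted_amat` are hypothetical, and the example groups are illustrative rather than the ones listed above.

```python
def amat(hit_time, hit_rate, miss_penalty):
    """Average memory access time for one group of accesses."""
    return hit_time + (1 - hit_rate) * miss_penalty


def weighted_amat(groups, hit_time, miss_penalty):
    """groups is a list of (fraction_of_accesses, hit_rate) pairs.

    Assumption: the fractions cover all accesses, so they must sum to 1.
    """
    total = sum(frac for frac, _ in groups)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(frac * amat(hit_time, rate, miss_penalty) for frac, rate in groups)


# Illustrative values only: 60% of accesses at a 98% hit rate, 40% at 90%.
baseline = amat(hit_time=1, hit_rate=0.95, miss_penalty=50)
with_prefetch = weighted_amat([(0.6, 0.98), (0.4, 0.90)], hit_time=1, miss_penalty=50)
print(baseline, with_prefetch)
```

Prefetching is worthwhile under this metric only when the weighted value is lower than the baseline.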