Homework 4: CS 211 Fall 2008

Ques. 1:

(a) A 64 KB direct-mapped cache has 16-byte blocks. If addresses are 32 bits, how many bits are used for the tag, index, and offset in this cache?
(b) How would the address be divided if the cache were 4-way set associative instead?
(c) How many bits is the index for a fully associative cache? Explain your answer.
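
As a way to check this kind of address breakdown, here is a minimal Python sketch. The function name `address_fields` and the example cache (32 KB, 64-byte blocks) are hypothetical and are not the parameters of this question. The sketch assumes byte addressing and that all sizes are powers of two.

```python
def address_fields(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    ways=1 models a direct-mapped cache; ways equal to the number of
    blocks models a fully associative cache (a single set).
    Assumes byte addressing and power-of-two sizes.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    # For a power of two n, (n - 1).bit_length() equals log2(n).
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Hypothetical example: 32 KB cache with 64-byte blocks (512 blocks total).
direct_mapped = address_fields(32 * 1024, 64, ways=1)
two_way = address_fields(32 * 1024, 64, ways=2)
fully_assoc = address_fields(32 * 1024, 64, ways=512)
```

Raising the associativity halves the number of sets each time it doubles, so bits move from the index to the tag while the offset stays fixed.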

Ques. 2: An 8-byte, 2-way set associative cache (using LRU replacement) with 2-byte blocks receives requests for the following addresses (represented in binary):

0110, 0000, 0010, 0001, 0011, 0100, 1001, 0000, 1010, 1111, 0111

For each access, determine the contents of the cache (after the access), whether the access hits or misses, and the categorization of each miss under the “3 C” model. Fill in the worksheet in the format shown below with your answer to this question (note that the first access is done for you). You should fill in the cache lines with the tags that reside there, with the most recently used tag first.

Address	Line 0	Line 1	Hit or Miss Type
0110	(empty)	01/(empty)	Compulsory Miss
0000
0010
….
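
A worksheet like this can be checked with a small simulator. The following is a sketch only: the function name `simulate` is hypothetical, and the miss classification follows one common convention, which is an assumption here. Under that convention a miss is compulsory on the first reference to a block, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise.

```python
def simulate(addresses, cache_bytes, block_bytes, ways):
    """Simulate a set-associative LRU cache over integer byte addresses.

    Returns one (set_index, tag, outcome) tuple per access, where outcome
    is 'hit', 'compulsory', 'capacity', or 'conflict'.
    """
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # tags per set, most recently used first
    shadow = []    # fully associative LRU cache of the same size (block numbers)
    seen = set()   # every block number referenced so far
    results = []
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_sets
        tag = block // num_sets

        # Update the real cache.
        lines = sets[index]
        hit = tag in lines
        if hit:
            lines.remove(tag)
        elif len(lines) == ways:
            lines.pop()  # evict the least recently used tag (last in the list)
        lines.insert(0, tag)

        # Update the fully associative shadow cache used for classification.
        shadow_hit = block in shadow
        if shadow_hit:
            shadow.remove(block)
        elif len(shadow) == num_blocks:
            shadow.pop()
        shadow.insert(0, block)

        if hit:
            outcome = "hit"
        elif block not in seen:
            outcome = "compulsory"
        elif shadow_hit:
            outcome = "conflict"
        else:
            outcome = "capacity"
        seen.add(block)
        results.append((index, tag, outcome))
    return results


# The first worksheet row: address 0110 maps to line 1 with tag 01.
first_row = simulate([0b0110], cache_bytes=8, block_bytes=2, ways=2)
```

For the first access, `first_row` is `[(1, 1, 'compulsory')]`, which agrees with the row already filled in on the worksheet.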

Ques. 3: Smith and Goodman, in their research, found that for a given small size, a direct-mapped instruction cache consistently outperformed a fully associative cache using LRU replacement. Explain how this would be possible (note that you cannot explain using the 3C’s model because the model ignores replacement policy).

Ques. 4: Assume you have a processor with an ideal CPI without memory stalls for each instruction type as follows: ALU=1, Load/Store=1.5, Branch=1.5, Jump=1.

Consider an application which has an instruction mix of 40% ALU and logical operations, 30% load and store, 20% branch and 10% jump instructions.

(a) Assume a 4-way set associative, 1-level cache with separate data and instruction caches, a miss rate of 20% (0.20) for data accesses, a miss rate of 10% (0.10) for instructions, and a miss penalty of 50 cycles for both the instruction and data caches (and assume a cache hit takes 1 cycle). What is the effective CPU time (or effective CPI with memory stalls) and the average memory access time for this application with this cache organization?
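
The quantities in this kind of question can be sketched in Python as follows. The function names and all numbers in the example are hypothetical and are not the values given above. The sketch assumes the usual textbook convention: every instruction makes one instruction fetch, only loads and stores access the data cache, and each miss adds the full miss penalty as stall cycles.

```python
def base_cpi(mix, cpi):
    """Ideal CPI: the mix-weighted average of the per-type CPIs."""
    return sum(mix[kind] * cpi[kind] for kind in mix)


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level."""
    return hit_time + miss_rate * miss_penalty


def effective_cpi(base, data_refs_per_instr, i_miss_rate, d_miss_rate, miss_penalty):
    """Base CPI plus memory stall cycles per instruction (split caches)."""
    instr_stalls = i_miss_rate * miss_penalty  # one fetch per instruction
    data_stalls = data_refs_per_instr * d_miss_rate * miss_penalty
    return base + instr_stalls + data_stalls


# Hypothetical workload: 50% ALU (CPI 1) and 50% load/store (CPI 2).
mix = {"alu": 0.5, "mem": 0.5}
cpi = {"alu": 1.0, "mem": 2.0}
ideal = base_cpi(mix, cpi)

# Hypothetical caches: 2% instruction miss rate, 5% data miss rate, 100-cycle penalty.
with_stalls = effective_cpi(ideal, data_refs_per_instr=0.5,
                            i_miss_rate=0.02, d_miss_rate=0.05, miss_penalty=100)
data_amat = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
```

With these made-up numbers the ideal CPI is 1.5, the stalls add about 4.5 cycles per instruction, and the data-cache AMAT is about 6 cycles. This shows how memory stalls can dominate the ideal CPI.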

(b) Now consider a 2-level, 4-way unified cache with a level 1 (L1) miss rate of 20% (0.20) and a level 2 (L2) local miss rate of 30% (0.30). Assume the hit time in L1 is 1 cycle, the miss penalty is 10 cycles if you miss in L1 and hit in L2 (i.e., the hit time in L2 is 10 cycles), and the miss penalty is 50 cycles if you miss in L2 (i.e., the miss penalty in L2 is 50 cycles). Derive the equation for the effective CPU time (or effective CPI) and the average memory access time for the same instruction mix as part (a) for this cache organization.

Which of the two designs (part (a) or part (b)) gives better performance? Explain your answer.
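
For a two-level hierarchy, the standard textbook form nests the L2 access time inside the L1 miss penalty. The sketch below uses hypothetical function names and made-up numbers, not the values from this question. It assumes a unified cache, so instruction fetches and data accesses share the same miss rates, and it assumes the L2 miss rate is a local miss rate (misses in L2 divided by accesses to L2).

```python
def two_level_amat(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, l2_miss_penalty):
    """AMAT = L1 hit time + L1 miss rate * (L2 hit time + L2 local miss rate * L2 penalty)."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * l2_miss_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty


def two_level_cpi(base, accesses_per_instr, l1_miss_rate, l2_hit,
                  l2_local_miss_rate, l2_miss_penalty):
    """Base CPI plus stall cycles, counting every memory access per instruction."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * l2_miss_penalty
    return base + accesses_per_instr * l1_miss_rate * l1_miss_penalty


# Hypothetical hierarchy: 5% L1 miss rate, 8-cycle L2 hit time,
# 25% L2 local miss rate, 100-cycle L2 miss penalty.
example_amat = two_level_amat(1, 0.05, 8, 0.25, 100)

# Hypothetical workload: base CPI 1.5 and 1.5 memory accesses per instruction
# (one instruction fetch plus 0.5 data accesses).
example_cpi = two_level_cpi(1.5, 1.5, 0.05, 8, 0.25, 100)
```

With these made-up numbers the AMAT is about 2.65 cycles and the effective CPI is about 3.975. To compare two designs, evaluate both with the same instruction mix and compare the resulting effective CPIs.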