















- Many types of memory with different speeds
- Processor speed and memory speed are mismatched
  - Instructions or data are transferred between memory and the processor
  - What does the processor do while waiting for data to be transferred?
  - > The idle processor is stalled, leading to slowdown and lower performance
- Why can't we have memory as fast as the processor? > Technology, cost, size
- What is the solution then?

CS 135: Computer Architecture, Bhagi Narahari




These observations suggest an approach for organizing memory and storage systems, known as a memory hierarchy.























- If each access to memory leads to a cache hit, then the time to fetch from memory is one cycle
  - > Program performance is good!
- If each access to memory leads to a cache miss, then the time to fetch from memory is much larger than one cycle
  - > Program performance is bad!
- Design goal:
  - How to arrange data/instructions so that we have as few cache misses as possible.
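A tiny direct-mapped cache simulator makes the design goal concrete: the same 64×64 array of 4-byte elements, stored row by row, is read in two different orders and the misses are counted. The cache geometry (64 lines of 16 bytes) and the array size are hypothetical values chosen only for illustration.

```python
# Hypothetical parameters, chosen for illustration only.
BLOCK_SIZE = 16        # bytes per cache block
NUM_BLOCKS = 64        # lines in a direct-mapped cache (1 KiB total)
ELEM_SIZE = 4          # bytes per array element (e.g. a 32-bit int)
ROWS, COLS = 64, 64    # array dimensions, stored in row-major order


def count_misses(addresses, block_size=BLOCK_SIZE, num_blocks=NUM_BLOCKS):
    """Simulate a direct-mapped cache and return the number of misses."""
    lines = [None] * num_blocks            # tag currently held by each line
    misses = 0
    for addr in addresses:
        block_addr = addr // block_size
        index = block_addr % num_blocks    # (Block address) MOD (Number of blocks)
        tag = block_addr // num_blocks
        if lines[index] != tag:
            misses += 1
            lines[index] = tag             # fetch the block, evicting the old one
    return misses


def row_major():
    # Visit a[i][j] with j varying fastest: consecutive byte addresses.
    return [(i * COLS + j) * ELEM_SIZE for i in range(ROWS) for j in range(COLS)]


def column_major():
    # Visit a[i][j] with i varying fastest: a stride of one full row per access.
    return [(i * COLS + j) * ELEM_SIZE for j in range(COLS) for i in range(ROWS)]


if __name__ == "__main__":
    total = ROWS * COLS
    print("row-major misses:   ", count_misses(row_major()), "of", total)
    print("column-major misses:", count_misses(column_major()), "of", total)
```

Under these assumptions the row-major walk misses once per 16-byte block (1024 misses for 4096 accesses), while the column-major walk's 256-byte stride sends 64 consecutive accesses to only 4 cache lines, so every access misses. Same data, same cache; only the access order changed.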





SRAM vs DRAM Summary

|      | Transistors<br>per bit | Access<br>time | Persists without refresh? | Sensitive to disturbances? | Cost | Applications                    |
|------|------------------------|----------------|---------------------------|----------------------------|------|---------------------------------|
| SRAM | 6                      | 1X             | Yes                       | No                         | 100X | Cache memories                  |
| DRAM | 1                      | 10X            | No                        | Yes                        | 1X   | Main memories,<br>frame buffers |



- > The address can be in any mod
- Need to figure out which one





































Simple Model of Memory Hierarchy...

Depends on where it is in the memory hierarchy


















- Average memory access time (AMAT) with and without a cache?
- AMAT with cache = hit time + miss ratio × miss penalty = 1 + miss ratio × miss penalty
  - > 1 + (0.04 × 50) = 3 cycles
- AMAT without cache = 50 cycles
- What happens if the miss ratio increases?
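The AMAT arithmetic above can be checked in a few lines. The 1-cycle hit time, 0.04 miss ratio and 50-cycle miss penalty are the slide's numbers; the other miss ratios in the sweep are hypothetical, chosen to explore the closing question.

```python
def amat(hit_time, miss_ratio, miss_penalty):
    """Average memory access time in cycles: hit time + miss ratio * miss penalty."""
    return hit_time + miss_ratio * miss_penalty


if __name__ == "__main__":
    # Slide numbers: 1-cycle hit, 4% misses, 50-cycle miss penalty.
    print(amat(1, 0.04, 50))   # 3.0 cycles with a cache (vs. 50 without)
    # How AMAT grows as the miss ratio increases (hypothetical ratios).
    for miss_ratio in (0.02, 0.04, 0.10, 0.20, 0.50, 1.00):
        print(f"miss ratio {miss_ratio:.2f}: AMAT = {amat(1, miss_ratio, 50):.1f} cycles")
```

Under this simple model AMAT grows linearly with the miss ratio. At a miss ratio of 1.0 it reaches 51 cycles, slightly worse than the 50 cycles without a cache, because every access pays the hit time before going to memory.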





- CPI is the number of cycles per instruction spent on CPU execution
- Each instruction also stalls for some number of cycles waiting on memory
- CPU time = IC × (CPI + average stall cycles per instruction) × clock cycle time
  - IC × (1.5 + 2.4) × clock = IC × 3.9 clock cycles
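A minimal sketch of the CPU-time formula. The CPI of 1.5 and the 2.4 stall cycles per instruction are the slide's numbers; the instruction count and the 1 ns clock cycle time are hypothetical placeholders, and they cancel out of the two ratios printed.

```python
def cpu_time(ic, cpi, stall_cycles_per_instr, clock_cycle_time):
    """CPU time = IC * (CPI + average memory-stall cycles per instruction) * clock cycle time."""
    return ic * (cpi + stall_cycles_per_instr) * clock_cycle_time


if __name__ == "__main__":
    IC = 1_000_000     # hypothetical instruction count
    CLOCK_NS = 1.0     # hypothetical clock cycle time, in ns
    with_stalls = cpu_time(IC, 1.5, 2.4, CLOCK_NS)
    ideal = cpu_time(IC, 1.5, 0.0, CLOCK_NS)   # perfect memory: no stalls
    print(f"effective CPI: {with_stalls / (IC * CLOCK_NS):.1f}")        # 3.9
    print(f"slowdown from memory stalls: {with_stalls / ideal:.1f}x")   # 2.6x
```

In this example memory stalls make the program run 3.9 / 1.5 = 2.6 times longer than it would with a memory system that never stalls.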



Example 2

- The processor generates an N-bit address
- How do we look at this N-bit address and decide (a) whether it is in the cache and (b) where to place it in the cache?
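One common way to answer both questions, sketched here for a direct-mapped cache and assuming the block size and number of sets are powers of two, is to split the N-bit address into tag, index and block-offset fields. The function names `split_address` and `lookup`, and the 4-byte-block, 8-line geometry in the example, are hypothetical; `None` stands in for a cleared valid bit.

```python
def split_address(addr, block_size, num_sets):
    """Split a byte address into (tag, index, offset) fields.

    Assumes block_size and num_sets are powers of two, so each field is a
    contiguous group of address bits: | tag | index | offset |.
    """
    offset_bits = block_size.bit_length() - 1        # log2(block_size)
    index_bits = num_sets.bit_length() - 1           # log2(num_sets)
    offset = addr & (block_size - 1)                 # low bits: byte within block
    index = (addr >> offset_bits) & (num_sets - 1)   # middle bits: which line/set
    tag = addr >> (offset_bits + index_bits)         # remaining high bits
    return tag, index, offset


def lookup(tags, addr, block_size, num_sets):
    """Direct-mapped lookup: return True on a hit, False on a miss.

    tags[i] holds the tag of the block currently in cache line i (None if empty).
    """
    tag, index, _ = split_address(addr, block_size, num_sets)
    if tags[index] == tag:      # (a) the block is already in the cache
        return True
    tags[index] = tag           # (b) on a miss, place the block in line `index`
    return False


if __name__ == "__main__":
    print(split_address(182, block_size=4, num_sets=8))   # (5, 5, 2)
    tags = [None] * 8
    for addr in (182, 183, 214, 182):
        print(addr, "hit" if lookup(tags, addr, 4, 8) else "miss")
```

Addresses 182 and 183 share a block, so the second access hits; 214 maps to the same index with a different tag, so it evicts that block and the final access to 182 misses again.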







16-byte memory with 4-byte cache blocks: 4 blocks of memory
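For the 16-byte memory with 4-byte blocks, the block address of a byte is its address divided by the block size, and the remainder is its offset within the block. A small sketch (the helper name `block_of` is hypothetical):

```python
MEMORY_SIZE = 16   # bytes of memory
BLOCK_SIZE = 4     # bytes per block


def block_of(addr, block_size=BLOCK_SIZE):
    """Return (block address, byte offset within the block) for a byte address."""
    return divmod(addr, block_size)


if __name__ == "__main__":
    print("blocks of memory:", MEMORY_SIZE // BLOCK_SIZE)   # 4
    for addr in range(MEMORY_SIZE):
        block, offset = block_of(addr)
        # With 4-byte blocks, the top two bits of the 4-bit address are the
        # block number and the bottom two bits are the offset.
        print(f"byte {addr:2d} (0b{addr:04b}) -> block {block}, offset {offset}")
```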



- If each block has only one place it can appear in the cache, the cache is said to be "direct mapped"; the mapping is usually (Block address) MOD (Number of blocks in the cache)


- If a block can be placed anywhere in the cache, the cache is said to be <u>fully associative</u>
- If a block can be placed in a restricted set of places in the cache, the cache is <u>set associative</u>.
  - A set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is usually chosen by (Block address) MOD (Number of sets in the cache).
  - If there are n blocks in a set, the cache is called <u>n-way</u> set associative.
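The three placement policies differ only in how many sets the cache is divided into: direct mapped is 1-way (one block per set), and fully associative is a single set holding every block. The sketch below lists the candidate cache lines for a memory block, assuming a hypothetical 8-block cache and memory block address 12, and assuming the lines of set s are numbered consecutively (real hardware need not number them this way).

```python
def candidate_lines(block_addr, num_blocks, ways):
    """Return the cache lines where a memory block may be placed.

    ways == 1          -> direct mapped
    ways == num_blocks -> fully associative
    otherwise          -> `ways`-way set associative
    Assumes the lines of set s are numbered s*ways .. s*ways + ways - 1.
    """
    num_sets = num_blocks // ways
    s = block_addr % num_sets           # (Block address) MOD (Number of sets)
    return list(range(s * ways, s * ways + ways))


if __name__ == "__main__":
    # Hypothetical example: memory block 12 and an 8-block cache.
    for name, ways in (("direct mapped", 1),
                       ("2-way set associative", 2),
                       ("fully associative", 8)):
        print(f"{name:22s}: lines {candidate_lines(12, 8, ways)}")
```

Block 12 can go only in line 4 (12 MOD 8) when direct mapped, in either line of set 0 (12 MOD 4) when 2-way set associative, and in any of the 8 lines when fully associative.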












































Principle of locality!
Exploit it at each level.























| Technology | Metric            | 1980   | 1985  | 1990 | 1995  | 2000  | 2000:1980 |
|------------|-------------------|--------|-------|------|-------|-------|-----------|
| SRAM       | \$/MB             | 19,200 | 2,900 | 320  | 256   | 100   | 190       |
| SRAM       | access (ns)       | 300    | 150   | 35   | 15    | 2     | 100       |
| DRAM       | \$/MB             | 8,000  | 880   | 100  | 30    | 1     | 8,000     |
| DRAM       | access (ns)       | 375    | 200   | 100  | 70    | 60    | 6         |
| DRAM       | typical size (MB) | 0.064  | 0.256 | 4    | 16    | 64    | 1,000     |
| Disk       | \$/MB             | 500    | 100   | 8    | 0.30  | 0.05  | 10,000    |
| Disk       | access (ms)       | 87     | 75    | 28   | 10    | 8     | 11        |
| Disk       | typical size (MB) | 1      | 10    | 160  | 1,000 | 9,000 | 9,000     |

CPU Clock Rates

|                  | 1980  | 1985 | 1990 | 1995    | 2000  | 2000:1980 |
|------------------|-------|------|------|---------|-------|-----------|
| processor        | 8080  | 286  | 386  | Pentium | P-III |           |
| clock rate (MHz) | 1     | 6    | 20   | 150     | 750   | 750       |
| cycle time (ns)  | 1,000 | 166  | 50   | 6       | 1.6   | 750       |













