



## Next: Review Computer Organization in an hour!

- Overview of Computer Organization
  - Components

9

Sample processor design process

## **Review: Computer Organization Basics**

- What are the components of a CPU
- What is the microarchitecture level ?
- What is an ISA Instruction set architecture ?
- How does a sample processor design look ?
  - > A simple processor architecture
- · what is the basic concept of pipelining

CS 211: Bhagi Narahari,CS, GWU

CS 211: Bhagi Narahari,CS, GWU

3











#### Input/Output (1/0) Devices

- I/O devices are the components of a computer system that accepts data to be processed and presents the results of the processing
- Input Device Examples
  - Keyboard
  - Mouse
- Output Device Examples
  - Video display
  - > Printer
- We won't touch on this in this course!

CS 211: Bhagi Narahari,CS, GWU

9

## **The Operating System**

- The Operating System (OS) is a set of programs that manages all of the computer's resources
- Unix, Linux, Windows 98, Me, NT, 2000, XP, and MacOS are all examples of modern operating systems
- Not the focus of this course!

9

## **Computer Architecture**

- The computer architecture encompasses the user's view of the computer.
- This includes such things as the assembly language instruction set, the number and types of internal registers, the memory management system and the model for exception handling.



• Memory holds data, instructions.

CS 211: Bhagi Narahari,CS, GWU

- Central processing unit (CPU) fetches instructions from memory.
  - Separate CPU and memory distinguishes programmable computer.
- CPU registers help out: program counter (PC), instruction register (IR), generalpurpose registers, etc.









S

CS 211: Bhagi Narahari,CS, GWU

• The Instruction Set Architecture (ISA) describes a set of instructions whose syntactic and semantic characteristics are defined by the underlying computer architecture.





















- 1978: The Intel 8086 is announced (16 bit architecture)
- 1980: The 8087 floating point coprocessor is added
- 1982: The 80286 increases address space to 24 bits, +instructions
- 1985: The 80386 extends to 32 bits, new addressing modes
- 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
  1997: 57 new "MMX" instructions are added, Pentium II
- 1999: The Pentium III added another 70 instructions (SSE)
- 2001: Another 144 instructions (SSE2)
- 2001: Anomer 144 instructions (Soc2)
   2003: AMD extends the architecture to increase address space to 64 bits, widens all registers to 64 bits and other changes (AMD64)
   2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions •
- "This history illustrates the impact of the "golden handcuffs" of compatibility -"adding new features as someone might add clothing to a packed bag"

-"an architecture that is difficult to explain and impossible to love"

CS 211: Bhagi Narahari,CS, GWU

S

## 3 **IA-32 Overview** • Complexity: Instructions from 1 to 17 bytes long > one operand must act as both a source and destination > one operand can come from memory complex addressing modes e.g., "base or scaled index with 8 or 32 bit displacement" • Saving grace: > the most frequently used instructions are not too difficult to build compilers avoid the portions of the architecture that are slow "what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective"

CS 211: Bhagi Narahari.CS. GWU



















cs211;Bitargetiaddress: target address of the jump instruction

Step 1a: The MIPS-Inst Set (eg.) 11 26 16 • ADD and SUB rd shamt funct op rs rt > addU rd, rs, rt 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits subU rd, rs, rt 31 26 21 16 OR Immediate: rs rt immediate > ori rt, rs, imm16 <sup>6 bits</sup> 26 5 bits 5 bits 16 bits 16 0 . LOAD and STORE Word rs rt immediate 5 bits 5 bits 16 bits Iw rt, rs, imm16 21 > sw rt, rs, imm1<sup>a</sup>/<sup>b</sup>/<sub>b</sub> op rs rt immediate BRANCH: 5 bits • 6 bits 5 bits 16 bits > beg rs, rt, imm16 · Register rs and rt are the source registers. • If the instruction has three operand register, then rd is the destination register • If the instruction has two operand register, then rt is the destination register CS 211: Bhagi Narahari,CS, GWU































| inst  | Register Transfer                                                                                 |                                   |
|-------|---------------------------------------------------------------------------------------------------|-----------------------------------|
| ADD   | R[rd] <- R[rs] + R[rt];                                                                           | PC <- PC + 4                      |
|       | ALUsrc = RegB, ALUctr = "add", RegDst = rd                                                        | , RegWr, nPC_sel = "+4"           |
| SUB   | R[rd] <- R[rs] - R[rt];                                                                           | PC <- PC + 4                      |
|       | ALUsrc = RegB, ALUctr = "sub", RegDst = rd,                                                       | RegWr, nPC_sel = "+4"             |
| ORi   | $R[rt] <- R[rs] + zero\_ext(Imm16);$                                                              | PC <- PC + 4                      |
|       | ALUsrc = Im, Extop = "Z", ALUctr = "or", Re                                                       | egDst = rt, RegWr, nPC_sel = "+4" |
| LOAD  | R[rt] <- MEM[ R[rs] + sign_ext(Imm16)];                                                           | PC <- PC + 4                      |
|       | ALUsrc = Im, Extop = "Sn", ALUctr = "add",<br>MemtoReg, RegDst = rt, RegWr,                       | nPC_sel = "+4"                    |
| STORE | MEM[ R[rs] + sign_ext(Imm16)] <- R[rs];                                                           | PC <- PC + 4                      |
|       | ALUsrc = Im, Extop = "Sn", ALUctr = "add",                                                        | MemWr, nPC_sel = "+4"             |
| BEQ   | if ( $R[rs] == R[rt]$ ) then $PC <\!\!-PC + sign_ext(Imm16)] \parallel 00$ else $PC <\!\!-PC + 4$ |                                   |
|       | nPC_sel = EQUAL, ALUctr = "sub"                                                                   |                                   |







# Summary

#### 5 steps to design a processor •

- > 1. Analyze instruction set => datapath requirements
- > 2. Select set of datapath components & establish clock methodology
- > 3. Assemble datapath meeting the requirements
- A. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
- > 5. Assemble the control logic
- MIPS makes it easier
- Instructions same size
- > Source registers always in same place
- Immediates same size, location
- > Operations always on registers/immediates
- Single cycle datapath => CPI=1, CCT => long

CS 211: Bhagi Narahari,CS, GWL

CS 211: Bhagi Narahari,CS, GW

## **Systematic Generation of Control** Control Logic / Store (PLA, ROM) OPcode Decode microinstruction Conditions Instruction Control Points Datapath In a single-cycle processor, each instruction is realized by exactly one control command or "microinstruction" • > in general, the controller is a finite state machine

cs 21178h microinstruction can also control sequencing (see later)

| what's wrong with our CPI=1 processor?                |  |  |  |
|-------------------------------------------------------|--|--|--|
| Arithmetic & Logical                                  |  |  |  |
| PC Inst Memory Reg File mux ALU mux setup             |  |  |  |
| Load                                                  |  |  |  |
| PC Inst Memory Reg File mux ALU Data Mem muxsetup     |  |  |  |
| Critical Path Store                                   |  |  |  |
| PC Inst Memory Reg File mux ALU Data Mem              |  |  |  |
| Branch PC Inst Memory Reg File cmp mux                |  |  |  |
|                                                       |  |  |  |
| Long Cycle Time                                       |  |  |  |
| All instructions take as much time as the slowest     |  |  |  |
| Bool moments in not an pice on our idealized moments  |  |  |  |
| • Real memory is not as nice as our idealized memory  |  |  |  |
| > carnot always get the job done in one (short) cycle |  |  |  |















- If simple instruction could execute at very high clock rate...
  - you could even write compilers to produce microinstructions...
- If most programs use simple instructions and addressing modes...
- If microcode is kept in RAM instead of ROM so as to fix bugs ...
- Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)

















- Recall performance is function of
   > CPI: cycles per instruction
  - Clock cycle

CS 211: Bhagi Narahari,CS, GWU

- Instruction count
- Reducing any of the 3 factors will lead to improved performance



CS 211: Bhagi Narahari,CS, GWU





CS 211: Bhagi Narahari,CS, GWU