Basic and Pipelined Processor Implementation

Jinyang Li

### Today's lesson plan

- Basic single-cycle CPU design, continued
- Pipelining idea and challenges

### Recall our basic RISC-V CPU: datapath w/ control



## R-Type Instruction

`add x5, x6, x7`
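The snippet below shows where the fields that drive this datapath come from. It packs `add x5, x6, x7` into the R-type layout (funct7 | rs2 | rs1 | funct3 | rd | opcode). The helper `encode_r_type` is hypothetical. The field values for `add` (opcode `0110011`, funct3 `000`, funct7 `0000000`) are those of the RISC-V base ISA.

```python
def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack R-type fields into a 32-bit instruction word (hypothetical helper)."""
    return ((funct7 & 0x7F) << 25      # bits 31:25
            | (rs2 & 0x1F) << 20       # bits 24:20
            | (rs1 & 0x1F) << 15       # bits 19:15
            | (funct3 & 0x7) << 12     # bits 14:12
            | (rd & 0x1F) << 7         # bits 11:7
            | (opcode & 0x7F))         # bits 6:0

# add x5, x6, x7: rd = x5, rs1 = x6, rs2 = x7
word = encode_r_type(funct7=0b0000000, rs2=7, rs1=6, funct3=0b000, rd=5, opcode=0b0110011)
print(f"0x{word:08x}")  # 0x007302b3
```

In the datapath, rs1 and rs2 index the two register-file read ports, and rd selects the register written at the end of the cycle. The control unit derives the ALU operation from opcode, funct3, and funct7.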

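## **BEQ Instruction**

| Inst              | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode  |
|-------------------|---------------|-----|-----|--------|--------------|---------|
| `beq x5, x6, 100` | 0000011       | 6   | 5   | 000    | 00100        | 1100011 |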
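Branch targets are always even, so the B-type format stores only offset bits 12:1. It splits them across two fields so that rs1, rs2, and funct3 stay where the other formats put them. The Python sketch below packs `beq x5, x6, 100`, then reassembles and sign-extends the offset, which is what the immediate generator must do before the offset is added to the PC. The helper names and the branch's PC `0x1000` are hypothetical.

```python
def encode_b_type(offset: int, rs2: int, rs1: int, funct3: int, opcode: int) -> int:
    """Pack a B-type word; offset is a signed, even byte offset (hypothetical helper)."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("B-type offset must be even and in [-4096, 4094]")
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is implied and not stored
    return (((imm >> 12) & 0x1) << 31      # imm[12]
            | ((imm >> 5) & 0x3F) << 25    # imm[10:5]
            | (rs2 & 0x1F) << 20
            | (rs1 & 0x1F) << 15
            | (funct3 & 0x7) << 12
            | ((imm >> 1) & 0xF) << 8      # imm[4:1]
            | ((imm >> 11) & 0x1) << 7     # imm[11]
            | (opcode & 0x7F))

def decode_b_offset(word: int) -> int:
    """Reassemble the scattered immediate and sign-extend it from 13 bits."""
    imm = (((word >> 31) & 0x1) << 12
           | ((word >> 7) & 0x1) << 11
           | ((word >> 25) & 0x3F) << 5
           | ((word >> 8) & 0xF) << 1)
    return imm - (1 << 13) if imm & (1 << 12) else imm

# beq x5, x6, 100: rs1 = x5, rs2 = x6, funct3 = 000, opcode = 1100011
word = encode_b_type(100, rs2=6, rs1=5, funct3=0b000, opcode=0b1100011)
pc = 0x1000                           # hypothetical address of the beq
target = pc + decode_b_offset(word)   # target if the branch is taken
print(hex(target))  # 0x1064
```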



## Load Instruction

| Inst            | immediate | rs1 | funct3 | rd | opcode  |
|-----------------|-----------|-----|--------|----|---------|
| `ld x5, 40(x6)` | 40        | 6   | 011    | 5  | 0000011 |
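A load uses every major unit of the single-cycle datapath in sequence: instruction memory, register file, ALU, data memory, register file. The Python sketch below traces `ld x5, 40(x6)` through the last four of those steps and assumes fetch and decode are already done. The register and memory contents and the helper `execute_ld` are hypothetical, and data memory is modeled as a dict keyed by byte address.

```python
def execute_ld(regs: list[int], dmem: dict[int, int], rd: int, rs1: int, imm: int) -> None:
    """Execute ld rd, imm(rs1) in one single-cycle step (fetch/decode assumed done)."""
    base = regs[rs1]      # register file: read rs1
    addr = base + imm     # ALU: compute the effective address
    data = dmem[addr]     # data memory: read the doubleword
    if rd != 0:           # register file: write back (x0 stays zero)
        regs[rd] = data

regs = [0] * 32
regs[6] = 0x2000                 # hypothetical base address in x6
dmem = {0x2000 + 40: 1234}       # hypothetical value stored at x6 + 40
execute_ld(regs, dmem, rd=5, rs1=6, imm=40)
print(regs[5])  # 1234
```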

### Basic CPU must finish an instruction in one clock cycle → use a "slow" clock



Clock cycle $\geq$ 800 ps (the delay of the slowest instruction, load)

## Our basic design is slow

- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory  $\rightarrow$  register file  $\rightarrow$  ALU  $\rightarrow$  data memory  $\rightarrow$  register file
- Not feasible to vary clock period for different instructions
- Next: improve performance by pipelining
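The rule that the longest delay sets the clock is a maximum over per-instruction paths. The Python sketch below uses assumed unit latencies: 200 ps for instruction memory, ALU, and data memory, and 100 ps for register read and write. These values are illustrative, not measured, and were chosen because they reproduce the 800 ps load path above. The table and function names are hypothetical.

```python
# Assumed per-unit latencies in picoseconds (illustrative only).
LATENCY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units each instruction class passes through in the single-cycle datapath.
PATHS = {
    "ld":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sd":     ["imem", "reg_read", "alu", "dmem"],
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "beq":    ["imem", "reg_read", "alu"],
}

def path_delay(units: list[str]) -> int:
    return sum(LATENCY_PS[u] for u in units)

delays = {name: path_delay(units) for name, units in PATHS.items()}
clock_period = max(delays.values())  # one clock must fit the slowest instruction
print(delays)        # {'ld': 800, 'sd': 700, 'R-type': 600, 'beq': 500}
print(clock_period)  # 800
```

Under these assumptions every instruction pays 800 ps. An R-type needs only 600 ps, which is why a per-instruction clock is tempting but, as noted above, not feasible.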

## Pipelining: a laundry analogy



### **RISC-V** Pipeline

- Five stages:
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

## **Pipeline Performance**



## **Pipeline Speedup**

- Pipelining increases throughput (instructions/sec)
  - Latency (time for each instruction) does not decrease
- If all stages are balanced (i.e., all take the same time)
  - $\text{throughput}_{\text{pipelined}} = \text{number of stages} \times \text{throughput}_{\text{nonpipelined}}$
  - If not balanced, speedup is less

Throughput = 1/(time between instructions)
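The speedup claim can be checked with the usual timing model. Without pipelining, each instruction takes the sum of the stage delays. With a k-stage pipeline, the clock is set by the slowest stage, and n instructions take k + n - 1 cycles: k to fill the pipeline, then one completion per cycle. This is a minimal Python sketch. The function names and the unbalanced delays are hypothetical, and pipeline-register overhead is ignored.

```python
def nonpipelined_time(stage_ps: list[int], n: int) -> int:
    # Each instruction runs start to finish before the next one begins.
    return n * sum(stage_ps)

def pipelined_time(stage_ps: list[int], n: int) -> int:
    # Clock period = slowest stage; the first instruction needs k cycles,
    # after which one instruction completes every cycle.
    k = len(stage_ps)
    return (k + n - 1) * max(stage_ps)

balanced = [200, 200, 200, 200, 200]
unbalanced = [200, 100, 200, 200, 100]   # hypothetical unequal stage delays
n = 1_000_000
for stages in (balanced, unbalanced):
    speedup = nonpipelined_time(stages, n) / pipelined_time(stages, n)
    print(round(speedup, 2))  # 5.0, then 4.0
```

With the unbalanced stages, a single instruction's latency grows from 800 ps to 5 × 200 = 1000 ps. This matches the point above: pipelining raises throughput, not per-instruction speed.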

## **Pipelining and ISA Design**

- RISC-V ISA is designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 15-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3<sup>rd</sup> stage, access memory in 4<sup>th</sup> stage
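rs1, rs2, and rd sit at the same bit positions in every format that has them. The ID stage can therefore start reading both source registers while the opcode is still being decoded, before it knows the format. The Python sketch below shows that field extraction. `decode_fields` is hypothetical, and the example words are the encodings of `add x5, x6, x7` and `ld x5, 40(x6)`.

```python
def decode_fields(word: int) -> dict[str, int]:
    """Slice the fixed-position fields out of a 32-bit RISC-V instruction."""
    return {
        "opcode": word & 0x7F,          # bits 6:0, identifies the format
        "rd":     (word >> 7) & 0x1F,   # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1":    (word >> 15) & 0x1F,  # bits 19:15
        "rs2":    (word >> 20) & 0x1F,  # bits 24:20 (part of the immediate in I-type)
    }

add_word = 0x007302B3  # add x5, x6, x7
ld_word = 0x02833283   # ld x5, 40(x6)
for w in (add_word, ld_word):
    f = decode_fields(w)
    # The register file can be read with f["rs1"] and f["rs2"] immediately;
    # control later decides whether the value read via rs2 is actually used.
    print(hex(f["opcode"]), f["rd"], f["rs1"], f["rs2"])
```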

## Pipeline challenges: hazards

- Situations that prevent starting the next instruction in the next cycle
- Structure hazard
  - A required resource is busy
- Data hazard
  - Need to wait for a previous instruction to complete its write
- Control hazard
  - Which instruction to execute next depends on the outcome of a previous instruction

#### Structure Hazards

- Conflicting use of a single resource
- Example: suppose the CPU uses a single memory
  - Load/store requires data access
  - Instruction fetch would have to *stall* for that cycle
    - Would cause a pipeline "bubble"
- Solution: Use separate instruction/data memories
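A small cycle-count model makes the bubble concrete. The Python sketch below assumes an otherwise ideal 5-stage pipeline with no data or control stalls. An instruction fetched in cycle c reaches MEM in cycle c + 3, and with one shared memory, IF cannot fetch in a cycle when a load or store is in MEM. The function name and the opcode-string encoding are hypothetical.

```python
def total_cycles(ops: list[str], unified_memory: bool) -> int:
    """Cycles to finish all instructions in an otherwise ideal 5-stage pipeline."""
    if not ops:
        return 0
    mem_busy: set[int] = set()  # cycles in which a load/store occupies memory in MEM
    cycle = 1                   # earliest cycle for the next fetch
    last_fetch = 0
    for op in ops:
        # Structural hazard: IF stalls (a bubble) while the single memory is busy.
        while unified_memory and cycle in mem_busy:
            cycle += 1
        last_fetch = cycle
        if op in ("ld", "sd"):
            mem_busy.add(cycle + 3)  # this instruction's MEM cycle
        cycle += 1
    return last_fetch + 4            # WB of the last instruction is 4 cycles after IF

program = ["ld", "sub", "add", "ld", "add"]
print(total_cycles(program, unified_memory=False))  # 9
print(total_cycles(program, unified_memory=True))   # 10
```

In this sequence the second `ld` would be fetched in cycle 4, when the first `ld` is using the memory in MEM. Its fetch slips one cycle, and the one-cycle bubble delays the whole sequence.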

#### Data Hazards

- An instruction must wait for a previous instruction to complete its write
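A stall counter quantifies this wait. The Python sketch below assumes no forwarding and a register file written in the first half of a cycle and read in the second half. Under that assumption, a dependent instruction's ID stage can coincide with the producer's WB stage at the earliest. Instructions are hypothetical `(rd, [source registers])` tuples, and the function name is hypothetical too.

```python
def count_data_stalls(program: list[tuple[int, list[int]]]) -> int:
    """Total bubbles for read-after-write hazards (no forwarding)."""
    ready: dict[int, int] = {}  # register -> earliest cycle a reader's ID may read it
    prev_id = 1                 # so the first instruction is in ID in cycle 2
    stalls = 0
    for rd, sources in program:
        earliest = prev_id + 1  # without hazards, one instruction enters ID per cycle
        for r in sources:
            earliest = max(earliest, ready.get(r, 0))
        stalls += earliest - (prev_id + 1)
        prev_id = earliest
        if rd != 0:             # x0 is hard-wired to zero, so writes to it never matter
            ready[rd] = earliest + 3  # producer's WB cycle = its ID cycle + 3
    return stalls

back_to_back = [(19, [0, 1]), (2, [19, 3])]   # add x19, x0, x1 ; sub x2, x19, x3
spaced = [(19, [0, 1]), (4, [5, 6]), (7, [8, 9]), (2, [19, 3])]  # two hypothetical independent adds in between
print(count_data_stalls(back_to_back))  # 2
print(count_data_stalls(spaced))        # 0
```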



## Control hazard

- Wait until the branch outcome is determined before fetching the next instruction
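The cost of this stall-on-branch policy depends on which stage produces the branch outcome. The Python sketch below assumes an otherwise ideal 5-stage pipeline. The outcome is known at the end of stage `resolve_stage`, and the next fetch starts in the following cycle. The function name, parameter, and instruction sequence are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def cycles_with_branch_stalls(ops: list[str], resolve_stage: str) -> int:
    """Total cycles when fetch waits for every branch outcome (no other hazards)."""
    if not ops:
        return 0
    # One bubble per stage after IF, up to and including the resolving stage.
    bubbles_per_branch = STAGES.index(resolve_stage)
    fetch = 1
    for op in ops[:-1]:
        fetch += 1
        if op == "beq":
            fetch += bubbles_per_branch  # fetch waits until the outcome is known
    return fetch + len(STAGES) - 1       # cycle in which the last instruction does WB

ops = ["add", "beq", "ld", "sub"]
print(cycles_with_branch_stalls(ops, "ID"))  # 9 (one bubble)
print(cycles_with_branch_stalls(ops, "EX"))  # 10 (two bubbles)
```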




# A basic pipelined RISC-V CPU

## **Pipelined Datapath**



## Pipeline registers

- Needed to hold data produced in the previous cycle (IF/ID, ID/EX, EX/MEM, MEM/WB)
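A pipeline register is written at the end of one cycle and read by the next stage during the following cycle. The Python sketch below tracks only which instruction each of the four registers holds. A real IF/ID, ID/EX, EX/MEM, or MEM/WB register also carries operands and control bits. The function name is hypothetical.

```python
def run_pipeline(program: list[str], cycles: int) -> list[tuple[int, str]]:
    """Return (cycle, instruction) for every instruction that reaches WB."""
    # Contents of IF/ID, ID/EX, EX/MEM, MEM/WB as latched at the end of the last cycle.
    pipe_regs = [None] * 4
    written_back = []
    for cycle in range(1, cycles + 1):
        # WB works on what MEM/WB latched at the end of the previous cycle.
        if pipe_regs[3] is not None:
            written_back.append((cycle, pipe_regs[3]))
        # Clock edge: all four registers update at once, so each stage passes on
        # exactly the instruction it worked on during this cycle.
        fetched = program[cycle - 1] if cycle - 1 < len(program) else None
        pipe_regs = [fetched, pipe_regs[0], pipe_regs[1], pipe_regs[2]]
    return written_back

program = ["ld x10, 40(x1)", "sub x11, x2, x3", "add x12, x3, x4",
           "ld x13, 48(x1)", "add x14, x5, x6"]
for cycle, instr in run_pipeline(program, cycles=9):
    print(cycle, instr)  # WB in cycles 5 through 9
```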



## IF for Load, Store

A single-clock-cycle diagram shows the state of the entire datapath during a single clock cycle.



#### ID for Load, Store, ...






#### **EX for Load**





#### **MEM for Load**



### WB for Load





#### **Corrected Datapath for Load**



### EX for Store





#### **MEM for Store**





### WB for Store
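The load and store slides can be summarized as five stage functions. Each function hands a dict, standing in for a pipeline register, to the next stage. The Python sketch below runs each instruction through the stages one at a time, without overlap, and uses hypothetical tuple encodings `("ld", rd, rs1, imm)` and `("sd", rs2, rs1, imm)`. The load's destination register number travels with the instruction all the way to WB. By the time the load writes back, IF/ID holds a later instruction, so rd cannot be taken from there.

```python
def if_stage(imem, pc):
    return {"instr": imem[pc], "pc_next": pc + 4}

def id_stage(if_id, regs):
    op, r, rs1, imm = if_id["instr"]
    out = {"op": op, "base": regs[rs1], "imm": imm}
    if op == "ld":
        out["rd"] = r                 # carried forward so WB knows where to write
    else:                             # sd: the value to store comes from rs2
        out["store_val"] = regs[r]
    return out

def ex_stage(id_ex):
    out = dict(id_ex)
    out["addr"] = id_ex["base"] + id_ex["imm"]   # ALU computes the effective address
    return out

def mem_stage(ex_mem, dmem):
    out = dict(ex_mem)
    if ex_mem["op"] == "ld":
        out["data"] = dmem[ex_mem["addr"]]
    else:
        dmem[ex_mem["addr"]] = ex_mem["store_val"]  # a store completes here
    return out

def wb_stage(mem_wb, regs):
    if mem_wb["op"] == "ld" and mem_wb["rd"] != 0:  # sd does nothing in WB
        regs[mem_wb["rd"]] = mem_wb["data"]

regs = [0] * 32
regs[5], regs[6] = 99, 0x2000                     # hypothetical register contents
dmem = {}
imem = {0: ("sd", 5, 6, 8), 4: ("ld", 7, 6, 8)}   # sd x5, 8(x6); ld x7, 8(x6)
pc = 0
while pc in imem:
    if_id = if_stage(imem, pc)
    wb_stage(mem_stage(ex_stage(id_stage(if_id, regs)), dmem), regs)
    pc = if_id["pc_next"]
print(regs[7])  # 99
```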



## Single-Cycle Pipeline Diagram

- State of the pipeline in a given cycle
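In an ideal pipeline with no stalls, instruction i (counting from 0) is in stage s (0 = IF, ..., 4 = WB) during cycle i + s + 1. The state in a given cycle is therefore a simple lookup. The Python sketch below uses the five-instruction example from the multi-cycle diagram. The function name is hypothetical.

```python
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_state(program: list[str], cycle: int) -> dict[str, Optional[str]]:
    """Which instruction occupies each stage in a given cycle (ideal, no stalls)."""
    state = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s  # instruction i is in stage s during cycle i + s + 1
        state[stage] = program[i] if 0 <= i < len(program) else None
    return state

program = ["ld x10, 40(x1)", "sub x11, x2, x3", "add x12, x3, x4",
           "ld x13, 48(x1)", "add x14, x5, x6"]
for stage, instr in pipeline_state(program, cycle=5).items():
    print(stage, instr)
```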



## Multi-Cycle Pipeline Diagram

- Traditional form

Time (in clock cycles) runs left to right; program execution order (in instructions) runs top to bottom.

| Instruction       | CC 1              | CC 2               | CC 3               | CC 4               | CC 5               | CC 6               | CC 7        | CC 8        | CC 9       |
|-------------------|-------------------|--------------------|--------------------|--------------------|--------------------|--------------------|-------------|-------------|------------|
| `ld x10, 40(x1)`  | Instruction fetch | Instruction decode | Execution          | Data access        | Write-back         |                    |             |             |            |
| `sub x11, x2, x3` |                   | Instruction fetch  | Instruction decode | Execution          | Data access        | Write-back         |             |             |            |
| `add x12, x3, x4` |                   |                    | Instruction fetch  | Instruction decode | Execution          | Data access        | Write-back  |             |            |
| `ld x13, 48(x1)`  |                   |                    |                    | Instruction fetch  | Instruction decode | Execution          | Data access | Write-back  |            |
| `add x14, x5, x6` |                   |                    |                    |                    | Instruction fetch  | Instruction decode | Execution   | Data access | Write-back |
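The traditional form follows from the same rule: with no stalls, instruction i occupies stage s in cycle i + s + 1. The short Python sketch below prints a text version of the table above with abbreviated stage names. The function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def multi_cycle_rows(program: list[str]) -> list[list[str]]:
    """One row per instruction; cell c + 1 holds the stage active in cycle c + 1."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(program):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # cycle i + s + 1 is list index i + s
        rows.append([instr] + row)
    return rows

program = ["ld x10, 40(x1)", "sub x11, x2, x3", "add x12, x3, x4",
           "ld x13, 48(x1)", "add x14, x5, x6"]
header = ["Instruction"] + [f"CC {c}" for c in range(1, len(program) + len(STAGES))]
for row in [header] + multi_cycle_rows(program):
    print(" | ".join(f"{cell:<15}" for cell in row))
```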

## Multi-Cycle Pipeline Diagram

- Form showing resource usage



### Summary

- Basic single-cycle CPU design
  - Data path vs. control path
  - Clock frequency is limited by the longest delay
- Basic 5-stage pipelined design:
  - Main idea: overlap the execution of multiple instructions, each in a different stage
  - RISC-V 5-stage pipeline (IF, ID, EX, MEM, WB)
  - Pipeline hazards: structure, data, control