raptor-chip

Microarchitecture (uarch)

Overview

Raptor is an out-of-order, single-issue RISC-V processor core with register renaming, a reorder buffer (ROB), reservation stations, and virtual memory support. The commit stage supports dual commit: up to 2 instructions can retire from the ROB per cycle when the two entries at the ROB head are both ready.
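
The dual-commit rule can be modeled in a few lines of Python. This sketch uses a deque as the ROB, with the head at the left, and borrows the state names from rob_state_t. Several details are assumptions: the enum encodings, the hypothetical `commit_cycle` helper, and the absence of any other per-cycle commit restriction (for example on stores, branches or CSR instructions).

```python
from collections import deque
from enum import Enum, auto


class RobState(Enum):        # names mirror rob_state_t; encodings are arbitrary here
    ROB_EX = auto()          # executing, result not yet available
    ROB_WB = auto()          # written back, ready to retire
    ROB_CM = auto()          # committed


COMMIT_WIDTH = 2             # YSYX_DUAL_COMMIT: up to 2 retirements per cycle


def commit_cycle(rob: deque) -> list:
    """Retire up to COMMIT_WIDTH consecutive ROB_WB entries from the ROB head."""
    retired = []
    while rob and len(retired) < COMMIT_WIDTH and rob[0]["state"] is RobState.ROB_WB:
        entry = rob.popleft()              # in-order: only the head may retire
        entry["state"] = RobState.ROB_CM
        retired.append(entry["pc"])
    return retired


rob = deque([
    {"pc": 0x80000000, "state": RobState.ROB_WB},
    {"pc": 0x80000004, "state": RobState.ROB_WB},
    {"pc": 0x80000008, "state": RobState.ROB_EX},   # still executing: stops commit
])
print([hex(pc) for pc in commit_cycle(rob)])        # the first two entries retire
```

A third ready entry behind the first two would wait for the next cycle, and an entry still in ROB_EX blocks everything behind it.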

ISA: RV32/RV64 I + M (mul/div) + A (atomics: LR/SC, AMO) + C (compressed) + Zicsr + Zifencei + Sv32 MMU

The core supports configurable RV32 and RV64 modes via a compile-time switch (YSYX_RV64). When YSYX_RV64 is defined, XLEN=64 and all datapath, register file, AXI bus, and DPI-C interfaces widen to 64 bits. RV64 adds W-variant instructions (ADDIW, SLLIW, etc.) with 32-bit result sign-extension.
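
A small reference model pins down the W-variant semantics. The Python sketch below treats register values as unsigned 64-bit integers. It computes the operation, keeps the low 32 bits and sign-extends them to 64 bits. The function names are hypothetical and are not taken from the RTL.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext32(x: int) -> int:
    """Sign-extend bit 31 of x into a 64-bit unsigned value."""
    x &= MASK32
    return x | 0xFFFF_FFFF_0000_0000 if x & 0x8000_0000 else x


def addiw(rs1: int, imm12: int) -> int:
    """ADDIW: rs1 + signed 12-bit immediate; 32-bit result sign-extended to 64 bits."""
    assert -2048 <= imm12 <= 2047
    return sext32((rs1 + imm12) & MASK64)


def slliw(rs1: int, shamt: int) -> int:
    """SLLIW: shift the low 32 bits of rs1 left by shamt (0..31), then sign-extend."""
    assert 0 <= shamt <= 31
    return sext32(rs1 << shamt)
```

For example, `addiw(0x7FFF_FFFF, 1)` overflows the 32-bit range and produces a negative 64-bit result. The upper 32 bits of the source operand never affect a W-variant result.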

 Pipeline Overview -- 3 major partitions

 +- FRONTEND (in-order, speculative) ----------------------+
 | IFU  -- Instruction Fetch Unit (3-state FSM)            |
 |   +- L1I  (direct-mapped I-cache, TLB, Sv32 PTW, IFQ)   |
 |   +- BPU  (bimodal PHT, BTB, GHR, RSB)                  |
 | IDU  -- Instruction Decode Unit                         |
 |   +- RVC expansion + Chisel-generated decoders          |
 +---------------------------------------------------------+

 +- BACKEND (rename -> dispatch -> execute -> commit) -----+
 | RNU  -- Register Naming Unit (pure rename)              |
 |   +- RNQ     (rename queue, circular, RIQ_SIZE)         |
 |   +- Freelist (rnu_fl_if, circular FIFO, PHY_SIZE)      |
 |   +- MapTable (rnu_mt_if, spec MAP + committed RAT)     |
 | PRF  -- Physical Register File (top-level, 2R/2W)       |
 | ROU  -- Re-Order Unit (ROB + dispatch)                  |
 |   +- UOQ  (dispatch queue, circular, IIQ_SIZE)          |
 |   +- ROB  (rob_entry_t[], ROB_SIZE, head/tail)          |
 |   +- Operand bypass (EXU/IOQ -> dispatch)               |
 | EXU  -- Execution Unit (out-of-order)                   |
 |   +- RS   (reservation station, RS_SIZE)                |
 |   |   +- ALU  (RV32I combinational)                     |
 |   |   +- MUL  (RV32M, Booth's / iterative div)          |
 |   +- IOQ  (in-order queue, IOQ_SIZE, for ld/st/amo)     |
 | CMU  -- Commit Unit (broadcast retire info, in-order)   |
 | CSR  -- M/S-mode CSR file + trap/interrupt handling     |
 +---------------------------------------------------------+

 +- MEMORY SUBSYSTEM --------------------------------------+
 | LSU  -- Load/Store Unit                                 |
 |   +- STQ  (store temp queue, speculative, SQ_SIZE)      |
 |   +- SQ   (store queue, committed, SQ_SIZE)             |
 |   +- Store-to-load forwarding (CAM, virtual address)    |
 | L1D  -- Data Cache (2-way set-assoc, banked SRAM, RMW)  |
 |   +- Reservation register (LR/SC atomics)               |
 | TLB  -- Translation Lookaside Buffer (reusable module)  |
 |   +- ITLB (L1I), DTLB (L1D load), DSTLB (L1D store)     |
 | PTW  -- Page Table Walker (Sv32, reusable module)       |
 |   +- IPTW (L1I), DPTW (L1D)                             |
 | BUS  -- AXI4 master bridge (L1I/L1D arbitration)        |
 |   +- CLINT (mtime timer, periodic interrupt)            |
 +---------------------------------------------------------+
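
The PTW instances (IPTW, DPTW) perform Sv32 walks. The Python sketch below is a reference for what a walk computes, following the two-level Sv32 algorithm in the RISC-V privileged specification. It is simplified by assumption:

- permission, U/SUM/MXR and A/D checks are omitted;
- memory is modeled as a dict of 32-bit words;
- `sv32_walk` is an illustrative name, not the RTL's.

```python
PAGE_SIZE = 4096
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3


def sv32_walk(mem: dict, satp_ppn: int, va: int):
    """Translate a 32-bit VA to a 34-bit PA; return None on a page fault."""
    vpn = [(va >> 12) & 0x3FF, (va >> 22) & 0x3FF]   # vpn[0], vpn[1]
    a = satp_ppn * PAGE_SIZE                          # root page-table base
    for level in (1, 0):
        pte = mem.get(a + vpn[level] * 4, 0)          # unmapped words read as 0
        if not (pte & PTE_V) or ((pte & PTE_W) and not (pte & PTE_R)):
            return None                               # invalid or reserved PTE
        ppn = pte >> 10                               # PPN[1]:PPN[0], 22 bits
        if pte & (PTE_R | PTE_X):                     # leaf PTE
            if level == 1:
                if ppn & 0x3FF:
                    return None                       # misaligned 4 MiB superpage
                return ((ppn >> 10) << 22) | (va & 0x3FFFFF)
            return (ppn << 12) | (va & 0xFFF)         # 4 KiB page
        a = ppn * PAGE_SIZE                           # non-leaf: descend one level
    return None                                       # non-leaf PTE at level 0
```

A TLB would cache the resulting leaf mapping so that later accesses to the same page skip the walk.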

Pipeline Flow

The processor has a variable-depth pipeline with 8 logical stages in the common case (L1I hit, ALU instruction, no dependency stall). All stage boundaries use valid/ready handshaking; queues (RNQ, UOQ, RS, IOQ, ROB) decouple stages and absorb backpressure.

 IF0     IF1     ID      RN       DI        IS/EX    WB     CM
+------++------++------++-------++--------++-------++-----++------+
|L1I   ||IFU   ||IDU   ||RNU    ||ROU     ||EXU    ||ROB  ||CMU   |
|SRAM  ||FSM   ||decode||RNQ->  ||UOQ->   ||RS->ALU||->WB ||bcast |
|read  ||latch ||latch ||rename ||dispatch||issue  ||state||retire|
|(spec)||      ||      ||pipe   ||+ROB    ||       ||     ||      |
+--+---++--+---++--+---++--+----++--+-----++--+----++--+--++--+---+
   |       |       |       |        |         |        |      |
   v       v       v       v        v         v        v      v
 1 cyc   1 cyc   1 cyc   2 cyc    2 cyc     1 cyc    0 cyc  1 cyc
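
Each queue boundary follows the same valid/ready rule: a transfer happens on a clock edge only when the producer's valid and the consumer's ready are both high. The Python sketch below models such a circular queue. It assumes that ready is derived from the occupancy at the start of the cycle, so a full queue does not accept an entry even in a cycle where it also dequeues. The class and method names are hypothetical.

```python
class HandshakeQueue:
    """Circular queue between two pipeline stages with valid/ready handshakes."""

    def __init__(self, size: int):
        self.buf = [None] * size
        self.head = 0    # index of the oldest entry
        self.count = 0   # number of occupied entries

    def in_ready(self) -> bool:
        # Ready toward the producer while at least one slot is free.
        return self.count < len(self.buf)

    def out_valid(self) -> bool:
        # Valid toward the consumer while the queue holds an entry.
        return self.count > 0

    def clock(self, in_valid: bool, in_data, out_ready: bool):
        """Advance one cycle; return (enqueue_fired, dequeued_data_or_None)."""
        enq = in_valid and self.in_ready()     # fire = valid && ready
        deq = out_ready and self.out_valid()
        data = None
        if deq:
            data = self.buf[self.head]
            self.head = (self.head + 1) % len(self.buf)
            self.count -= 1
        if enq:
            tail = (self.head + self.count) % len(self.buf)
            self.buf[tail] = in_data
            self.count += 1
        return enq, data
```

When the consumer deasserts ready, the queue fills and then drops its own ready. That is how backpressure propagates upstream one stage at a time.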

Stage Description

Stage   Module  Register boundary
-----   ------  -----------------
IF0     L1I     SRAM address latched
IF1     IFU     inst, pc, pc_ifu registered
ID      IDU     uop_t, operands registered
RN      RNU     rn_pipe_{pr1,pr2,prd,prs,uop} registered
DI      ROU     UOQ consumed, ROB entry allocated
IS/EX   EXU     RS/IOQ entries updated
WB      ROB     ROB state -> ROB_WB
CM      CMU     Architectural state committed

Cycle Count (Common Case: L1I Hit, ALU, No Stall)

Cycle 0: IF0 -- L1I SRAM speculative read (pc_ifu -> banks)
Cycle 1: IF1 -- IFU latches instruction (L1I hit, tag match)
Cycle 2: ID  -- IDU decodes, latches uop_t
Cycle 3: RN  -- RNU: RNQ enqueue + rename comb logic
Cycle 4: RN  -- RNU: rename pipe register latched (rn_pipe_valid)
Cycle 5: DI  -- ROU: UOQ enqueue + PRF pre-read + bypass
Cycle 6: DI  -- ROU: UOQ dequeue -> dispatch to EXU + ROB alloc
Cycle 7: EX  -- EXU: RS issue + ALU (comb) -> writeback (ROB_EX->ROB_WB)
Cycle 8: CM  -- CMU: commit (ROB_WB, head ready) -> retire + broadcast

Instruction latency: ~9 cycles (first instruction, empty pipeline)
Throughput: up to 1 IPC (single-issue; reached when the pipeline is fully flowing with no queue stalls)
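
The latency and throughput figures combine into a simple ideal-case model. The first instruction needs 9 cycles (cycles 0 through 8 in the cycle count above), and each later instruction retires one cycle after its predecessor. So n independent ALU instructions take 9 + (n - 1) cycles. The sketch assumes no stalls, L1I hits throughout and no data dependencies. Dual commit does not raise the bound because dispatch is single-issue.

```python
FIRST_RETIRE_CYCLES = 9   # cycles 0..8 for the first instruction (common case)


def total_cycles(n_insts: int) -> int:
    """Cycles until the last of n back-to-back independent ALU ops retires (ideal)."""
    if n_insts <= 0:
        return 0
    # Pipeline fill for the first instruction, then one retirement per cycle.
    return FIRST_RETIRE_CYCLES + (n_insts - 1)


for n in (1, 10, 100, 1000):
    cycles = total_cycles(n)
    print(f"{n:5d} insts: {cycles:5d} cycles, effective IPC = {n / cycles:.3f}")
```

The effective IPC approaches the 1 IPC peak only for long stall-free runs. Short runs are dominated by the pipeline fill.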

Load Hit Latency

Loads go through the IOQ path and interact with L1D:

Cycle N+0: DI  -- IOQ enqueue (load dispatched from UOQ)
Cycle N+1: IS  -- IOQ head ready, drives exu_lsu.raddr -> L1D SRAM spec read
Cycle N+2: EX  -- L1D tag match (LD_A), SRAM data ready -> data_hit
Cycle N+3: WB  -- exu_ioq_bcast.valid -> ROB_WB, PRF written, result broadcast

Load-use latency: 3 cycles on an L1D hit, from IOQ issue (N+1) through result broadcast (N+3)

Store Flow

Stores are split across execution and commit:

Execute: IOQ computes store address + data -> STQ buffer (speculative)
Commit:  ROB head retires store -> SQ enqueue (committed)
Drain:   SQ head -> L1D write-through -> BUS -> AXI4 (background)

Store-to-load forwarding: loads check STQ (speculative) then SQ (committed) via virtual address CAM, youngest-match-wins.
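
The youngest-match-wins lookup can be expressed as a short behavioral model. The Python sketch below makes three hypothetical assumptions:

- each queue is an oldest-to-youngest list;
- matches are exact on the full virtual address, with no byte masks or partial overlaps;
- because the IOQ handles loads and stores in order, every STQ entry visible to a load is older than that load.

Committed SQ stores are older than any speculative STQ store. Searching STQ youngest-first and then SQ youngest-first therefore returns the youngest older store.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StoreEntry:
    vaddr: int   # virtual address of the store
    data: int    # store data


def forward(load_vaddr: int,
            stq: Sequence[StoreEntry],
            sq: Sequence[StoreEntry]) -> Optional[int]:
    """Return forwarded store data for a load, or None to read from L1D."""
    for queue in (stq, sq):              # speculative (younger) first, then committed
        for st in reversed(queue):       # youngest entry first within each queue
            if st.vaddr == load_vaddr:   # CAM match on virtual address
                return st.data
    return None                          # no match: load reads L1D
```

Using virtual addresses lets the CAM match before translation completes. The trade-off is that two virtual aliases of the same physical location would not forward to each other.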

Branch Misprediction Penalty

On misprediction detected at commit (CM stage):

Pipeline Data Flow

 +- frontend (in-order, speculative) ----------------------+
 |      v- [BUS <-load- AXI4]                              |
 |  IFU[l1i,tlb,ifq] =issue=> IDU                          |
 |      ^- bpu[btb(COND/DIRE/INDR/RETU),pht,ghr,rsb]       |
 +---------------------------------------------------------+
          | ifu_idu_if
 +- backend (rename -> dispatch -> execute -> commit) -----+
 |  IDU =issue=> RNU[rnq,freelist,maptable]                |
 |  RNU =rename=> ROU[uoq]                                 |
 |  ROU[uoq] -dispatch-> ROU[rob(rob_entry_t)]             |
 |           =dispatch=> EXU[rs] / EXU[ioq]                |
 |                                                         |
 |  EXU[rs] =writeback=> ROU[rob]   (via exu_rou_if)       |
 |      ^- ALU :alu ops                                    |
 |      +- MUL :mult/div (Booth's / iterative)             |
 |  EXU[ioq] =writeback=> ROU[rob]  (via exu_ioq_bcast_if) |
 |      ^- CSR :Zicsr read                                 |
 |      +- AMO :atomics (LR/SC/AMO*)                       |
 |                                                         |
 |  ROU[rob] =commit=> CMU & LSU[sq] & CSR                 |
 |  CMU =broadcast=> frontend: IFU[pc], BPU[update]        |
 +---------------------------------------------------------+
          | exu_lsu_if, rou_lsu_if, lsu_l1d_if
 +- memory subsystem --------------------------------------+
 |  LSU[stq] (speculative store temp queue)                |
 |  LSU[sq ] (committed store queue -> L1D/BUS)            |
 |  L1D[dtlb,dstlb,dptw] -> BUS -> AXI4 (write-through)    |
 |  TLB (reusable: itlb, dtlb, dstlb)                      |
 |  PTW (reusable: iptw, dptw -- Sv32 walker)              |
 +---------------------------------------------------------+

Module Details

Frontend

IFU – Instruction Fetch Unit (ysyx_ifu.sv)

BPU – Branch Prediction Unit (ysyx_bpu.sv)

L1I – Instruction Cache (ysyx_l1i.sv)

IDU – Instruction Decode Unit (ysyx_idu.sv)

Backend

RNU – Register Naming Unit (ysyx_rnu.sv)

PRF – Physical Register File (ysyx_prf.sv)

ROU – Re-Order Unit (ysyx_rou.sv)

EXU – Execution Unit (ysyx_exu.sv)

CMU – Commit Unit (ysyx_cmu.sv)

CSR – Control & Status Registers (ysyx_csr.sv)

Memory Subsystem

LSU – Load/Store Unit (ysyx_lsu.sv)

L1D – Data Cache (ysyx_l1d.sv)

TLB – Translation Lookaside Buffer (ysyx_tlb.sv)

PTW – Page Table Walker (ysyx_ptw.sv)

BUS – AXI4 Bus Bridge (ysyx_bus.sv)

CLINT – Core Local Interrupt Controller (ysyx_clint.sv)

Interface Summary

Inter-module interfaces (ysyx_if.svh, ysyx_*_if.svh)

Interface         Direction           Description
---------         ---------           -----------
ifu_bpu_if        IFU -> BPU          PC for prediction, NPC + taken back
ifu_l1i_if        IFU -> L1I          PC fetch request, inst + trap response
ifu_idu_if        IFU <-> IDU         Forward: inst + PC + predicted NPC; backward: early resteer (resteer + resteer_pc)
idu_rnu_if        IDU -> RNU          Decoded uop + operands + arch reg IDs
rnu_rou_if        RNU -> ROU          Renamed uop + physical reg mappings
rou_exu_if        ROU -> EXU          Dispatched uop + operands + ROB dest
rou_lsu_if        ROU -> LSU          Store commit (addr/data/alu)
rou_csr_if        ROU -> CSR          CSR write + trap/system on commit
rou_cmu_if        ROU -> CMU          Commit info (PC, branch, fence, flush)
exu_rou_if        EXU -> ROU          RS writeback (result, branch, CSR, trap)
exu_ioq_bcast_if  EXU -> ROU/PRF/LSU  IOQ broadcast (ld/st/CSR result)
exu_prf_if        ROU -> PRF          Operand read (2 ports, valid check)
exu_lsu_if        EXU(IOQ) -> LSU     Load request (addr/alu/atomic)
exu_csr_if        EXU -> CSR          CSR read port
exu_l1d_if        EXU -> L1D          Store MMU + SC reservation check
cmu_bcast_if      CMU -> all          Retire broadcast (flush, fence, branch)
csr_bcast_if      CSR -> all          Priv mode, SATP, MMU enable, tvec
lsu_l1d_if        LSU -> L1D          Load/store data path
l1i_bus_if        L1I -> BUS          I-cache miss read
l1d_bus_if        L1D -> BUS          D-cache miss read + write-through

RNU internal interfaces (ysyx_rnu_internal_if.svh)

Interface   Description
---------   -----------
rnu_fl_if   Freelist: alloc_req/alloc_pr (rename), dealloc (commit), flush recovery
rnu_mt_if   MapTable: 3 read ports (rs1, rs2, rd_old), spec write (MAP), commit write (RAT)
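
The interplay between the freelist and the two map tables can be sketched in Python. This is an assumption-laden behavioral model, not the RTL:

- the freelist is a PHY_SIZE-entry ring whose tail advances only at commit;
- a hypothetical `commit_head` pointer records the allocation position as of the last commit, so a flush can restore both the MAP (from the RAT) and the freelist head;
- x0 is never renamed and stays on physical register 0.

```python
REG_SIZE = 32   # YSYX_REG_SIZE: architectural registers
PHY_SIZE = 64   # YSYX_PHY_SIZE: physical registers


class Renamer:
    """Sketch of RNU state: circular freelist, speculative MAP, committed RAT."""

    def __init__(self):
        # Assumption: at reset arch reg i maps to phys reg i; the rest are free.
        self.map = list(range(REG_SIZE))           # speculative mappings (MAP)
        self.rat = list(range(REG_SIZE))           # committed mappings (RAT)
        self.fl = [0] * PHY_SIZE                   # circular freelist storage
        for i, pr in enumerate(range(REG_SIZE, PHY_SIZE)):
            self.fl[i] = pr
        self.head = 0                              # speculative dequeue pointer
        self.commit_head = 0                       # dequeue pointer as of last commit
        self.tail = PHY_SIZE - REG_SIZE            # enqueue pointer (moves at commit)

    def rename(self, rs1: int, rs2: int, rd: int):
        """Read the rs1/rs2/rd_old mappings and allocate a new prd for rd."""
        pr1, pr2, old = self.map[rs1], self.map[rs2], self.map[rd]
        if rd == 0:                                # x0 is never renamed
            return pr1, pr2, 0, None
        if self.tail == self.head:                 # freelist empty
            raise RuntimeError("no free physical register: rename must stall")
        prd = self.fl[self.head % PHY_SIZE]
        self.head += 1
        self.map[rd] = prd                         # speculative MAP write
        return pr1, pr2, prd, old

    def commit(self, rd: int, prd: int, old):
        """In-order retire: update the RAT and free rd's previous mapping."""
        if rd == 0:
            return
        self.rat[rd] = prd                         # committed RAT write
        self.commit_head += 1
        self.fl[self.tail % PHY_SIZE] = old        # old mapping is now dead
        self.tail += 1

    def flush(self):
        """Recovery: roll MAP and freelist head back to the committed state."""
        self.map = list(self.rat)
        self.head = self.commit_head
```

Because the tail only moves at commit, rewinding `head` to `commit_head` returns every speculatively allocated register to the freelist in one step.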

Configuration Parameters (ysyx_config.svh)

Parameter           Default   Description
---------           -------   -----------
YSYX_XLEN           32        Register width (64 when YSYX_RV64 is defined)
YSYX_I_EXTENSION    1         RV32I base
YSYX_M_EXTENSION    1         M extension (mul/div)
YSYX_M_FAST         1         Single-cycle mul/div (sim mode)
YSYX_L1I_LINE_LEN   2         L1I line: 2^2 = 4 words
YSYX_L1I_LEN        6         L1I entries: 2^6 = 64
YSYX_L1I_N_WAYS     1         L1I ways (1 = direct-mapped)
YSYX_PHT_SIZE       512       PHT entries
YSYX_BTB_SIZE       128       BTB total entries (64 sets x 2 ways)
YSYX_RSB_SIZE       8         Return stack entries
YSYX_RIQ_SIZE       4         Rename queue (RNQ) entries
YSYX_IIQ_SIZE       4         Dispatch queue (UOQ) entries
YSYX_ROB_SIZE       8         Reorder buffer entries
YSYX_RS_SIZE        4         Reservation station entries
YSYX_IOQ_SIZE       4         In-order queue entries
YSYX_SQ_SIZE        8         Store queue entries
YSYX_L1D_LINE_LEN   1         L1D line: 2^1 = 2 words
YSYX_L1D_LEN        5         L1D sets: 2^5 = 32
YSYX_L1D_N_WAYS     2         L1D ways (2-way set-associative)
YSYX_DUAL_COMMIT    defined   Dual commit: retire up to 2 ROB entries/cycle
YSYX_ISSUE_WIDTH    1         Instructions dispatched per cycle
YSYX_REG_SIZE       32        Architectural registers
YSYX_PHY_SIZE       64        Physical registers
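
The log2-encoded cache parameters translate into capacities and address splits. The Python sketch below rests on three assumptions:

- a word is 4 bytes;
- the cache lookup address is 32 bits wide;
- YSYX_L1I_LEN and YSYX_L1D_LEN give log2 of the number of sets (for the direct-mapped L1I, sets and lines coincide).

`cache_geometry` is a hypothetical helper.

```python
WORD_BYTES = 4    # assumption: one "word" = 32 bits
ADDR_BITS = 32    # assumption: 32-bit cache lookup address


def cache_geometry(line_len: int, sets_log2: int, n_ways: int) -> dict:
    """Derive capacity and tag/index/offset widths from log2-encoded parameters."""
    line_bytes = (1 << line_len) * WORD_BYTES
    offset_bits = line_bytes.bit_length() - 1     # log2(line_bytes), power of two
    index_bits = sets_log2
    return {
        "size_bytes": (1 << sets_log2) * n_ways * line_bytes,
        "offset": offset_bits,
        "index": index_bits,
        "tag": ADDR_BITS - index_bits - offset_bits,
    }


l1i = cache_geometry(line_len=2, sets_log2=6, n_ways=1)   # YSYX_L1I_LINE_LEN/LEN/N_WAYS
l1d = cache_geometry(line_len=1, sets_log2=5, n_ways=2)   # YSYX_L1D_LINE_LEN/LEN/N_WAYS
print("L1I:", l1i)
print("L1D:", l1d)
```

Under these assumptions, the defaults give:

- L1I: 1 KiB, direct-mapped, with a 22-bit tag, 6-bit index and 4-bit offset;
- L1D: 512 bytes, 2-way, with a 24-bit tag, 5-bit index and 3-bit offset.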

Key Types (ysyx_pkg.sv)

Type              Description
----              -----------
uop_t             Micro-op: decoded instruction fields (alu, branch, mem, CSR, trap, pc, inst, imm)
prd_t             Physical register descriptor: op1/op2 values + pr1/pr2/prd/prs mappings
rob_state_t       ROB entry state enum: ROB_CM (committed), ROB_WB (written-back), ROB_EX (executing)
rob_entry_t       Full ROB entry: phys regs, arch rd, state, branch/jump, memory, atomics, CSR, trap, fence, difftest_skip, inst/PC
addr_cacheable()  Function: returns true if address is in a cacheable region (mrom, flash, psram, sdram)
addr_valid()      Function: returns true if address is in any valid memory-mapped region

Mermaid Diagram

Mermaid [1] diagram of the uarch (paste the source below into the Mermaid playground [2] to render it):

flowchart TD
 subgraph FE["Frontend (in-order)"]
        IFU["IFU (3-state FSM)"]
        L1I["L1I (direct-mapped, ITLB, PTW)"]
        IFQ["IFQ (2-entry)"]
        IDU["IDU (RVC + decoder)"]
        BPU["BPU (PHT/BTB/GHR/RSB)"]
  end
 subgraph BE["Backend (rename -> dispatch -> execute -> commit)"]
        subgraph RNU_TOP["RNU (pure rename)"]
            RNQ["RNQ (RIQ_SIZE)"]
            FL["Freelist (rnu_fl_if)"]
            MT["MapTable (rnu_mt_if)"]
        end
        PRF["PRF (2R/2W, valid+transient)"]
        ROU["ROU (UOQ + ROB: rob_entry_t[])"]
        EXU["EXU"]
        RS["RS (RS_SIZE)"]
        IOQ["IOQ (IOQ_SIZE, in-order)"]
        ALU["ALU (RV32I)"]
        MUL["MUL (RV32M, Booth)"]
        CMU["CMU (retire broadcast)"]
        CSR["CSR (M/S-mode, Sv32)"]
  end
 subgraph MEM["Memory Subsystem"]
        LSU["LSU (STQ + SQ, forwarding)"]
        L1D["L1D (2-way, banked SRAM, RMW)"]
        TLB["TLB (reusable: ITLB, DTLB, DSTLB)"]
        PTW["PTW (reusable: IPTW, DPTW)"]
        BUS["BUS (AXI4 bridge)"]
        CLINT["CLINT (mtime, timer IRQ)"]
  end
    BPU -->|"npc,taken"| IFU
    IFU -->|"ifu_l1i_if"| L1I
    L1I --- IFQ
    L1I -->|"inst"| IFU
    IFU -->|"ifu_idu_if"| IDU
    IDU -->|"idu_rnu_if"| RNU_TOP
    RNQ --> FL & MT
    RNU_TOP -->|"rnu_rou_if"| ROU
    ROU -->|"exu_prf_if"| PRF
    ROU -->|"rou_exu_if"| EXU
    EXU --> RS & IOQ
    RS --> ALU & MUL
    IOQ -->|"exu_lsu_if"| LSU
    IOQ -->|"exu_csr_if"| CSR
    RS -->|"exu_rou_if"| ROU
    IOQ -->|"exu_ioq_bcast_if"| ROU
    EXU -->|"write"| PRF
    ROU -->|"rou_cmu_if"| CMU
    ROU -->|"rou_csr_if"| CSR
    ROU -->|"rou_lsu_if"| LSU
    CMU -->|"cmu_bcast_if"| IFU & BPU
    LSU -->|"lsu_l1d_if"| L1D
    EXU -->|"exu_l1d_if"| L1D
    L1I --- TLB & PTW
    L1D --- TLB & PTW
    L1I -->|"l1i_bus_if"| BUS
    L1D -->|"l1d_bus_if"| BUS
    BUS <-->|"AXI4"| AXI["AXI4 Master"]
    CLINT --- BUS
    CSR -->|"csr_bcast_if"| L1I & L1D & EXU

[1] https://mermaid.js.org/
[2] https://www.mermaidchart.com/play