Stages:

- Fetch
- RegRead/BranchPred
- ALU/BranchVerify
- Mem
- WB

Pipeline registers (pregs) separate the stages. Each preg is named for
the stage that feeds it: preg "Fetch" sits after Fetch and before
RegRead, and so on.

Each preg has a "bubble" input, which feeds NOP (opcode 13) into IR
and 0s into the data fields. Each preg also has a "stall" input, which
disables the register load (or, equivalently, feeds the current
outputs into the load inputs). Otherwise, the new values from the
previous stage are loaded and fed to the next stage on the rising edge
of the clock.
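
The preg behaviour described above can be sketched as a small Python
model. This is an illustration, not the actual hardware: the class name,
the opcode position (IR[15:12]), and the rule that "bubble" wins when
both inputs are asserted are assumptions that the notes do not state.

```python
NOP_OPCODE = 13  # NOP opcode, as given in the notes


class PipelineReg:
    """Hypothetical model of one preg: an IR plus named data fields."""

    def __init__(self, data_fields, opcode_shift=12):
        # Assumption: the opcode occupies IR[15:12] of a 16-bit instruction.
        self.nop_ir = NOP_OPCODE << opcode_shift
        self.data_fields = tuple(data_fields)
        self.ir = self.nop_ir
        self.data = {name: 0 for name in self.data_fields}

    def clock(self, next_ir, next_data, bubble=False, stall=False):
        """Model one rising clock edge."""
        if bubble:
            # Assumption: bubble takes priority over stall.
            self.ir = self.nop_ir
            self.data = {name: 0 for name in self.data_fields}
        elif stall:
            # Load disabled: keep the current outputs.
            pass
        else:
            # Normal case: latch the previous stage's values.
            self.ir = next_ir
            self.data = {n: next_data.get(n, 0) for n in self.data_fields}
```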

So, to discard the partial results in the pipeline from the beginning
up to a certain stage, assert "bubble" on every preg before that
stage. To insert a true bubble behind the instruction in a given
stage, assert "bubble" on the preg immediately before that stage and
stall everything upstream of that preg, so that the instructions
behind it are held rather than lost.
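
As a rough sketch, the two recipes above can be written as functions
that return which control inputs to assert. The function names and the
"PC" entry in the stall set are hypothetical; the notes only say that
PC lies before Fetch and is held by Fetch_Stall.

```python
STAGES = ["Fetch", "RegRead", "ALU", "Mem", "WB"]
# Preg i sits between STAGES[i] and STAGES[i + 1] and is named for STAGES[i].
PREGS = STAGES[:-1]


def flush_before(stage):
    """Discard everything in flight before `stage`: bubble all earlier pregs."""
    idx = STAGES.index(stage)
    return {"bubble": set(PREGS[:idx]), "stall": set()}


def insert_bubble_behind(stage):
    """Insert one bubble that trails the instruction currently in `stage`."""
    idx = STAGES.index(stage)
    if idx == 0:
        raise ValueError("no preg lies before the first stage")
    return {
        # Preg immediately before `stage` takes the NOP.
        "bubble": {PREGS[idx - 1]},
        # Assumption: PC is held together with all upstream pregs.
        "stall": {"PC"} | set(PREGS[:idx - 1]),
    }
```

For example, flush_before("ALU") selects pregs Fetch and RegRead, which
matches the two bubble signals raised on a branch misprediction.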

PC lies before Fetch in the flow.

---

- Fetch ( PC -> IR )

IR <= IMem[PC]
PredInsn <= IMem[PC]
if(!Fetch_Stall)
        PC <= PC + 1

- RegRead ( IR -> IR, A, B, C, K, Pred )

IR <= IR

A <= RF[IR[7:4]]
B <= RF[IR[3:0]]
K <= ZeroExtend(IR[7:0])
if(IsStore(IR))
        C <= RF[IR[11:8]]

Pred <= PredOut
if(IsBranch(IR))
        Fetch_Bubble <= 1   // bubble always follows a branch
        if(Pred)
                PC <= PC + SignExtend(IR[7:0])
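
The field extraction in RegRead can be illustrated in Python as follows.
The helper names are hypothetical, the register file is modelled as a
plain list, and C is set to 0 for non-store instructions, which is an
assumption (the notes leave C unspecified in that case).

```python
def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer (inclusive bit range)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def reg_read(ir, rf, is_store):
    """Compute the values latched into preg RegRead for instruction `ir`."""
    out = {
        "IR": ir,
        "A": rf[bits(ir, 7, 4)],
        "B": rf[bits(ir, 3, 0)],
        "K": bits(ir, 7, 0),  # zero-extended immediate
        "C": 0,               # assumption: 0 unless the instruction is a store
    }
    if is_store:
        out["C"] = rf[bits(ir, 11, 8)]
    return out


def predicted_target(pc, ir):
    """Branch target used when the prediction is 'taken'."""
    return pc + sign_extend(bits(ir, 7, 0), 8)
```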

- ALU ( IR, A, B, C, K, Pred -> IR, Q, C )

IR <= IR
Pred <= Pred
Q <= ALU(A, B, K, IR)
C <= C
if(IsBranch(IR))
        Branch <= (IsBranchNeg(IR) && ALU.N) || (IsBranchZero(IR) && ALU.Z)
        if(Branch != Pred)
                Fetch_Bubble <= 1    // preg after fetch
                RegRead_Bubble <= 1  // preg after regread
                if(Branch)
                        PC <= ALU.Q  // with ALU.A = PC, ALU.B = K
                else
                        PC <= PC_orig + 1
        PredInsn2 <= IR
        PredActual <= Branch
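
The branch check in the ALU stage can be sketched in Python. The
function name and the use of None for "no PC redirect" are hypothetical
conventions. The flags and predicates are passed in as plain values
rather than decoded from IR.

```python
def verify_branch(is_neg, is_zero, n, z, pred, alu_q, pc_orig):
    """Return the ALU-stage control outputs for a branch instruction.

    is_neg / is_zero stand in for IsBranchNeg(IR) / IsBranchZero(IR),
    n / z are the ALU flags, and pred is the prediction from RegRead.
    """
    branch = bool((is_neg and n) or (is_zero and z))
    out = {"Branch": branch, "Fetch_Bubble": 0, "RegRead_Bubble": 0, "PC": None}
    if branch != bool(pred):
        # Misprediction: squash the two younger instructions and redirect PC.
        out["Fetch_Bubble"] = 1
        out["RegRead_Bubble"] = 1
        out["PC"] = alu_q if branch else pc_orig + 1
    return out
```

A correct prediction leaves both bubble signals at 0 and PC untouched,
so the only cost is the single bubble that always follows a branch.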

- Mem ( IR, Q, C -> IR, Q )

IR <= IR
if(IsLoad(IR))
        Q <= DMem[Q]
else if(IsStore(IR))
        DMem[Q] <= C
else
        Q <= Q

- WB ( IR, Q -> RF )

if(IsRegType(IR) || IsLoad(IR))
        RF[IR[11:8]] <= Q
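
The Mem and WB steps can be sketched together in Python. The `kind`
string is a hypothetical stand-in for the IsLoad / IsStore / IsRegType
predicates, and memory and the register file are modelled as lists.
Passing Q through unchanged for a store is an assumption.

```python
def mem_stage(kind, q, c, dmem):
    """Return the Q value latched into preg Mem; may write to dmem."""
    if kind == "load":
        return dmem[q]
    if kind == "store":
        dmem[q] = c
    # Assumption: stores and all other instructions pass Q through.
    return q


def wb_stage(kind, ir, q, rf):
    """Write Q to the destination register for reg-type and load instructions."""
    if kind in ("reg", "load"):
        dest = (ir >> 8) & 0xF  # IR[11:8]
        rf[dest] = q
```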
