Lecture 13: Branch prediction, static & dynamic multiple issue
==============================================================

Branch prediction
=================

* Simplest strategy:
    - predict taken
    - invalidate (squash) the speculatively fetched instructions if wrong
    - this has to happen to some degree anyway if nothing else is done

* Static branch prediction

  * Look at simple code:

        for (i=0; i<10; i++) {
            foo;
        }

        loop:   subi R2, R3, #10
                bgez R2, done
                foo
                addi R3, R3, #1
                j    loop
        done:

    - hint bit in instruction
    - or, backward branches are taken
    - predict accordingly

* Dynamic branch prediction

  * Look at simple code:

        if (flag) { } else { }

    In assembly this looks like:

                BEQ R1, R2, else
                do_something;
                j skip
        else:   do_something2;
        skip:   ...

    What can tell us more about how this branch is taken?
    What if flag is set at the beginning of the program?

    a) Use history of this branch
        - last time -- leads to double-mispredict
          (a loop branch is mispredicted on exit, and again on re-entry)
        - last two times -- leads to better behavior
        - table indexed by low bits of program counter
          * "Branch History Buffer"

    b) Use history of this branch, *and* previous branch!
        - table indexed by low bits of program counter,
          plus result of previous branch

* "Branch Target Buffer"

  * What about destination address? We need it immediately.
    But it takes two cycles:

        PC -> Ifetch/Decode -> Add PC to Offset

    Instead: use low bits of PC as index into table.
    Fetch destination from table. This is just:

        PC -> Table

    one less step; no need to wait for decode

Multiple Issue
==============

[Superscalar, VLIW] Use slides.

Out of order execution
======================

Show slide with 50-cycle load delay.

Idea: don't wait on that instruction; instead look ahead for an
instruction that can be executed.

* Requires that all dependencies be tracked
* "Later" instructions may finish *before* "earlier" instructions.
* Bookkeeping is very complicated
* All high-performance "CPU" processors do this.
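
Going back to the branch history table from the dynamic branch prediction notes: the difference between "last time" and "last two times" history can be sketched with saturating counters. This is a minimal illustration, not a description of any particular processor. The names (`predictor_t`, `count_mispredictions`), the 16-entry table size, and the 4-byte instruction assumption are all hypothetical. The replayed branch is the exit test of the `for` loop above: not taken once per loop body, then taken once to leave.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 4u                    /* hypothetical size: 16 entries */
#define TABLE_SIZE (1u << TABLE_BITS)

typedef struct {
    uint8_t counter[TABLE_SIZE];  /* one saturating counter per entry */
    uint8_t max;                  /* 1 = "last time" history, 3 = two-bit history */
} predictor_t;

static void predictor_init(predictor_t *p, uint8_t max)
{
    assert(max == 1 || max == 3);
    memset(p->counter, 0, sizeof p->counter);  /* start at "not taken" */
    p->max = max;
}

/* Index the table with the low bits of the PC (assumes 4-byte instructions). */
static unsigned table_index(uint32_t pc)
{
    return (pc >> 2) & (TABLE_SIZE - 1u);
}

/* Predict taken when the counter is in the upper half of its range. */
static int predict_taken(const predictor_t *p, uint32_t pc)
{
    return p->counter[table_index(pc)] > p->max / 2;
}

/* Move the counter toward the actual outcome, saturating at 0 and max. */
static void predictor_update(predictor_t *p, uint32_t pc, int taken)
{
    uint8_t *c = &p->counter[table_index(pc)];
    if (taken && *c < p->max)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Replay the loop's exit branch: not taken for each of the `iterations`
 * body executions, then taken once to leave. The whole loop is run
 * `runs` times. Returns the number of mispredictions. */
static int count_mispredictions(predictor_t *p, uint32_t pc,
                                int iterations, int runs)
{
    int wrong = 0;
    for (int r = 0; r < runs; r++) {
        for (int i = 0; i <= iterations; i++) {
            int taken = (i == iterations);  /* only the final test exits */
            if (predict_taken(p, pc) != taken)
                wrong++;
            predictor_update(p, pc, taken);
        }
    }
    return wrong;
}
```

With 10 iterations and 5 runs, starting from cleared counters, the one-bit table mispredicts 9 times (twice per run after the first) and the two-bit table mispredicts 5 times (once per run). A single odd outcome is not enough to flip a two-bit counter's prediction, which is why the double-mispredict goes away.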