Lecture 13: Branch prediction, static & dynamic multiple issue, OOO
===================================================================

Branch prediction
=================

* Simplest strategy:
    - predict taken
    - invalidate the speculatively fetched instructions if wrong
    - this has to happen to some degree anyway if nothing else is done

* Static branch prediction

    * Look at simple code:

        for (i=0; i<10; i++) {
            foo;
        }

        loop:   subi R2, R3, #10
                bgez R2, done
                foo
                addi R3, R3, #1
                j    loop
        done:

    - hint bit in instruction
    - or, assume backward branches are taken
    - predict accordingly

* Dynamic branch prediction

    * Look at simple code:

        if (flag) {
        } else {
        }

      In assembly this looks like:

                BEQ R1, R2, else
                do_something;
                j skip
        else:   do_something2;
        skip:   ...

      What can tell us more about how this branch is taken?
      What if flag is set at the beginning of the program?

    a) Use history of this branch
        - last time -- leads to double-mispredict (a loop branch
          mispredicts once on exit and once more on re-entry)
        - last two times -- leads to better behavior
        - table indexed by low bits of program counter
        * "Branch History Buffer"

    b) Use history of this branch, *and* previous branch!
        - table indexed by low bits of program counter, plus
          result of previous branch

* "Branch Target Buffer"

    * What about destination address?  We need it immediately.
      But it takes two cycles:

        PC -> Ifetch/Decode -> Add PC to Offset

      Instead: Use low bits of PC as index into table.  Fetch
      destination from table.  This is just:

        PC -> Table

      one less step; no need to wait for decode

Multiple Issue
==============

[Superscalar, VLIW]  Use slides.

OOO, Register renaming
======================

Example of out-of-order execution and register renaming
=======================================================

This lecture was given on the chalkboard, so you had to "be there" to
get the full benefit.

First, I described the difference between a superscalar processor
(multiple execution units) and an out-of-order processor.  Out-of-order
execution is particularly important to allow the processor to do useful
work during a cache miss by a load instruction.

Then, we found RAW, WAW, and WAR dependencies in the following code:

    L.D     F6, 34(R2)
    L.D     F2, 45(R3)
    MULT.D  F0, F2, F4
    SUB.D   F8, F6, F2
    DIV.D   F10, F0, F6
    ADD.D   F6, F8, F2

Next, we renamed all of the architectural registers (Fn, Rn) in this
code to physical registers (Pn).  We did this by moving forward through
the code and updating the table that maps architectural registers to
physical registers.

I also explained the general organization of an out-of-order processor,
with a better version of the following figure:

IN ORDER            Instruction Fetch
PROCESSING                  |
(architectural registers)   |
...........................\|/...................
.  Reservation Stations        OUT-OF-ORDER
.  Execution units (ALU's)     PROCESSING
.  Reorder Buffer              (physical registers)
.................................................
                         Commit     IN ORDER
                            |       PROCESSING
                           \ /      (architectural registers)

Finally, I noted that I was glossing over some details of real
out-of-order processors, but that I primarily wanted to make sure you
understood the following concepts:

  . how dependencies between instructions restrict which instructions
    can be executed out-of-order
  . how register renaming is used to eliminate false (WAW/WAR)
    dependencies
  . the high-level organization of an out-of-order processor
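
The dependency search and the forward renaming pass described above can
be sketched in a few lines of Python.  This is only an illustration, not
what was drawn on the chalkboard: the function names (find_hazards,
rename), the tuple encoding of instructions, and the physical register
numbering (P0, P1, ...) are all assumptions made for this sketch.  It
also assumes an unlimited supply of physical registers (no free list),
and the hazard check is a simple pairwise comparison that ignores
intervening writers.

```python
from itertools import count

# The six instructions from the notes as (opcode, destination, sources).
# The base-address register of each load is listed as its source.
PROGRAM = [
    ("L.D",    "F6",  ["R2"]),
    ("L.D",    "F2",  ["R3"]),
    ("MULT.D", "F0",  ["F2", "F4"]),
    ("SUB.D",  "F8",  ["F6", "F2"]),
    ("DIV.D",  "F10", ["F0", "F6"]),
    ("ADD.D",  "F6",  ["F8", "F2"]),
]


def find_hazards(program):
    """Return sorted (kind, earlier_index, later_index, register) tuples.

    Pairwise check only: every earlier instruction i is compared with
    every later instruction j, without looking at what lies in between.
    """
    hazards = set()
    for j, (_, dst_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dst_i, srcs_i = program[i]
            if dst_i in srcs_j:      # j reads what i wrote
                hazards.add(("RAW", i, j, dst_i))
            if dst_i == dst_j:       # both write the same register
                hazards.add(("WAW", i, j, dst_i))
            if dst_j in srcs_i:      # j overwrites what i reads
                hazards.add(("WAR", i, j, dst_j))
    return sorted(hazards)


def rename(program):
    """Walk forward through the code, updating the mapping table.

    Returns the renamed program and the final architectural-to-physical
    table.  Physical names P0, P1, ... are handed out in order of need
    (hypothetical numbering).
    """
    fresh = (f"P{n}" for n in count())
    table = {}      # architectural register -> current physical register
    renamed = []
    for op, dst, srcs in program:
        new_srcs = []
        for s in srcs:
            # A register read before any write gets a mapping on first use.
            if s not in table:
                table[s] = next(fresh)
            new_srcs.append(table[s])
        # Every write is given a brand-new physical register, so later
        # writers never collide with earlier readers or writers.
        table[dst] = next(fresh)
        renamed.append((op, table[dst], new_srcs))
    return renamed, table


if __name__ == "__main__":
    print("Before renaming:")
    for h in find_hazards(PROGRAM):
        print("  ", h)
    renamed, table = rename(PROGRAM)
    print("Renamed code:")
    for op, dst, srcs in renamed:
        print(f"   {op:7s} {dst}, {', '.join(srcs)}")
    print("After renaming:")
    for h in find_hazards(renamed):
        print("  ", h)
```

On the six instructions above, the check reports seven RAW dependencies
both before and after renaming, while the WAW dependency on F6 (first
L.D vs. ADD.D) and the two WAR dependencies on F6 (SUB.D and DIV.D vs.
ADD.D) disappear once ADD.D writes to its own physical register.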