Lecture 13: Branch prediction, static & dynamic multiple issue
==============================================================

Branch prediction
=================

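* Simplest strategy:
  - predict not taken: keep fetching the sequential (fall-through) instructions
  - squash (invalidate) those instructions if the branch turns out to be taken
  - some of this has to happen anyway if nothing else is done, since the
    pipeline fetches sequentially by default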

* Static branch prediction
  * Look at simple code:


     for (i=0; i<10; i++) {
       foo;
     }

            addi R3, R0, #0     ; i = 0
     loop:  subi R2, R3, #10    ; R2 = i - 10
            bgez R2, done       ; leave the loop once i >= 10
            foo
            addi R3, R3, #1     ; i++
            j    loop
     done:


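  - a hint bit in the instruction, set by the compiler, or
  - a fixed rule: backward branches (typically loops) are predicted taken,
    forward branches not taken
  - predict accordingly
  - in the loop above, the conditional branch (bgez) jumps forward to done,
    so the rule predicts not taken: correct on 10 of its 11 executions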
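A minimal Python sketch of the "backward taken, forward not taken" rule applied to the loop's exit branch. The branch and label addresses are hypothetical, and the sketch assumes a branch counts as backward exactly when its target address is lower than its own address.

```python
def btfn_predict(branch_pc: int, target_pc: int) -> bool:
    """Static rule: predict taken (True) only for backward branches."""
    return target_pc < branch_pc


def loop_exit_outcomes(n: int) -> list[bool]:
    """Outcomes of the exit branch (bgez ... done) of `for (i=0; i<n; i++)`.

    The branch is evaluated for i = 0..n: not taken while i < n, taken when i == n.
    """
    return [i >= n for i in range(n + 1)]


# Hypothetical addresses: bgez at 0x104, label `done` at 0x114 (a forward branch).
BRANCH_PC, DONE_PC = 0x104, 0x114

outcomes = loop_exit_outcomes(10)
prediction = btfn_predict(BRANCH_PC, DONE_PC)                   # False: predict not taken
correct = sum(prediction == taken for taken in outcomes)
print(f"BTFN: {correct}/{len(outcomes)} correct")               # BTFN: 10/11 correct
print(f"always taken: {sum(outcomes)}/{len(outcomes)} correct")  # always taken: 1/11 correct
```

Note how the loop layout matters: a compiler that moves the test to the bottom of the loop turns the exit check into a mostly-taken backward branch, which the same rule also predicts well.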

* Dynamic branch prediction
   * Look at simple code:

      if (flag) {
        do_something;
      } else {
        do_something2;
      }

      In assembly this looks like:

            beq  R1, R0, else    ; R1 = flag, R0 = 0: branch if flag == 0
            do_something
            j    skip
      else: do_something2
      skip: ...

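      What information could help predict which way this branch goes?
      What if flag is set once at the beginning of the program and never
      changes? Then the branch goes the same way every time, so its own
      past outcomes predict it well.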

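   a) Use history of this branch
       - last outcome only (1 bit): a loop branch mispredicts twice per
         run of the loop, at the exit and again on the next entry
       - last two times (2-bit saturating counter): the prediction flips
         only after two consecutive mispredictions, so a loop branch
         mispredicts only once per run
       - table indexed by low bits of program counter
       * "Branch History Table" (also called a branch prediction buffer)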
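A small Python simulation contrasting a 1-bit table with 2-bit saturating counters, both indexed by low PC bits. The table size, the PC, the starting state (predict not taken), and the workload (5 complete runs of a 10-iteration loop, watching its exit branch) are all hypothetical choices for illustration.

```python
TABLE_BITS = 4  # hypothetical 16-entry table


def table_index(pc: int) -> int:
    """Low PC bits; the bottom 2 bits are dropped since instructions are 4-byte aligned."""
    return (pc >> 2) & ((1 << TABLE_BITS) - 1)


class OneBitPredictor:
    """Remembers only the last outcome of each branch."""

    def __init__(self) -> None:
        self.table = [False] * (1 << TABLE_BITS)  # False = predict not taken

    def predict(self, pc: int) -> bool:
        return self.table[table_index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[table_index(pc)] = taken


class TwoBitPredictor:
    """2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.table = [0] * (1 << TABLE_BITS)

    def predict(self, pc: int) -> bool:
        return self.table[table_index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = table_index(pc)
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)


def count_mispredictions(predictor, pc: int, outcomes: list[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Each run of the loop: 10 not-taken outcomes, then 1 taken (the exit).
outcomes = ([False] * 10 + [True]) * 5
print("1-bit:", count_mispredictions(OneBitPredictor(), 0x104, outcomes))  # 1-bit: 9
print("2-bit:", count_mispredictions(TwoBitPredictor(), 0x104, outcomes))  # 2-bit: 5
```

The 1-bit table misses at every exit and again on the first iteration of each later run (1 + 4 × 2 = 9). The 2-bit counter only drops back to 1 at the exit, so it still predicts not taken when the loop is re-entered and misses once per run (5).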

   b) Use history of this branch, *and* of the previous branch!
       - table indexed by low bits of program counter, plus the outcome
         of the previous branch
       * "Correlating" (two-level) branch predictor
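A Python sketch comparing per-branch 2-bit counters with a correlating table whose index appends one bit of global history (the outcome of the most recently executed branch). The source fragment, addresses, and outcome pattern are hypothetical: two branches test the same condition, so the second always goes the same way as the first, which only the correlating index can exploit.

```python
INDEX_BITS = 4  # hypothetical: 16 PC-indexed rows


class TwoBitTable:
    """Array of 2-bit saturating counters (0-1 predict not taken, 2-3 taken)."""

    def __init__(self, entries: int) -> None:
        self.counters = [0] * entries

    def predict(self, i: int) -> bool:
        return self.counters[i] >= 2

    def update(self, i: int, taken: bool) -> None:
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def pc_index(pc: int, last_taken: bool) -> int:
    """Per-branch index: low PC bits only (last_taken is ignored)."""
    return (pc >> 2) & ((1 << INDEX_BITS) - 1)


def correlating_index(pc: int, last_taken: bool) -> int:
    """Low PC bits concatenated with the outcome of the previous branch."""
    return (pc_index(pc, last_taken) << 1) | int(last_taken)


def misses_on(trace, index_fn, watch_pc: int) -> int:
    """Run a (pc, taken) trace and count mispredictions of the branch at watch_pc."""
    table = TwoBitTable(1 << (INDEX_BITS + 1))
    last_taken = False                  # 1 bit of global history
    misses = 0
    for pc, taken in trace:
        i = index_fn(pc, last_taken)
        if pc == watch_pc and table.predict(i) != taken:
            misses += 1
        table.update(i, taken)
        last_taken = taken
    return misses


# Hypothetical code:  if (x != 0) {...}   /* b1 at 0x200 */
#                     if (x != 0) {...}   /* b2 at 0x210 */
# Assumed: each branch is taken when x != 0, and x alternates 0, 1, 0, 1, ...
B1, B2 = 0x200, 0x210
trace = []
for x in [0, 1] * 5:
    trace += [(B1, x != 0), (B2, x != 0)]

print("per-branch 2-bit, b2 misses:", misses_on(trace, pc_index, B2))           # 5
print("correlating,      b2 misses:", misses_on(trace, correlating_index, B2))  # 2
```

The per-branch counter sees b2 alternate and misses every taken outcome. With the history bit, b2 gets one counter for "b1 was taken" and another for "b1 was not taken"; after two warm-up misses both predict perfectly.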

* What about the destination address?
  We need it immediately, to fetch the next instruction.
  But computing it takes two cycles:  PC -> IFetch/Decode -> add offset to PC
  Instead:
    Use low bits of PC as index into a table.
    Fetch the destination address from the table.
    This is just:
       PC -> Table
       one less step; no need to wait for decode
  * "Branch Target Buffer" (BTB)
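A minimal Python sketch of the PC -> Table lookup, assuming a direct-mapped, 64-entry table that records only taken branches. Storing the full PC as the tag is a simplification chosen here; the tag check is what keeps two branches that share an index from borrowing each other's target.

```python
BTB_BITS = 6  # hypothetical 64-entry table


class BranchTargetBuffer:
    """Direct-mapped table from branch PC to branch target, indexed by low PC bits."""

    def __init__(self) -> None:
        # Each entry is None or (tag, target). For simplicity the tag is the whole PC;
        # hardware would normally store only the bits not used for the index.
        self.entries = [None] * (1 << BTB_BITS)

    def _index(self, pc: int) -> int:
        return (pc >> 2) & ((1 << BTB_BITS) - 1)

    def next_fetch_pc(self, pc: int) -> int:
        """Consulted in the fetch stage, before the instruction is decoded."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]        # hit: known taken branch, fetch its target next
        return pc + 4              # miss: fetch sequentially

    def record_taken(self, pc: int, target: int) -> None:
        """Called once a taken branch resolves, so its next fetch can hit."""
        self.entries[self._index(pc)] = (pc, target)


# Hypothetical: `j loop` at 0x110 jumping back to `loop` at 0x100.
btb = BranchTargetBuffer()
print(hex(btb.next_fetch_pc(0x110)))   # 0x114 (first encounter: miss)
btb.record_taken(0x110, 0x100)
print(hex(btb.next_fetch_pc(0x110)))   # 0x100 (hit: redirect without decoding)
print(hex(btb.next_fetch_pc(0x210)))   # 0x214 (same index, different tag: miss)
```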

Multiple Issue
==============
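[Superscalar, VLIW]
* Static multiple issue (VLIW): the compiler groups independent
  instructions into fixed-size issue packets before run time.
* Dynamic multiple issue (superscalar): the hardware checks dependences
  each cycle and decides how many instructions to issue.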

Use slides.
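A toy Python sketch of forming two-wide issue packets from an in-order instruction list. The instructions are hypothetical, and the sketch deliberately ignores latencies (the load-use delay), slot-type restrictions, and reordering; it checks only that two adjacent instructions are independent enough to issue in the same cycle.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None (e.g. a store)
    srcs: tuple[str, ...]     # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    """True if `second` can issue in the same cycle as `first`.

    Both read their operands in the same cycle, so only RAW (second reads what
    first writes) and WAW (both write the same register) conflicts forbid pairing.
    """
    if first.dest is None:
        return True
    return first.dest not in second.srcs and first.dest != second.dest


def pack_two_wide(program: list[Instr]) -> list[tuple[Instr, ...]]:
    """Greedily group adjacent instructions into issue packets of at most two."""
    packets = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            packets.append((program[i], program[i + 1]))
            i += 2
        else:
            packets.append((program[i],))   # second slot empty (a nop in a VLIW packet)
            i += 1
    return packets


program = [
    Instr("lw   R1, 0(R2)",  "R1", ("R2",)),
    Instr("addi R3, R3, #1", "R3", ("R3",)),
    Instr("add  R4, R1, R3", "R4", ("R1", "R3")),
    Instr("sw   R4, 0(R2)",  None, ("R4", "R2")),
]
for cycle, packet in enumerate(pack_two_wide(program)):
    print(cycle, " | ".join(ins.text for ins in packet))
```

In a VLIW machine the compiler does this grouping (usually with reordering to fill more slots) before the program runs; a superscalar performs the same independence check in hardware on every cycle.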

Out of order execution
======================
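Show slide with the 50-cycle load delay.
Idea: don't stall everything behind that load; instead, look ahead
for later instructions whose operands are already available and execute them.

* Requires that all dependencies be tracked: an instruction may execute
  only after the values it reads have been produced
* "Later" instructions may finish *before* "earlier" instructions.
* Bookkeeping is very complicated (e.g., results are typically committed
  in program order through a reorder buffer so that exceptions stay precise)
* Virtually all modern high-performance "CPU" processors do this.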
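A toy Python model of the 50-cycle load example: a single-issue machine where an instruction may issue once all of its source values are available. The five-instruction program and its latencies are hypothetical; the model tracks only true (RAW) dependences, assumes register renaming has already removed WAR/WAW conflicts, and ignores in-order commit.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple[str, ...]
    latency: int              # cycles until the result can be used


PROGRAM = [
    Instr("lw  R1, 0(R2)",   "R1",  ("R2",),      50),  # the slow load
    Instr("add R3, R1, R4",  "R3",  ("R1", "R4"), 1),   # needs the load
    Instr("sub R5, R6, R7",  "R5",  ("R6", "R7"), 1),   # independent
    Instr("and R8, R6, R9",  "R8",  ("R6", "R9"), 1),   # independent
    Instr("or  R10, R5, R8", "R10", ("R5", "R8"), 1),   # needs sub and and
]


def producers(program):
    """For each instruction, indices of the earlier instructions it reads from (RAW)."""
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    return deps


def schedule(program, out_of_order: bool):
    """Return the cycle at which each instruction's result becomes available."""
    deps = producers(program)
    done_at = [None] * len(program)
    waiting = list(range(len(program)))           # kept in program order
    cycle = 0
    while waiting:
        for i in waiting:
            if all(done_at[p] is not None and done_at[p] <= cycle for p in deps[i]):
                done_at[i] = cycle + program[i].latency
                waiting.remove(i)
                break                             # single issue: one instruction per cycle
            if not out_of_order:
                break                             # in order: the oldest instruction blocks the rest
        cycle += 1
    return done_at


print("in order:    ", schedule(PROGRAM, out_of_order=False))  # [50, 51, 52, 53, 54]
print("out of order:", schedule(PROGRAM, out_of_order=True))   # [50, 51, 2, 3, 4]
```

In the out-of-order run, sub, and, and or all finish long before the load that precedes them in program order, which is exactly the situation the extra bookkeeping has to handle.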