Multiprocessors I
=================

Lecture 24, 4/21/2005

1) Went through slides:
   - mostly examples of multiprocessor machines of different sizes
     and organizations.
     . multi-chip multiprocessors
     . single-chip multiprocessors
   *** We're going to see lots of these in the future, now that
       we can fit many of them on a die and ILP is reaching
       its limits.

2) What are limits to single-processor performance?
   (for now, define single-processor as one program counter)

   This raises the question: How do we improve performance of
   uniprocessors?
   - pipelining
       but eventually branch misprediction penalties and latch
       overheads kill you
   - multi-issue machines
       but eventually you run out of parallel work to do within
       a small window.  Problems due to branch prediction and
       cache misses.

   More fundamentally: limits to fine-grained instruction-level
   parallelism.

3) Reasons for parallel processing have changed.

   Traditionally:
   - you want more performance than you can get from a
     single-chip processor.  e.g. cluster of PCs

   Now:
   - you want maximum performance from *one* chip
   - parallelism is the most effective way to do this

4) Some kinds of multiprocessing?
   - Diagram of SIMD machine (shared PC and instruction decode)
   - Diagram of MIMD machine (PC and decode for each processor)
   - Contrast with ILP (one PC, but each instruction is only
     executed on one piece of data, unlike a SIMD machine)

5) Examples of uses:
   - ATM transaction processing system
   - Web server
   - Graphics processors
   - OS + application#1 + application#2

6) Writing parallel programs can be hard...
   - static-page web serving -- easy
   - parallelizing the final project assignment (cache sim) -- hard

7) Granularity of parallelism:

   Granularity          Example
   -----------          -------
   bit                  32-bit adder
   instruction (ILP)    superscalar processor or out-of-order processor
   procedure            ...
   task                 MSWord + Web browser running on
                        two-processor machine

8) Limits to parallelism:
   - Amdahl's Law...
     What percentage of program is parallelizable?
     What if I perfectly parallelized that?
     The serial remainder still bounds the overall speedup.
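
The two Amdahl's Law questions above can be made concrete with a short sketch. It uses the standard form of the law, speedup = 1 / ((1 - p) + p/N), where p is the parallelizable fraction of the program and N is the number of processors. The function names and the sample numbers are illustrative assumptions, not something from the lecture.

```python
import math


def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Overall speedup when `parallel_fraction` of the run time is spread
    evenly over `n_processors` and the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Speedup if the parallel part were perfectly parallelized, i.e. took
    zero time: only the serial part remains."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical program that is 90% parallelizable.
    for n in (2, 4, 16, 256):
        print(f"{n:4d} processors: speedup {amdahl_speedup(0.9, n):.2f}")
    print(f"perfectly parallelized: speedup {amdahl_limit(0.9):.2f}")
```

Under these assumptions, a program that is 90% parallelizable can never run more than about 10 times faster, no matter how many processors are added.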
9) Communication models
   Asked question: how do two programs communicate?

   Message passing:
   - e.g. network sockets
   - e.g. special API like MPI

   Shared memory:
   - Showed how page tables could implement this in the operating
     system, allowing two processes to share memory
   - Briefly mentioned issues with cache consistency (coherence)
   - non-uniform memory access (NUMA) vs. uniform memory access (UMA)
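
A minimal sketch of the two communication models, using only the Python standard library. To keep it short, both endpoints live in one process; in practice they would be two separate processes. The function names are hypothetical. Message passing is shown with a connected socket pair (explicit send and receive). Shared memory is shown by attaching twice to one named block, which roughly mirrors two page-table mappings that point at the same physical memory.

```python
import socket
import struct
from multiprocessing import shared_memory


def message_passing_demo(value: int) -> int:
    """Send an integer over a connected socket pair and return what the
    receiving end reads. Data moves only through explicit send/recv."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(struct.pack("!q", value))  # 8-byte integer, network order
        data = b""
        while len(data) < 8:  # a stream socket may deliver the bytes in pieces
            chunk = receiver.recv(8 - len(data))
            if not chunk:
                raise ConnectionError("socket closed before full message arrived")
            data += chunk
        return struct.unpack("!q", data)[0]
    finally:
        sender.close()
        receiver.close()


def shared_memory_demo(value: int) -> int:
    """Write an integer through one mapping of a shared block and read it
    back through a second mapping. No send or receive call is involved."""
    writer = shared_memory.SharedMemory(create=True, size=8)
    reader = None
    try:
        # Attach by name, as a second process would.
        reader = shared_memory.SharedMemory(name=writer.name)
        struct.pack_into("q", writer.buf, 0, value)      # plain store
        return struct.unpack_from("q", reader.buf, 0)[0]  # plain load
    finally:
        if reader is not None:
            reader.close()
        writer.close()
        writer.unlink()  # release the block once both mappings are closed


if __name__ == "__main__":
    print("message passing:", message_passing_demo(42))
    print("shared memory:  ", shared_memory_demo(42))
```

The contrast to notice: with message passing the receiver sees data only when the sender transmits it, while with shared memory a store by one side is simply visible to a load by the other, which is why synchronization and cache consistency become the programmer's and the hardware's problem.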