Assignment 1: Benchmarking Graphics Hardware

Due: Tuesday, February 25.

The purpose of this assignment is to probe a real graphics subsystem to learn as much as you can about its performance, design, and flaws.  Your benchmarking software should be sufficiently flexible to automatically report the characteristics of different graphics cards (i.e., you should be able to install and run it on a new machine with a bare minimum of effort).  Of course, if you are using vendor-specific capabilities, this may not always be possible.

Many thanks to Ian Buck for writing the skeleton code for GfxBench.  You may, of course, feel free to discard the GfxBench skeleton and write your own.  This assignment is adapted from Pat Hanrahan and Kurt Akeley's course Real Time Graphics Architectures and Greg Humphreys's course Big Data in Computer Graphics.

This assignment is fairly open-ended, and somewhat loosely specified in places.  This is deliberate.  The techniques necessary to get graphics hardware to run fast are sometimes obscure, and certainly not well documented.  You will have to do some digging and experimentation to figure out how to make things run fast.  You may want to explore the web for hints about how to do this, particularly for Part II.  You may find that you're losing a factor of two somewhere just because some magic OpenGL state setting is wrong.

Feel free to share hints and URLs about making hardware run fast with other students, but keep your actual results and code to yourselves.  For example, telling a friend "I realized that if I disabled the depth test, it made a huge difference" is encouraged.  Telling a friend "The texture download bandwidth of my GeForce 4 seems to be one petabyte per fortnight" is doing his work for him.  When in doubt, ask me.

Part I: Explore the crossover point between geometry and rasterization.

Modern graphics hardware can be divided into two major components that operate in parallel: geometry processing and rasterization. The performance of the graphics system for any given scene is determined by the slower of these two components. For small triangles, the rasterization work per triangle is small, so the system is limited by the rate at which vertices can be processed. For large triangles, or ones with complex shading, the fragment operations may dominate the rendering pipeline.

Provided for this assignment is a sample benchmarking application called GfxBench (see below). It measures the fill rate and triangle rate using OpenGL. Modify the GfxBench application to examine the crossover point between geometry-limited rendering and rasterization-limited rendering. Graph the triangle rate as a function of triangle size for regular smooth-shaded triangles. What is the crossover point?
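To give a feel for the measurement, here is a minimal, hedged sketch of the kind of timing loop involved, written against GLUT.  The function names (drawTriangles, benchmark) and all constants (triangle counts, frame counts, the size range) are illustrative placeholders rather than values taken from GfxBench; scale the counts to your card so each measurement runs long enough to time reliably.

    /* Sketch: triangle rate as a function of triangle size.  All constants
       are placeholders; adjust so each timed run lasts tens of ms or more. */
    #include <GL/glut.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Draw nTris overlapping right triangles with legs of 'size' pixels. */
    static void drawTriangles(int nTris, float size)
    {
        int i;
        glBegin(GL_TRIANGLES);
        for (i = 0; i < nTris; i++) {
            glVertex2f(0.0f, 0.0f);
            glVertex2f(size, 0.0f);
            glVertex2f(0.0f, size);
        }
        glEnd();
    }

    static void benchmark(void)
    {
        const int nTris = 10000, nFrames = 50;
        float size;
        for (size = 1.0f; size <= 256.0f; size *= 2.0f) {
            int t0, t1, f;
            glFinish();                      /* drain the pipeline first */
            t0 = glutGet(GLUT_ELAPSED_TIME);
            for (f = 0; f < nFrames; f++)
                drawTriangles(nTris, size);
            glFinish();                      /* wait for all work to finish */
            t1 = glutGet(GLUT_ELAPSED_TIME);
            printf("size %6.1f px: %.2f Mtris/sec\n", size,
                   (double)nTris * nFrames / (t1 - t0) / 1000.0);
        }
    }

    static void display(void)
    {
        benchmark();
        exit(0);
    }

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE);
        glutInitWindowSize(512, 512);
        glutCreateWindow("crossover");
        glDisable(GL_DEPTH_TEST);            /* don't let z-reject hide fill cost */
        glMatrixMode(GL_PROJECTION);
        glOrtho(0, 512, 0, 512, -1, 1);      /* pixel-aligned coordinates */
        glMatrixMode(GL_MODELVIEW);
        glutDisplayFunc(display);
        glutMainLoop();
        return 0;
    }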

Next, modify the program to determine the triangle rate, fill rate, and crossover point for the following: textured triangles, lit triangles, and textured, lit triangles. Graph and explain your results in a one-page (not including graphs or code) write-up. Compare your results for the different triangle types and explain why any differences exist. Please include your source code in your write-up. Be sure to discuss any interesting details you might find.
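The state changes for the three variants amount to toggling texturing and lighting; a small sketch follows (setTriangleType is a hypothetical helper, not a GfxBench function, and texture object creation via glTexImage2D is assumed to happen elsewhere):

    /* Sketch: select which per-vertex/per-fragment work the pipeline does. */
    void setTriangleType(int textured, int lit)
    {
        if (textured) glEnable(GL_TEXTURE_2D); else glDisable(GL_TEXTURE_2D);
        if (lit) {
            glEnable(GL_LIGHTING);
            glEnable(GL_LIGHT0);     /* one default light */
        } else {
            glDisable(GL_LIGHTING);
        }
        /* The draw loop must also issue glTexCoord2f and glNormal3f per
           vertex, or the enabled state exercises less of the pipeline than
           intended. */
    }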

Part II: Detailed examination of specific parts of the graphics pipeline

Choose two of the following aspects of graphics hardware to explore. Present your results in a second one-page (not including graphs or code) write-up. Please include any source code used to generate your results.

Rasterization
What is the effect of triangle shape on rasterization performance? Is there a difference between long, thin vertical triangles and long, thin horizontal triangles? Modify the GfxBench application to test a variety of triangle shapes. Also graph fill rate as a function of triangle size.  Present your results and discuss what this tells you about how the rasterizer works.
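One way to generate such triangles is sketched below, as a drop-in replacement for the triangle loop in a Part I-style timing harness; drawThinTriangles and its parameters are hypothetical names, not part of GfxBench.

    /* Sketch: triangles of equal area but varying aspect ratio and
       orientation.  aspect > 1 gives long, thin triangles; 'vertical'
       selects whether the long axis runs along y or along x. */
    #include <GL/gl.h>
    #include <math.h>

    static void drawThinTriangles(int nTris, float area, float aspect,
                                  int vertical)
    {
        float shortLeg = (float)sqrt(2.0 * area / aspect);
        float longLeg  = aspect * shortLeg;    /* legs: area = s * l / 2 */
        int i;
        glBegin(GL_TRIANGLES);
        for (i = 0; i < nTris; i++) {
            if (vertical) {     /* long axis along y */
                glVertex2f(0, 0); glVertex2f(shortLeg, 0); glVertex2f(0, longLeg);
            } else {            /* long axis along x */
                glVertex2f(0, 0); glVertex2f(longLeg, 0); glVertex2f(0, shortLeg);
            }
        }
        glEnd();
    }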

Texturing
Explore the texture cache behavior. Modern graphics hardware maintains an on-chip texture cache as well as using on-board video memory for local texture storage. Modify the GfxBench program to determine the size of the on-chip texture cache and how on-board texture memory is used.  How does the performance change as a function of texture angle?  What (if anything) can you determine about the cache's replacement policy?  If you can figure out what it is, does it make sense?  Graph your results and explain the texture cache behavior and how you were able to measure it.
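One possible probe is sketched below, under stated assumptions: hold the number of pixels drawn per pass roughly fixed while growing the texture working set, and watch for the fill rate to fall off as the working set outgrows the cache.  The tiling scheme, texture sizes, and pass counts are illustrative, not tuned values.

    /* Sketch: fill rate vs. texture working-set size. */
    #include <GL/glut.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WIN 512

    /* Draw enough WINxWIN quads to touch every texel of a texSize^2
       texture (1:1 texel:pixel once texSize >= WIN). */
    static void drawFullTexture(int texSize)
    {
        int tiles = (texSize + WIN - 1) / WIN, x, y;
        float step = 1.0f / tiles;
        for (y = 0; y < tiles; y++)
            for (x = 0; x < tiles; x++) {
                float s0 = x * step, t0 = y * step;
                glBegin(GL_QUADS);
                glTexCoord2f(s0,        t0);        glVertex2f(0,   0);
                glTexCoord2f(s0 + step, t0);        glVertex2f(WIN, 0);
                glTexCoord2f(s0 + step, t0 + step); glVertex2f(WIN, WIN);
                glTexCoord2f(s0,        t0 + step); glVertex2f(0,   WIN);
                glEnd();
            }
    }

    static void probe(int texSize, int nPasses)
    {
        /* Texel contents don't matter for timing, so upload garbage. */
        unsigned char *texels = (unsigned char *)malloc(texSize * texSize * 4);
        int t0, t1, p, tiles = (texSize + WIN - 1) / WIN;
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, texSize, texSize, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, texels);
        free(texels);
        glFinish();
        t0 = glutGet(GLUT_ELAPSED_TIME);
        for (p = 0; p < nPasses; p++)
            drawFullTexture(texSize);
        glFinish();
        t1 = glutGet(GLUT_ELAPSED_TIME);
        printf("%4d^2 texture: %.1f Mpix/sec\n", texSize,
               (double)WIN * WIN * tiles * tiles * nPasses / (t1 - t0) / 1000.0);
    }

    static void display(void)
    {
        int size;
        glEnable(GL_TEXTURE_2D);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        for (size = 8; size <= 2048; size *= 2)
            probe(size, 100);
        exit(0);
    }

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE);
        glutInitWindowSize(WIN, WIN);
        glutCreateWindow("texcache");
        glMatrixMode(GL_PROJECTION);
        glOrtho(0, WIN, 0, WIN, -1, 1);
        glMatrixMode(GL_MODELVIEW);
        glutDisplayFunc(display);
        glutMainLoop();
        return 0;
    }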

Flexibility/Programmability
Explore the performance impact of the programmable features of the newer graphics cards, especially fragment processing.  Are certain features slower than others?  Why is this? Investigate the overall performance impact of multitexturing, dependent textures, texture locality (think bump-mapped environment mapping with varying levels of bumpiness), etc.  See just how slow you can make the card run!  Discuss the tradeoff between functionality and performance.
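As one concrete starting point, the hedged sketch below contrasts a plain texture fetch with a dependent fetch using ARB_fragment_program.  It assumes a card and driver exposing that extension, and that the entry points and tokens have already been resolved (via glext.h and wglGetProcAddress or equivalent); the program strings and the useFragmentProgram helper are illustrative, not part of GfxBench.

    /* Sketch: plain vs. dependent texture fetch under ARB_fragment_program.
       Assumes glBindProgramARB/glProgramStringARB have been resolved. */
    #include <GL/gl.h>
    #include <string.h>

    /* One fetch: color comes straight from texture 0. */
    static const char *plainFP =
        "!!ARBfp1.0\n"
        "TEX result.color, fragment.texcoord[0], texture[0], 2D;\n"
        "END\n";

    /* Dependent fetch: the first lookup's result becomes the coordinate for
       the second, so the second address is unknown until the first returns. */
    static const char *dependentFP =
        "!!ARBfp1.0\n"
        "TEMP t;\n"
        "TEX t, fragment.texcoord[0], texture[0], 2D;\n"
        "TEX result.color, t, texture[1], 2D;\n"
        "END\n";

    static void useFragmentProgram(GLuint id, const char *src)
    {
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, id);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB,
                           GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(src), src);
        glEnable(GL_FRAGMENT_PROGRAM_ARB);
        /* Run the same fill-rate loop with each program bound and compare. */
    }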

Vertex Engine
Modern graphics hardware includes a vertex cache for triangles that share vertices. This cache holds post-transform vertices so that a shared vertex need not be transformed again, improving geometry rates. Modify the GfxBench application to examine the vertex cache size and performance. Graph and explain your results. Discuss briefly the benefits and tradeoffs of a vertex cache.
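Sketched below is one possible probe, under the assumption that the cache holds post-transform results in roughly FIFO order: vary the reuse distance of a shared vertex and look for a knee in the triangle rate.  The names and array sizes are placeholders; wrap the call in a Part I-style timing loop.

    /* Sketch: vertex-cache probe.  Each triangle uses two fresh vertices
       plus one vertex first referenced 'reuseDist' triangles earlier.
       While reuseDist is small, the shared vertex should hit the
       post-transform cache (~2 transforms/triangle); past the cache depth
       it must be re-transformed (~3 transforms/triangle). */
    #include <GL/gl.h>

    #define NVERTS 20000                /* callers keep nTris <= NVERTS/2 */

    static GLfloat verts[NVERTS][3];    /* fill with tiny triangles so
                                           rasterization cost is negligible */
    static GLuint  indices[3 * (NVERTS / 2)];

    static void drawWithReuse(int nTris, int reuseDist)
    {
        int t;
        for (t = 0; t < nTris; t++) {
            int old = (t >= reuseDist) ? 2 * (t - reuseDist) : 0;
            indices[3 * t + 0] = 2 * t;
            indices[3 * t + 1] = 2 * t + 1;
            indices[3 * t + 2] = old;
        }
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, verts);
        glDrawElements(GL_TRIANGLES, 3 * nTris, GL_UNSIGNED_INT, indices);
        glDisableClientState(GL_VERTEX_ARRAY);
    }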

Graphics Interface
Modify the GfxBench application to explore the front end of the graphics pipeline. Modern graphics hardware allows for placing vertex data in AGP memory or on-board video memory for increased geometry performance. Use the NVIDIA "VertexArrayRange" extension (NV_vertex_array_range) to examine the performance of using AGP and video memory for vertex data compared to regular malloc'ed memory. Is one better than the other? Why is this? (See the NVIDIA OpenGL extension specifications for more details.)  How close can you get to the advertised performance of your graphics card?
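A sketch of the allocation side follows, for Windows, using NVIDIA's documented convention that a priority near 1.0 requests video memory and one near 0.5 requests AGP memory.  allocVertexMemory is a hypothetical helper; error handling (the allocation can fail and return NULL) is omitted, and this is an untested outline of the spec's usage pattern rather than verified code.

    /* Sketch: vertex storage in system, AGP, or video memory with
       NV_vertex_array_range. */
    #include <windows.h>
    #include <GL/gl.h>
    #include <stdlib.h>

    typedef void * (APIENTRY *PFNWGLALLOCATEMEMORYNVPROC)
                   (GLsizei, GLfloat, GLfloat, GLfloat);
    typedef void   (APIENTRY *PFNGLVERTEXARRAYRANGENVPROC)
                   (GLsizei, const void *);
    #define GL_VERTEX_ARRAY_RANGE_NV 0x851D

    static void *allocVertexMemory(GLsizei bytes, int videoMem, int agpMem)
    {
        PFNWGLALLOCATEMEMORYNVPROC  wglAllocateMemoryNV =
            (PFNWGLALLOCATEMEMORYNVPROC)wglGetProcAddress("wglAllocateMemoryNV");
        PFNGLVERTEXARRAYRANGENVPROC glVertexArrayRangeNV =
            (PFNGLVERTEXARRAYRANGENVPROC)wglGetProcAddress("glVertexArrayRangeNV");
        void *mem;

        if (!videoMem && !agpMem)
            return malloc(bytes);       /* ordinary system memory baseline */

        mem = videoMem
            ? wglAllocateMemoryNV(bytes, 0.0f, 0.0f, 1.0f)   /* video memory */
            : wglAllocateMemoryNV(bytes, 0.0f, 0.0f, 0.5f);  /* AGP memory   */
        glVertexArrayRangeNV(bytes, mem);                    /* declare range */
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
        return mem;
    }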

Other
If there is some other aspect of graphics hardware performance that really intrigues you and you think you can probe it automatically, send me e-mail.

Logistics

This assignment is due Tuesday, February 25, at the start of class.  If you need an extension, please send email in advance.

You are allowed to work in groups of two.  At least one person should be familiar with OpenGL programming.  If you have not programmed in OpenGL, find someone else in the class who has and pair up with them.  If fewer than 50% of the class has had OpenGL experience, I may allow groups of three people, but I will expect a more thorough analysis from three-person groups.

Submitting Your Results:  Write up a web page which includes a complete report of your results, including any source code you wrote and graphs of your data.  Email the URL to billmark@cs.utexas.edu before class on Tuesday, February 25.

Sample Code:  The source code and compiled executable for the GfxBench application can be found here:

GfxBench.zip (Windows)

GfxBench.tar.gz (Unix)

This has been tested under Windows using Microsoft Visual Studio 6.0, and under Linux using GNU Make.  I make no claims (nor does Ian) that GfxBench is itself optimal (i.e., it may be possible to achieve a slightly higher triangle rate than GfxBench does, even just using immediate mode calls).  If you wish to improve GfxBench or start from scratch, feel free.  Document what you did differently in your write-up.

Grading: A high grade will be awarded if you demonstrate a good understanding of how graphics hardware could work. Coming up with the correct value for a particular performance metric is less important than how you analyze your results.  You are not expected to know all of the details regarding the system you benchmark.  Rather, your grade will be determined by the tests you design and your analysis of the results.  Groups will be assigned the same grade.