Due: Tuesday, February 25th.
The purpose of this assignment is to probe a real graphics subsystem to learn as much as you can about its performance, design, and flaws. Your benchmarking software should be sufficiently flexible to automatically report the characteristics of different graphics cards (i.e., you should be able to install and run it on a new machine with a bare minimum of effort). Of course, if you are using vendor-specific capabilities, this may not always be possible.
Many thanks to Ian Buck for writing the skeleton code for GfxBench. You may, of course, feel free to discard the GfxBench skeleton and write your own. This assignment is adapted from Pat Hanrahan and Kurt Akeley's course Real Time Graphics Architectures and Greg Humphreys's course Big Data in Computer Graphics.
This assignment is fairly open-ended, and somewhat loosely specified in places. This is deliberate. The techniques necessary to get graphics hardware to run fast are sometimes obscure, and certainly not well documented. You will have to do some digging and experimentation to figure out how to make things run fast. You may want to explore the web for hints about how to do this, particularly for part III. You may find that you're losing a factor of two somewhere just because some magic OpenGL state setting is wrong.
Feel free to share hints and URLs about making hardware run fast with other students, but keep your actual results and code to yourselves. For example, telling a friend "I realized that if I disabled the depth test, it made a huge difference" is encouraged. Telling a friend "The texture download bandwidth of my GeForce 4 seems to be one petabyte per fortnight" is doing his work for him. When in doubt, ask me.
Modern graphics hardware can be broadly divided into two components that operate in parallel: geometry processing and rasterization. The performance of the graphics system for any given scene is determined by the slower of these two components. For small triangles, the rasterization work per triangle is small, so the system is limited by the rate at which vertices can be processed. For large triangles, or ones with complex shading, the fragment operations may dominate the rendering pipeline.
Provided for this assignment is a sample benchmarking application called GfxBench (see below). It measures the fill rate and triangle rate using OpenGL. Modify the GfxBench application to examine the crossover point between geometry-limited rendering and rasterization-limited rendering. Graph the triangle rate as a function of triangle size for regular, smooth-shaded triangles. What is the crossover point?
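One way to structure such a sweep is sketched below. This is an untested sketch, not part of GfxBench: it assumes a GLUT context already exists, that the projection maps glVertex2f coordinates to pixels (e.g., via gluOrtho2D), and that depth test and lighting are disabled; adapt it to whatever timing support GfxBench already provides.

    #include <GL/glut.h>   /* for glutGet(GLUT_ELAPSED_TIME) */
    #include <stdio.h>

    /* Time batches of right triangles of increasing area and report the
     * resulting triangle and pixel rates. */
    void sweepTriangleSize(void)
    {
        const int trisPerPass = 100000;
        for (float edge = 1.0f; edge <= 256.0f; edge *= 2.0f) {
            glFinish();                          /* drain the pipe before timing */
            int ms0 = glutGet(GLUT_ELAPSED_TIME);
            glBegin(GL_TRIANGLES);
            for (int i = 0; i < trisPerPass; i++) {
                glVertex2f(0.0f, 0.0f);          /* right triangle, area edge^2/2 */
                glVertex2f(edge, 0.0f);
                glVertex2f(0.0f, edge);
            }
            glEnd();
            glFinish();                          /* wait for the GPU to finish */
            double sec = (glutGet(GLUT_ELAPSED_TIME) - ms0) / 1000.0;
            printf("area %9.1f px: %7.2f Mtri/s, %8.1f Mpix/s\n",
                   edge * edge / 2.0,
                   trisPerPass / sec / 1e6,
                   trisPerPass * edge * edge / 2.0 / sec / 1e6);
        }
    }

If the elapsed time for a pass is too small to measure reliably, increase trisPerPass until each measurement takes at least a substantial fraction of a second.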
Next, modify the program to determine the triangle rate, fill rate, and crossover point for the following: textured triangles, lit triangles, and textured, lit triangles. Graph and explain your results in a one-page write-up (not including graphs or code). Compare your results for the different triangle types and explain why any differences exist. Please include your source code in your write-up. Be sure to discuss any interesting details you might find.
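The state changes involved are small. A sketch of the kind of toggle you might add to GfxBench (the function name is hypothetical, and it assumes a 2D texture object has already been created and bound) is:

    #include <GL/gl.h>

    /* Set up fixed-function state for one of the four triangle types.
     * The benchmark loop must supply glNormal3f / glTexCoord2f per vertex
     * whenever the corresponding state is enabled, or the comparison is
     * meaningless. */
    void setTriangleType(bool textured, bool lit)
    {
        if (lit) {
            glEnable(GL_LIGHTING);
            glEnable(GL_LIGHT0);        /* one default light is enough here */
        } else {
            glDisable(GL_LIGHTING);
        }
        if (textured)
            glEnable(GL_TEXTURE_2D);    /* assumes a bound 2D texture */
        else
            glDisable(GL_TEXTURE_2D);
    }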
Rasterization
What is the effect of triangle shape on rasterization performance? Is there a difference between long, thin vertical triangles and long, thin horizontal triangles? Modify the GfxBench application to test a variety of triangle shapes. Also graph fill rate as a function of triangle size. Present your results and discuss what this tells you about how the rasterizer works.
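For example, you might generate equal-area triangles whose aspect ratio you control, along the lines of this hypothetical helper (timed with glFinish(), as in the other tests):

    #include <GL/gl.h>
    #include <math.h>

    /* Draw `count` equal-area right triangles.  `aspect` is the ratio of the
     * long leg to the short leg; `vertical` chooses whether the long leg lies
     * along the y axis (tall and thin) or the x axis (wide and flat). */
    void drawThinTriangles(int count, float area, float aspect, bool vertical)
    {
        float longLeg  = sqrtf(2.0f * area * aspect);
        float shortLeg = 2.0f * area / longLeg;   /* keeps area constant */
        glBegin(GL_TRIANGLES);
        for (int i = 0; i < count; i++) {
            glVertex2f(0.0f, 0.0f);
            if (vertical) {
                glVertex2f(shortLeg, 0.0f);
                glVertex2f(0.0f, longLeg);
            } else {
                glVertex2f(longLeg, 0.0f);
                glVertex2f(0.0f, shortLeg);
            }
        }
        glEnd();
    }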
Texturing
Explore the texture cache behavior. Modern graphics hardware maintains an on-chip texture cache as well as using on-board video memory for local texture storage. Modify the GfxBench program to determine the size of the on-chip texture cache and on-board texture memory usage. How does the performance change as a function of texture angle? What (if anything) can you determine about the cache's replacement policy? If you can figure out what it is, does it make sense? Graph your results and explain the texture cache behavior and how you were able to measure it.
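One possible starting point (a sketch only; the parameter choices and interpretation are up to you) is to repeatedly draw quads that sample textures of increasing size and look for a knee in the fill rate as the working set outgrows the cache:

    #include <GL/gl.h>
    #include <stdlib.h>

    /* Create a texSize x texSize RGBA texture (contents are irrelevant) and
     * draw one quad that samples every texel once.  GL_NEAREST filtering and a
     * 1:1 texel-to-pixel mapping keep the access pattern predictable; time many
     * repetitions of the quad with glFinish(), as in the fill-rate test. */
    void probeTextureCache(int texSize)          /* e.g. 32, 64, ..., 2048 */
    {
        GLuint tex;
        GLubyte *texels = (GLubyte *)calloc(texSize * texSize, 4);
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, texSize, texSize, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, texels);
        free(texels);

        glEnable(GL_TEXTURE_2D);
        glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 0.0f);
        glTexCoord2f(1.0f, 0.0f); glVertex2f((float)texSize, 0.0f);
        glTexCoord2f(1.0f, 1.0f); glVertex2f((float)texSize, (float)texSize);
        glTexCoord2f(0.0f, 1.0f); glVertex2f(0.0f, (float)texSize);
        glEnd();
    }

Varying the texture rotation angle and the filtering mode from this baseline is a natural next step for the texture-angle and replacement-policy questions.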
Flexibility/Programmability
Explore the performance impact of the programmable features of the newer graphics cards, especially fragment processing. Are certain features slower than others? Why is this? Investigate the overall performance impact of multitexturing, dependent textures, texture locality (think bump-mapped environment mapping with varying levels of bumpiness), etc. See just how slow you can make the card run! Discuss the tradeoff between functionality and performance.
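As one concrete starting point for the multitexturing measurement, the sketch below enables a second texture unit through ARB_multitexture. It assumes the extension entry points (glActiveTextureARB, glMultiTexCoord2fARB) have already been obtained via wglGetProcAddress or glXGetProcAddress, and that tex0 and tex1 are existing texture objects:

    #include <GL/gl.h>
    #include <GL/glext.h>    /* ARB_multitexture tokens */

    /* Bind a different texture to each of the first two texture units. */
    void enableTwoTextures(GLuint tex0, GLuint tex1)
    {
        glActiveTextureARB(GL_TEXTURE0_ARB);
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, tex0);

        glActiveTextureARB(GL_TEXTURE1_ARB);
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, tex1);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

        /* In the draw loop, issue glMultiTexCoord2fARB(GL_TEXTURE0_ARB, s, t)
         * and glMultiTexCoord2fARB(GL_TEXTURE1_ARB, s, t) instead of
         * glTexCoord2f, then compare against the single-textured rate. */
    }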
Vertex Engine
Modern graphics hardware includes a vertex cache for triangles that share vertices. This cache maintains transformed vertices and so can improve geometry rates. Modify the GfxBench application to examine the vertex cache size and performance. Graph and explain your results. Discuss briefly the benefits and tradeoffs of a vertex cache.
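One way to probe the cache (a sketch only; the exact index pattern is up to you) is to build index buffers in which vertex reuse is separated by a controllable gap and watch for the triangle rate to drop once reused vertices no longer fit in the post-transform cache:

    #include <GL/gl.h>
    #include <vector>

    /* Build an index buffer in which every triangle reuses one vertex first
     * referenced `gap` triangles earlier.  Assumes a vertex array with at
     * least 3*numTris entries is bound via glVertexPointer and
     * glEnableClientState(GL_VERTEX_ARRAY).  Sweep `gap` and time this call. */
    void drawWithReuseGap(int numTris, int gap)
    {
        std::vector<GLuint> idx(numTris * 3);
        for (int i = 0; i < numTris; i++) {
            idx[3*i + 0] = 3*i + 0;
            idx[3*i + 1] = 3*i + 1;
            /* Third vertex refers back `gap` triangles: a cache hit while the
             * gap is small, a retransform once it exceeds the cache. */
            idx[3*i + 2] = (i >= gap) ? (GLuint)(3*(i - gap)) : (GLuint)(3*i + 2);
        }
        glDrawElements(GL_TRIANGLES, numTris * 3, GL_UNSIGNED_INT, &idx[0]);
    }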
Graphics Interface
Modify the GfxBench application to explore the front end of the graphics pipeline. Modern graphics hardware allows for placing vertex data in AGP memory or on-board video memory for increased geometry performance. Use the NVIDIA "VertexArrayRange" extension to examine the performance of using AGP and video memory for vertex data compared to regular malloc'ed memory. Is one better than the other? Why is this? (See the NVIDIA OpenGL Extension spec for more details.) How close can you get to the advertised performance of your graphics card?
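A Windows-flavored sketch of setting up the extension is below. The entry points must be fetched with wglGetProcAddress (on Linux, glXGetProcAddressARB and glXAllocateMemoryNV), and the exact hint values that place the allocation in AGP versus video memory should be double-checked against the NV_vertex_array_range spec:

    #include <GL/gl.h>
    /* wglAllocateMemoryNV, glVertexArrayRangeNV, and GL_VERTEX_ARRAY_RANGE_NV
     * come from the NV_vertex_array_range extension and must be declared and
     * loaded per the spec before this function is called. */

    void setupVertexArrayRange(int bytes, bool videoMemory)
    {
        /* Per NVIDIA's documentation, a priority near 1.0 requests video
         * memory and a lower priority (around 0.5) requests AGP memory;
         * verify this against the spec for your driver. */
        float priority = videoMemory ? 1.0f : 0.5f;
        void *mem = wglAllocateMemoryNV(bytes, 0.0f, 0.0f, priority);
        if (!mem)
            return;                        /* fall back to malloc'ed arrays */

        glVertexArrayRangeNV(bytes, mem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

        /* Write vertex data directly into `mem`, point glVertexPointer at it,
         * and time glDrawElements exactly as in the malloc'ed-memory case. */
    }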
Other
If there is some other aspect of graphics hardware performance that really
intrigues you and you think you can probe it automatically, send me e-mail.
This assignment is due Tuesday, February 25, at the start of class. If you need an extension, please send email in advance.
You are allowed to work in groups of two. At least one person should be familiar with OpenGL programming. If you have not programmed in OpenGL, find someone else in the class who has, and pair up with them. If fewer than 50% of the class has had OpenGL experience, I may allow groups of three people, but I will expect a more thorough analysis from three-person groups.
Submitting Your Results: Write up a web page that includes a complete report of your results, including any source code you wrote and graphs of your data. Email the URL to billmark@cs.utexas.edu before class on Tuesday, February 25th.
Sample Code: The source code and compiled executable for the GfxBench application can be found here:
This has been tested under Windows using Microsoft Visual Studio 6.0, and under Linux using GNU make. I make no claims (nor does Ian) that GfxBench is itself optimal (i.e., it may be possible to achieve a slightly higher triangle rate than GfxBench does, even just using immediate-mode calls). If you wish to improve GfxBench or start from scratch, feel free. Document what you did differently in your write-up.
Grading: A high grade will be awarded if you demonstrate a good understanding of how graphics hardware could work. Coming up with the correct value for a particular performance metric is less important than how you analyze your results. You are not expected to know all of the details regarding the system you benchmark. Rather, your grade will be determined by the tests you design and your analysis of the results. Groups will be assigned the same grade.