Tag Archives: Optimizations

Basic Culling Techniques

The best optimization when rendering is to not render anything unnecessary. That is what culling is about: finding out what can be skipped when rendering because it cannot be visible anyway. Below are the basic culling techniques that most renderers implement. The image is from a course slide.

Culling Techniques

Back Face Culling

Faces that face away from the camera cannot be visible on the screen, so they don’t need to be drawn. This is used so often that it’s implemented in hardware. It roughly halves the number of faces drawn. Just remember to turn it on!
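The test behind back face culling can be sketched on the CPU in a few lines. This is only an illustration of the idea, not how the hardware does it (the GPU decides from the winding order of the projected triangle, and in OpenGL you turn it on with glEnable(GL_CULL_FACE)). The function names and the assumption that front faces are wound counter-clockwise are mine.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_back_facing(v0, v1, v2, camera_pos):
    """True if the triangle faces away from the camera.

    Assumption: front faces are wound counter-clockwise, so the normal
    computed below points out of the front side.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    to_camera = sub(camera_pos, v0)
    # A face seen exactly edge-on (dot == 0) covers no pixels, so it is culled too.
    return dot(normal, to_camera) <= 0.0

# A triangle in the z = 0 plane whose front side looks along +z.
triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
front_view = is_back_facing(*triangle, camera_pos=(0, 0, 5))
rear_view = is_back_facing(*triangle, camera_pos=(0, 0, -5))
```

With the camera in front of the triangle the face is kept, and with the camera behind it the face is culled.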

View Frustum Culling

Faces that are outside the view frustum cannot be visible on the screen (we ignore reflections for now), so they can be culled. The check tests whether the geometry’s bounding volume lies outside the view frustum; it is not done on every face, as that would cost too much. Sometimes view frustum culling can cost more than it gains (for example when doing instancing). One way to speed up view frustum culling is to use a suitable spatial structure for the scene (an octree, a BSP tree or similar).
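As a rough sketch of such a bounding volume check, here is a bounding sphere tested against the six frustum planes. The plane representation (unit normal pointing into the frustum plus an offset d) and all names are assumptions for the example, and the box-shaped "frustum" is just a stand-in to keep the numbers easy to follow.

```python
def sphere_outside_frustum(center, radius, planes):
    """True if the sphere lies completely outside at least one plane.

    Each plane is (nx, ny, nz, d) with a unit normal pointing into the
    frustum, so a point p is on the inside when n . p + d >= 0.
    """
    for nx, ny, nz, d in planes:
        signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_distance < -radius:
            return True   # entirely behind this plane: cull
    return False          # inside or intersecting: draw

# Stand-in volume: x and y in [-10, 10], z in [-100, -1] (camera looks down -z).
planes = [
    (1, 0, 0, 10), (-1, 0, 0, 10),   # left, right
    (0, 1, 0, 10), (0, -1, 0, 10),   # bottom, top
    (0, 0, -1, -1), (0, 0, 1, 100),  # near, far
]

inside = sphere_outside_frustum((0, 0, -50), 1.0, planes)
touching = sphere_outside_frustum((10.5, 0, -50), 1.0, planes)
outside = sphere_outside_frustum((20, 0, -50), 1.0, planes)
```

The test is conservative: a sphere close to a frustum corner can pass it even though it is actually outside, which only costs some extra drawing and never removes something visible.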

All kinds of bounding volume tests that you can imagine can be found on this page:
http://www.realtimerendering.com/intersections.html

Info about frustum culling:
http://www.flipcode.com/archives/Frustum_Culling.shtml

Portal Culling

A technique that divides the scene into cells (rooms) with portals between them. When rendering, the camera will be in one of the rooms and that room is rendered normally. For each portal that is visible in the room, a view frustum is set up that matches the size of the portal, and the room behind it is rendered with that frustum. This works recursively. The result is that a lot of geometry can be culled by view frustum culling when rendering the other rooms. A very useful technique for indoor scenes.
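The recursion can be sketched like this. To keep it short, the sketch works with screen-space rectangles instead of real frustums: every portal is given as the rectangle it covers on screen, and the view shrinks to the overlap each time a portal is passed. The scene layout, the names and the rectangle simplification are all assumptions for the example.

```python
def intersect(a, b):
    """Overlap of two rectangles (xmin, ymin, xmax, ymax), or None if empty."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)

def collect_visible_cells(cells, cell_name, view_rect, path=()):
    """Return (cell, view rectangle) pairs to render, starting in cell_name."""
    visible = [(cell_name, view_rect)]
    for target, portal_rect in cells[cell_name]["portals"]:
        if target == cell_name or target in path:
            continue  # do not walk back through cells we came from
        clipped = intersect(view_rect, portal_rect)
        if clipped is not None:
            visible += collect_visible_cells(
                cells, target, clipped, path + (cell_name,))
    return visible

# Hypothetical scene: portal rectangles are in normalized screen coordinates.
cells = {
    "hall": {"portals": [("kitchen", (0.2, -0.5, 0.6, 0.5)),
                         ("cellar", (2.0, 0.0, 3.0, 1.0))]},
    "kitchen": {"portals": [("hall", (-1.0, -1.0, 1.0, 1.0)),
                            ("pantry", (0.5, 0.0, 0.9, 0.3))]},
    "pantry": {"portals": []},
    "cellar": {"portals": []},
}

visible = collect_visible_cells(cells, "hall", (-1.0, -1.0, 1.0, 1.0))
visible_names = [name for name, _rect in visible]
```

The cellar is never visited because its portal lies off screen, and the pantry is only rendered through the small area where both portals overlap.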

Detail Culling

When a piece of geometry is so far away that it is too small on screen to be noticed, there is no need to draw it at all, so it can safely be culled. A more advanced form of detail culling, which reduces the amount of detail with distance, is LOD (level of detail).
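One common way to decide this is to estimate how many pixels the object's bounding sphere covers on screen. The sketch below does that and picks either "cull" or a LOD level. The pixel thresholds, the default field of view and the function names are made-up values for illustration, not recommendations.

```python
import math

def projected_radius_px(radius, distance, fov_y_deg, screen_height_px):
    """Approximate on-screen radius in pixels of a bounding sphere.

    Assumes a perspective projection and distance > 0.
    """
    half_fov = math.radians(fov_y_deg) / 2.0
    return radius / (distance * math.tan(half_fov)) * (screen_height_px / 2.0)

def select_lod(radius, distance, fov_y_deg=60.0, screen_height_px=1080,
               cull_below_px=1.0, lod_thresholds_px=(200.0, 50.0)):
    """Return None to cull the object, otherwise a LOD index (0 = most detail)."""
    size = projected_radius_px(radius, distance, fov_y_deg, screen_height_px)
    if size < cull_below_px:
        return None                      # detail culling: too small to matter
    for level, threshold in enumerate(lod_thresholds_px):
        if size >= threshold:
            return level
    return len(lod_thresholds_px)        # coarsest model

lods = [select_lod(1.0, d) for d in (1.0, 5.0, 100.0, 10000.0)]
```

An object with radius 1 goes from the most detailed model when it is close, through the coarser ones, to not being drawn at all when it is very far away.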

Occlusion Culling

The hardest culling technique to implement. Geometry that is occluded by other geometry does not need to be rendered. One solution is to use the Z-buffer and draw the geometry sorted front to back. But this does not always work, and every pixel still needs to be checked against the Z-buffer, so it will be costly for big scenes. Better occlusion culling techniques cull the geometry before it’s even sent to the GPU.
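To give a feeling for how geometry can be rejected before it reaches the GPU, here is a toy version that keeps a very coarse depth buffer on the CPU. It is a strong simplification that I picked for the example: every object is treated as a flat, screen-aligned rectangle of tiles at a single depth, which real occlusion culling schemes do not assume.

```python
def cull_occluded(objects, width, height):
    """Return the names of objects that still need to be drawn.

    objects: list of (name, (x0, y0, x1, y1), depth), where the rectangle is
    given in tile coordinates (x1 and y1 exclusive) and a smaller depth is nearer.
    """
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    drawn = []
    # Front to back, so near objects get the chance to hide far ones.
    for name, (x0, y0, x1, y1), depth in sorted(objects, key=lambda o: o[2]):
        tiles = [(x, y) for y in range(y0, y1) for x in range(x0, x1)]
        if all(depth_buffer[y][x] <= depth for x, y in tiles):
            continue  # every covered tile already holds something nearer
        drawn.append(name)
        for x, y in tiles:
            depth_buffer[y][x] = min(depth_buffer[y][x], depth)
    return drawn

objects = [
    ("crate", (1, 1, 2, 2), 5.0),   # completely behind the wall
    ("wall", (0, 0, 4, 4), 1.0),
    ("tower", (3, 0, 6, 4), 3.0),   # sticks out to the right of the wall
]
drawn = cull_occluded(objects, width=8, height=4)
```

The crate is skipped because the wall covers it completely, while the tower is kept because part of it is still visible.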

A good link to a couple of occlusion culling techniques
http://www.gamasutra.com/features/19991109/moller_haines_01.htm

Instancing

Instancing is a new way to take some work off the CPU when rendering many copies of the same geometry. It does this by reducing the overhead of drawing multiple copies of the same vertex buffer.
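The difference can be modelled on the CPU as follows. This is not GPU code and not any real API, just a toy model with made-up function names: in the first version every copy costs one draw call, in the second all copies are produced by a single call in which each vertex looks up its per-instance data through an instance ID.

```python
def translate(vertex, offset):
    return tuple(v + o for v, o in zip(vertex, offset))

def draw_one_by_one(mesh, offsets):
    """One draw call per copy: the per-call overhead is paid for every copy."""
    draw_calls = 0
    output = []
    for offset in offsets:
        draw_calls += 1
        output.extend(translate(v, offset) for v in mesh)
    return output, draw_calls

def draw_instanced(mesh, offsets):
    """A single draw call: the same vertices are reused for every instance ID."""
    draw_calls = 1
    output = [translate(v, offsets[instance_id])
              for instance_id in range(len(offsets))
              for v in mesh]
    return output, draw_calls

mesh = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]           # one shared triangle
offsets = [(0, 0, 0), (5, 0, 0), (0, 5, 0)]        # per-instance data
separate, separate_calls = draw_one_by_one(mesh, offsets)
instanced, instanced_calls = draw_instanced(mesh, offsets)
```

Both versions produce the same triangles, but the instanced one needs a single call no matter how many copies there are.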

In OpenGL, instancing is only faster when the instanced mesh consists of very few triangles.

Nvidia’s instancing demo, here with 136499 meshes rendered at once and 24 triangles per mesh. It runs at a stable 20 fps on a GeForce 8800 GTS. (The left image shows all objects viewed from far away, the right one is zoomed in.)

Instancing

An image from Microsoft’s DirectX10 instancing demo

Instancing

Some tests that show when to use instancing and when not to
http://www.ozone3d.net/blogs/lab/?p=87

HLSL instancing (therefore DirectX)
http://developer.download.nvidia.com/SDK/9.5/Samples/DEMOS/Direct3D9/src/HLSL_Instancing/docs/HLSL_Instancing.pdf

Nvidia’s DirectX10 implementation of instancing
http://developer.download.nvidia.com/SDK/10.5/direct3d/Source/InstancingTests/doc/InstancingTests.pdf

Microsoft’s DirectX9 instancing sample
http://msdn.microsoft.com/en-us/library/bb174602(VS.85).aspx

Microsoft’s DirectX10 instancing sample
http://msdn.microsoft.com/en-us/library/bb205317(VS.85).aspx

An OpenGL implementation of pseudo-instancing (recommended for old hardware).
http://http.download.nvidia.com/developer/SDK/Individual_Samples/DEMOS/OpenGL/src/glsl_pseudo_instancing/docs/glsl_pseudo_instancing.pdf

OpenGL instancing:
http://www.opengl.org/registry/specs/EXT/draw_instanced.txt

Octree

Octrees are one of many ways to divide space into smaller volumes structured in a tree. This is useful for culling objects while rendering or for finding collisions between objects. First enclose the entire space of interest in one big cuboid. Then the dividing algorithm for the tree goes as follows:

- First select the maximum number of objects in each cuboid. This determines how deep the tree will go.

- For each cuboid that contains more objects than the maximum allowed, split it into eight (oct = eight) new cuboids of equal size. Repeat this step until all cuboids fulfill the requirement.

The procedure looks like the following:

Octree creation process
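The two steps above can be sketched roughly like this. To keep it short the objects are plain points and the root volume is a cube given by its center and half size. The class name, the point simplification and the max_depth guard (which stops the splitting when many points share the same position) are my own additions for the example.

```python
class OctreeNode:
    def __init__(self, center, half_size, points, max_objects,
                 depth=0, max_depth=8):
        self.center = center
        self.half_size = half_size
        self.points = points
        self.children = []
        # Split while the node holds more objects than allowed.
        if len(points) > max_objects and depth < max_depth:
            self._split(max_objects, depth, max_depth)

    def _split(self, max_objects, depth, max_depth):
        cx, cy, cz = self.center
        buckets = [[] for _ in range(8)]
        for p in self.points:
            # One bit per axis selects the octant the point falls into.
            index = (int(p[0] >= cx)
                     | (int(p[1] >= cy) << 1)
                     | (int(p[2] >= cz) << 2))
            buckets[index].append(p)
        quarter = self.half_size / 2.0
        for index, bucket in enumerate(buckets):
            child_center = (cx + (quarter if index & 1 else -quarter),
                            cy + (quarter if index & 2 else -quarter),
                            cz + (quarter if index & 4 else -quarter))
            self.children.append(OctreeNode(child_center, quarter, bucket,
                                            max_objects, depth + 1, max_depth))
        self.points = []  # the objects now live in the children

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

points = [(1, 1, 1), (-1, -1, -1), (1, -1, 1), (3, 3, 3)]
root = OctreeNode(center=(0, 0, 0), half_size=4.0, points=points, max_objects=1)
leaves = root.leaves()
```

With at most one point per cuboid, the root is split once, and the octant that holds both (1, 1, 1) and (3, 3, 3) is split a second time.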

More information about octrees.
http://www.gamasutra.com/features/19970801/octree.htm

Source of the image:
http://en.wikipedia.org/wiki/Octree

A little description of octrees.
http://www.flipcode.com/archives/Introduction_To_Octrees.shtml

Alpha Testing

When drawing, for example, lots of vegetation (with textures containing an alpha channel), you need to minimize overdraw and the cost of each draw call. If you use blending to achieve the desired transparent effect on the leaves, you will need to sort all the triangles back to front before drawing (and sometimes this requires splitting triangles) to get the correct look. When blending you might also render the same pixel multiple times, because you want the results to blend together. All this drawing and sorting takes a lot of time, and a better way is to use alpha testing. In alpha testing (or alpha killing) you select an alpha threshold, and all pixels with an alpha above this threshold (if you use GREATER as the test function) will be drawn. Because no blending occurs, you don’t need to sort the triangles anymore (but preferably you still sort them front to back as an optimization). The drawback of alpha killing is that you will get sharp edges around the pixels drawn.
Alpha killing does not work with shadow volumes.
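Why the sorting is no longer needed can be shown with a single pixel. The sketch below shades one pixel from a list of fragments, once with an alpha test plus a depth test and once with ordinary blending. The fragment layout (r, g, b, alpha, depth) and the function names are assumptions for the example.

```python
def shade_pixel_alpha_test(fragments, threshold, background):
    """Alpha test with GREATER, followed by a normal depth test."""
    color, nearest = background, float("inf")
    for r, g, b, alpha, depth in fragments:
        if not alpha > threshold:
            continue                      # fragment is killed, nothing is written
        if depth < nearest:
            color, nearest = (r, g, b), depth
    return color

def shade_pixel_blended(fragments, background):
    """Standard 'source over' blending in the order the fragments arrive."""
    color = background
    for r, g, b, alpha, _depth in fragments:
        color = tuple(alpha * src + (1.0 - alpha) * dst
                      for src, dst in zip((r, g, b), color))
    return color

fragments = [
    (0, 1, 0, 0.9, 2.0),   # green leaf, far
    (1, 0, 0, 0.8, 1.0),   # red leaf, near
    (0, 0, 1, 0.1, 0.5),   # almost transparent texel, nearest
]
black = (0, 0, 0)

tested = shade_pixel_alpha_test(fragments, 0.5, black)
tested_reversed = shade_pixel_alpha_test(fragments[::-1], 0.5, black)
blended = shade_pixel_blended(fragments, black)
blended_reversed = shade_pixel_blended(fragments[::-1], black)
```

With the alpha test the pixel ends up red whatever the drawing order is, while the blended result changes when the order is reversed, which is exactly why blending needs the back-to-front sort.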

Alpha testing in OpenGL
http://opengl.org/documentation/specs/version1.1/glspec1.1/node96.html

Alpha testing in DirectX9 (in DirectX10 you must do your own implementation in a shader)
http://msdn.microsoft.com/en-us/library/bb172254.aspx