Category Archives: Optimizations

Description

Texture Atlas

Texture atlas [1][2] is a technique to group smaller textures into a larger texture. This decreases the number of state switches [3] a renderer needs to do and therefore often increases performance. Texture atlases have been used for a long time in the video game industry for sprite animations. When using texture atlases, the uv-coordinates of the models have to be changed so the original 0..1 map to the textures tile in the atlas. Grouping of textures can be done manually by texture artists or with tools. The texture coordinate system can be changed to map the new texture in a tool, or in the shader at run-time.

An example texture atlas
Image from article [2]

There are some limitations with using texture atlases compared to normal textures. First of all, all texture coordinates must initially be within 0..1 range. So for example, no “free” tiling can be used. The other problem is bleeding between tiles in the atlas when doing filtering, for example when using mipmaps.

Some additional information from Ivan-Assen Ivanov, author of article [2].

” – separate textures hurts not only batching (in facts, it hurts batching less than years ago), but also memory – as there is a certain per-texture overhead. This is especially painful on consoles – on PCs, the overhead is still there, I guess, but the driver hides it from you. The exact numbers are under NDA, of course, but on an old, unreleased project, we saved about 9 MB by atlas-sing a category of textures we didn’t atlas before.

- vertex interpolators are expensive! make sure you measure the remapping from 0..1 to the actual UVs in the atlas both in the vertex and in the pixel shader. Sounds counterintuitive, but on modern GPUs and with dense geometry, pixel shader is actually faster.” 

[1] “Improve Batching Using Texture Atlases” http://http.download.nvidia.com/developer/NVTextureSuite/Atlas_Tools/Texture_Atlas_Whitepaper.pdf

[2] “Practical Texture Atlases” (borrowed image from this page)
http://www.gamasutra.com/features/20060126/ivanov_01.shtml

[3] “Batch, Batch, Batch: What Does It Really Mean?”
http://developer.nvidia.com/docs/io/8230/batchbatchbatch.pdf

Grid Traversal

Probably the best and most famous grid traversal algorithm uses a simple loop that for each iteration takes one more step in the grid. It works in both 2D and 3D.

Java code:

   public boolean isFreePath(float startX, float startZ, float endX, float endZ)
   {
      // calculate the direction of the ray (linear algebra)
      dirX = (endX - startX);
      dirY = (endZ - startZ);
      float length = (float) Math.sqrt(dirX * dirX + dirY * dirY);
      dirX /= length; // normalize the direction vector
      dirY /= length;
 
      tDeltaX = MAP_RESOLUTION / Math.abs(dirX); // how far we must move in the ray direction before we encounter a new voxel in x-direction
      tDeltaY = MAP_RESOLUTION / Math.abs(dirY); // same but y-direction
 
      // start voxel coordinates
      x = (int) (worldCoordToMapCoord(startX));  // use your transformer function here
      y = (int) (worldCoordToMapCoord(startZ));
 
      // end voxel coordinates
      endX1 = (int) (worldCoordToMapCoord(endX));
      endY1 = (int) (worldCoordToMapCoord(endZ));
 
      // decide which direction to start walking in
      stepX = (int) Math.signum(dirX);
      stepY = (int) Math.signum(dirY);
 
      float tMaxX, tMaxY;
      // calculate distance to first intersection in the voxel we start from
      if(dirX < 0)
      {
         tMaxX = (mapToWorld(x)-startX) / dirX;
      }else
      {
         tMaxX = (mapToWorld(x+1)-startX) / dirX;
      }
 
      if(dirY < 0)
      {
         tMaxY = (mapToWorld(y)-startY) / dirY;
      }else
      {
         tMaxY = (mapToWorld(y+1)-startY) / dirY;
      }
 
      // check if first is occupied
      if (isMapPositionOccupied(x, y)) // use your function here
         return false;
 
      boolean reachedX = false, reachedY = false;
 
      while (true)
      {
         if (tMaxX < tMaxY)
         {
            tMaxX += tDeltaX;
            x += stepX;
         } else
         {
            tMaxY += tDeltaY;
            y += stepY;
         }
 
         if (stepX > 0.0f)
         {
            if (x >= endX1)
            {
               reachedX = true;
            }
         } else if (x <= endX1)
         {
            reachedX = true;
         }
 
         if (stepY > 0.0f)
         {
            if (y >= endY1)
            {
               reachedY = true;
            }
         } else if (y <= endY1)
         {
            reachedY = true;
         }
 
         if (isMapPositionOccupied(x, y))
         {
            return false;
         }
 
         if (reachedX && reachedY)
         {
            break;
         }
      }
 
      return true;
   }

Here’s an applet showing it in action.
The white tiles are free, the gray tiles are blocking. The ray starts at the green dot and ends at the red dot. The gray spheres are the visited grid sections. Use leftclick to position the green dot and the right button to position the red dot.
 


Hi There


Link to paper:
http://www.flipcode.com/archives/A%20faster%20voxel%20traversal%20algorithm%20for%20ray%20tracing.pdf

Some discussion on GameDev:
http://www.gamedev.net/community/forums/topic.asp?topic_id=385303

Implementations:
http://www.clockworkcoders.com/oglsl/rt/gpurt3.htm
http://www.devmaster.net/articles/raytracing_series/part4.php

Full source of the applet:

import java.awt.*;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;
import java.awt.event.MouseMotionListener;
import java.applet.*;
 
public class GTApplet extends Applet implements MouseListener,
		MouseMotionListener
{
	enum mapNotations
	{
		FREE, OCCUPIED, BOOKED, RESERVED
	};
 
	private static final int GRID_PAINT_SIZE = 50;
 
	private static final int MAP_RESOLUTION = GRID_PAINT_SIZE;
 
	private static final int MAP_SIZE = 4 * 2;
 
	private final mapNotations[][] map = new mapNotations[MAP_SIZE][MAP_SIZE];
 
	private final int[][] markedMap = new int[MAP_SIZE][MAP_SIZE];
 
	// Specify variables that will be needed everywhere, anytime here
	// The font variable
	Font bigFont, smallFont, smallestFont;
 
	Color bgColor;
	Color gridLineColor;
	Color gridOccupiedColor;
	Color startColor;
	Color endColor;
	Color helperLineColor;
	Color markerColor;
	Color textColor;
 
	int startX = 50, startY = 50, endX = 50, endY = 50;
 
	int markerDiameter = 6;
	int mapMarkerDiameter = 26;
 
	boolean isFreePath = false;
 
	public void init()
	{
		// init the map
		for (int x = 0; x < MAP_SIZE; x++)
		{
			for (int y = 0; y < MAP_SIZE; y++) 			{ 				if (Math.random() > 0.2f)
				{
					map[x][y] = mapNotations.FREE;
				} else
				{
					map[x][y] = mapNotations.OCCUPIED;
				}
			}
		}
 
		bigFont = new Font("Arial", Font.BOLD, 16);
		bigFont = new Font("Arial", Font.BOLD, 12);
		smallestFont = new Font("Calibri", Font.PLAIN, 10);
 
		gridLineColor = Color.DARK_GRAY;
		gridOccupiedColor = Color.LIGHT_GRAY;
		startColor = Color.green;
		endColor = Color.red;
		helperLineColor = Color.black;
		markerColor = Color.gray;
		textColor = Color.DARK_GRAY;
 
		bgColor = new Color(230, 231, 232);
 
		setBackground(bgColor);
 
		addMouseListener(this);
		addMouseMotionListener(this);
 
	}
 
	public void mouseEntered(MouseEvent e)
	{
		// called when the pointer enters the applet's rectangular area
	}
 
	public void mouseExited(MouseEvent e)
	{
		// called when the pointer leaves the applet's rectangular area
	}
 
	public void mouseClicked(MouseEvent e)
	{
		// called after a press and release of a mouse button
		// with no motion in between
		// (If the user presses, drags, and then releases, there will be
		// no click event generated.)
	}
 
	public void mousePressed(MouseEvent e)
	{ // called after a button is pressed down
		if (e.getButton() == MouseEvent.BUTTON1)
		{
			startX = e.getX();
			startY = e.getY();
		} else
		{
			endX = e.getX();
			endY = e.getY();
		}
 
		resetMarkedMap();
		isFreePath = isFreePath(startX, startY, endX, endY);
 
		repaint();
		// "Consume" the event so it won't be processed in the
		// default manner by the source which generated it.
		e.consume();
	}
 
	public void mouseReleased(MouseEvent e)
	{ // called after a button is released
	//        repaint();
		e.consume();
	}
 
	public void mouseMoved(MouseEvent e)
	{ // called during motion when no buttons are down
	//        repaint();
		e.consume();
	}
 
	public void mouseDragged(MouseEvent e)
	{ // called during motion with buttons down
	//        repaint();
		e.consume();
	}
 
	public void stop()
	{
	}
 
	// just some simple and rude drawing
	public void paint(Graphics g)
	{
		g.setColor(gridLineColor);
		for (int x = 0; x < MAP_SIZE + 1; x++)
			g.drawLine(x * GRID_PAINT_SIZE, 0, x * GRID_PAINT_SIZE, MAP_SIZE
					* GRID_PAINT_SIZE);
 
		for (int y = 0; y < MAP_SIZE + 1; y++)
			g.drawLine(0, y * GRID_PAINT_SIZE, MAP_SIZE * GRID_PAINT_SIZE, y
					* GRID_PAINT_SIZE);
 
		g.setColor(gridOccupiedColor);
		for (int x = 0; x < MAP_SIZE; x++)
		{
			for (int y = 0; y < MAP_SIZE; y++)
			{
				if (map[x][y] == mapNotations.OCCUPIED)
				{
					g.fillRect(x * GRID_PAINT_SIZE + 1,
							y * GRID_PAINT_SIZE + 1, GRID_PAINT_SIZE - 1,
							GRID_PAINT_SIZE - 1);
				}
			}
		}
 
		g.setColor(textColor);
 
		g.setFont(smallestFont);
 
		for (int x = 0; x < MAP_SIZE; x++)
		{
			for (int y = 0; y < MAP_SIZE; y++)
			{
				g.drawString("(" + x + "," + y + ")", x * GRID_PAINT_SIZE + 2,
						y * GRID_PAINT_SIZE + 10);
			}
		}
 
		g.setFont(bigFont);
		g.drawString("Start: (" + startX + "," + startY + ")", 16, MAP_SIZE
				* GRID_PAINT_SIZE + 30 + 10);
		g.drawString("End: (" + endX + "," + endY + ")", 16, MAP_SIZE
				* GRID_PAINT_SIZE + 48 + 10);
		g.drawString("Is free path: " + isFreePath, 16, MAP_SIZE
				* GRID_PAINT_SIZE + 48 + 18 + 10);
		g.setFont(smallFont);
		g.drawString("Start (map coords): (" + sx + "," + sy + ")", 16,
				MAP_SIZE * GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 0 * 14);
		g.drawString("End (map coords): (" + endX1 + "," + endY1 + ")", 16,
				MAP_SIZE * GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 1 * 14);
		g.drawString("Step (map coords): (" + stepX + "," + stepY + ")", 16,
				MAP_SIZE * GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 2 * 14);
		g.drawString("Dir: (" + dirX + "," + dirY + ")", 16, MAP_SIZE
				* GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 3 * 14);
		g.drawString("tDelta: (" + tDeltaX + "," + tDeltaY + ")", 16, MAP_SIZE
				* GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 4 * 14);
		g.drawString("sTMax: (" + sTMaxX + "," + sTMaxY + ")", 16, MAP_SIZE
				* GRID_PAINT_SIZE + 10 + 48 + 18 + 18 + 5 * 14);
 
		g.setColor(markerColor);
 
		for (int x = 0; x < MAP_SIZE; x++)
		{
			for (int y = 0; y < MAP_SIZE; y++)
			{
				if (markedMap[x][y] == 1)
				{
					g.fillOval(x * GRID_PAINT_SIZE - mapMarkerDiameter / 2
							+ GRID_PAINT_SIZE / 2, y * GRID_PAINT_SIZE
							- mapMarkerDiameter / 2 + GRID_PAINT_SIZE / 2,
							mapMarkerDiameter, mapMarkerDiameter);
				}
			}
		}
 
		g.setColor(startColor);
		// draw start and end markers
		g.drawOval(startX - markerDiameter / 2, startY - markerDiameter / 2,
				markerDiameter, markerDiameter);
 
		g.setColor(endColor);
		// draw start and end markers
		g.drawOval(endX - markerDiameter / 2, endY - markerDiameter / 2,
				markerDiameter, markerDiameter);
 
		g.setColor(helperLineColor);
		g.drawLine(startX, startY, endX, endY);
	}
 
	public void resetMarkedMap()
	{
		for (int x = 0; x < MAP_SIZE; x++)
		{
			for (int y = 0; y < MAP_SIZE; y++)
			{
				markedMap[x][y] = 0;
			}
		}
	}
 
	int sx, sy, x, y, endX1, endY1, stepX, stepY;
	float tDeltaX, tDeltaY;
	float dirX, dirY;
	float sTMaxX, sTMaxY;
 
	public boolean isFreePath(float startX, float startZ, float endX, float endZ)
	{
		// calculate the direction of the ray (linear algebra)
		dirX = (endX - startX);
		dirY = (endZ - startZ);
		float length = (float) Math.sqrt(dirX * dirX + dirY * dirY);
		dirX /= length; // normalize the direction vector
		dirY /= length;
 
		tDeltaX = MAP_RESOLUTION / Math.abs(dirX); // how far we must move in the ray direction before we encounter a new voxel in x-direction
		tDeltaY = MAP_RESOLUTION / Math.abs(dirY); // same but y-direction
 
		// start voxel coordinates
		sx = (int) (worldCoordToMapCoord(startX));  // use your transformer function here
		sy = (int) (worldCoordToMapCoord(startZ));
 
		// end voxel coordinates
		endX1 = (int) (worldCoordToMapCoord(endX));
		endY1 = (int) (worldCoordToMapCoord(endZ));
 
		x = sx;
		y = sy;
 
		// decide which direction to start walking in
		stepX = (int) Math.signum(dirX);
		stepY = (int) Math.signum(dirY);
 
		float tMaxX, tMaxY;
		// calculate distance to first intersection in the voxel we start from
		if(dirX < 0)
		{
			tMaxX = (mapToWorld(x)-startX) / dirX;
		}else
		{
			tMaxX = (mapToWorld(x+1)-startX) / dirX;
		}
 
		if(dirY < 0)
		{
			tMaxY = (mapToWorld(y)-startY) / dirY;
		}else
		{
			tMaxY = (mapToWorld(y+1)-startY) / dirY;
		}
 
		sTMaxX = tMaxX;
		sTMaxY = tMaxY;
 
		// check if first is occupied
		if (isMapPositionOccupied(x, y)) // use your function here
			return false;
 
		boolean reachedX = false, reachedY = false;
 
		while (true)
		{
			if (tMaxX < tMaxY) 			{ 				tMaxX += tDeltaX; 				x += stepX; 			} else 			{ 				tMaxY += tDeltaY; 				y += stepY; 			} 			if (stepX > 0.0f)
			{
				if (x >= endX1)
				{
					reachedX = true;
				}
			} else if (x <= endX1) 			{ 				reachedX = true; 			} 			if (stepY > 0.0f)
			{
				if (y >= endY1)
				{
					reachedY = true;
				}
			} else if (y <= endY1)
			{
				reachedY = true;
			}
 
			if (isMapPositionOccupied(x, y))
			{
				return false;
			}
 
			if (reachedX && reachedY)
			{
				break;
			}
		}
 
		return true;
	}
 
	private float mapToWorld(int coord)
	{
		return coord*MAP_RESOLUTION;
	}
 
	private float worldCoordToMapCoord(float coord)
	{
		return coord / MAP_RESOLUTION;
	}
 
	private boolean isMapPositionOccupied(int x, int y)
	{
		if(x < 0 || y < 0)
			return true; 
 
		// save this for debug use
		markedMap[x][y] = 1;
 
		return map[x][y] == mapNotations.OCCUPIED;
	}
}

Cull Levels

These are the different levels culling algorithms can work on.

Triangle Level

Description: Determine for each triangle if it should be culled or not.
Primary goal: Minimize triangle count.
Culling technique example: BSP 
Usage: Not used anymore, cost to much CPU.

Object Level

Description: Check each object (a group of triangles in one buffer) if they should be culled or not.
Primary goal: Minimize triangle count and keep state changes low.
Culling technique example: View Frustum Culling of Bounding Box Hierarchies
Usage: Often used.

Batch Level

Description: Will check whole batches (a group of objects in one buffer)  if they should be culled or not.
Primary goal: Minimize draw calls and triangle count.
Culling technique example:  Uniform Grid Culling
Usage: Often used.

Basic Culling Techniques

The best optimization when rendering is to not render anything unnecessary. And that is what culling is about, to find out what can be skipped when rendering because it cannot be visible anyway.  Below are the basic culling techniques which most renderers’ implements. The image is from a course slide.

Culling Techniques

Back Face Culling

Faces that faces away from the camera can not be visible on the screen so they don’t need to be drawn. This is so often used that it’s implemented by the hardware. It roughly cuts the amount of faces drawn in half. Just remember to turn it on!

View frustum Culling

The faces that is outside the view frustum can not be visible on the screen (we don’t bother about reflections now) so they can be culled. This check is done by checking if the geometries bounding volume is outside the view frustum volume or not. So the check will not be done on every face, as that would cost to much. Sometimes, the view frustum culling can cost more than what is gain ( for example when doing instancing). One way to speed up view frustum culling is to use a suitable spatial structure for the scene (octree, BSP or so).

All kind of bounding volume test that you can imagine can be found on this page:
http://www.realtimerendering.com/intersections.html

Info about frustum culling:
http://www.flipcode.com/archives/Frustum_Culling.shtml

Portal Culling

A technique that divides the scene into cells with portals between. When rendering, the camera will be in one of the rooms and that room will be rendered normally. But for each portal that is visible in the room a view frustum is set up for the size of the portal and then the room behind it is rendered. This will work recursively. The result will be that a lot of geometry can be culled by view frustum culling when rendering the other rooms. A very useful technique for indoor scenes.

Detail Culling

When a geometry is so far away that it’s not visible then there is no need to draw it at all so it can safely be culled. A more advanced scheme of detail culling that decreases the amount of details with the distance is LOD (level of detail).

Occlusion Culling

The hardest culling technique to implement. Geometry that is occluded by other geometry does not need to be rendered. One solution is to use the Z-buffer and sort the geometry in a front to back order. But this does not always work and all pixels needs to be checked against the Z-buffer so it will be costly for big scenes. Better occlusion culling techniques culls the geometry before it’s even sent to the GPU.

A good link to a couple of occlusion culling techniques
http://www.gamasutra.com/features/19991109/moller_haines_01.htm