SSAO
This is my SSAO (Screen Space Ambient Occlusion) implementation. It is fast and gives good results. It is inspired mainly by the Crysis SSAO algorithm, but also by the StarCraft II implementation and, to a lesser degree, by NVIDIA’s implementation.
To use the SSAO shader, render a fullscreen quad with the SSAO shader applied to it. For every fragment on the screen (one fragment per pixel when there is no multisampling), the shader traces 10 to 16 rays, depending on the SAMPLES setting. The rays are shot in random directions within a hemisphere around the normal of the current fragment. Texture lookups are used to compare the depth and normal of the current fragment with those at each traced sample. The comparison is a simple step formula, and the SSAO term is evaluated from its results. This SSAO term is written to the red channel of the output texture. The result must be blurred before it is combined with the original scene render. A bilateral blur is one option for this. Both the SSAO term and the blur can be computed at a much lower resolution (half or a quarter of the screen size) to save clock cycles. After blurring, upsampling the result to full screen size adds some extra blur for free.
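The Python sketch below shows the blur step on the CPU. It is a generic bilateral (depth-aware) blur, not the blur used in the demo. The Gaussian weights, the `radius`, `sigma_spatial` and `sigma_depth` parameters, and the function name are assumptions chosen for illustration. The depth term keeps the AO from bleeding across object edges. In a real renderer this would be a fragment-shader pass, often split into a horizontal pass and a vertical pass.

```python
import math


def bilateral_blur(ao, depth, radius=2, sigma_spatial=1.5, sigma_depth=0.01):
    """Depth-aware (bilateral) blur of an AO buffer (illustrative sketch).

    ao, depth: 2D lists of floats with the same shape.
    Each output pixel is a weighted mean of its neighbours. The weight is a
    spatial Gaussian times a Gaussian on the depth difference, so samples
    across a depth discontinuity contribute (almost) nothing.
    """
    h, w = len(ao), len(ao[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            z0 = depth[y][x]
            total, weight_sum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sx, sy = x + dx, y + dy
                    if not (0 <= sx < w and 0 <= sy < h):
                        continue  # skip samples outside the image
                    ws = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_spatial ** 2))
                    dz = depth[sy][sx] - z0
                    wd = math.exp(-(dz * dz) / (2.0 * sigma_depth ** 2))
                    total += ao[sy][sx] * ws * wd
                    weight_sum += ws * wd
            # the centre sample always has weight 1, so weight_sum > 0
            out[y][x] = total / weight_sum
    return out
```

The depth Gaussian is the design choice that matters here. A plain box or Gaussian blur would smear the dark halo of an occluded surface onto the background behind it.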
Combining it with the original render can be as simple as multiplying the already rendered screen by this AO term.
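The Python sketch below shows the upsample and the multiply, assuming the AO buffer was computed at a lower resolution. Bilinear filtering is what texture hardware does when a smaller texture is sampled at full resolution, and that filtering is where the "free" blur comes from. The function names and the list-of-lists image layout are assumptions for illustration.

```python
def upsample_bilinear(img, out_w, out_h):
    """Bilinearly upsample a 2D list, e.g. a half-resolution AO buffer."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        # map the output pixel centre back into input pixel coordinates (clamped)
        fy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        for x in range(out_w):
            fx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            out[y][x] = top * (1 - ty) + bottom * ty
    return out


def composite(color, ao):
    """Multiply every RGB pixel of the rendered scene by its AO term."""
    return [[tuple(c * a for c in px) for px, a in zip(crow, arow)]
            for crow, arow in zip(color, ao)]
```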
These are the parameter values that worked well for me (what gives the best result varies from setup to setup):
```glsl
uniform float totStrength = 1.38;
uniform float strength = 0.07;
uniform float offset = 18.0;
uniform float falloff = 0.000002;
uniform float rad = 0.006;
```
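The Python sketch below writes out the per-sample falloff from the shader loop on the CPU, using the values above, to show what `strength`, `falloff` and `totStrength` do. The `step`/`smoothstep` helpers mimic the GLSL built-ins. The other names are hypothetical. Two parameters are left out because they only affect where samples land: `rad` is the sampling radius, which the shader divides by depth, and `offset` is how many times the random-normal texture is tiled.

```python
def step(edge, x):
    """GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def smoothstep(edge0, edge1, x):
    """GLSL smoothstep(): clamped Hermite interpolation between edge0 and edge1."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


# Parameter values from the post
TOT_STRENGTH = 1.38
STRENGTH = 0.07
FALLOFF = 0.000002


def sample_occlusion(depth_difference, normal_dot):
    """Occlusion contributed by one sample, mirroring the shader's loop body.

    depth_difference: currentPixelDepth - occluderDepth (positive = occluder in front)
    normal_dot: dot(occluderNormal, currentNormal)
    """
    norm_diff = 1.0 - normal_dot
    return (step(FALLOFF, depth_difference) * norm_diff
            * (1.0 - smoothstep(FALLOFF, STRENGTH, depth_difference)))


def ao_term(contributions):
    """Final AO: 1 - totStrength * (sum of contributions) / number of samples."""
    return 1.0 - TOT_STRENGTH * sum(contributions) / len(contributions)
```

As the sketch shows, a sample occludes only when the occluder is in front by more than `falloff`, and its contribution fades to zero as the depth gap approaches `strength`. Samples whose normal matches the current normal contribute nothing. Note that `ao_term` is not clamped: with `totStrength` above 1, strongly occluded pixels can go below 0.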
Here’s the SSAO GLSL fragment shader:
```glsl
#version 120 // needed for the vec3[](...) array constructor below

uniform sampler2D rnm;        // random normals packed into [0,1], tiled `offset` times over the screen
uniform sampler2D normalMap;  // xyz = normal (float texture, so no unpacking), a = linear depth
varying vec2 uv;
uniform float totStrength;
uniform float strength;
uniform float offset;
uniform float falloff;
uniform float rad;

#define SAMPLES 16 // 10 is good
const float invSamples = 1.0 / float(SAMPLES); // stays correct if SAMPLES changes

void main(void)
{
    // these are the random vectors inside a unit sphere
    vec3 pSphere[16] = vec3[](
        vec3(0.53812504, 0.18565957, -0.43192), vec3(0.13790712, 0.24864247, 0.44301823),
        vec3(0.33715037, 0.56794053, -0.005789503), vec3(-0.6999805, -0.04511441, -0.0019965635),
        vec3(0.06896307, -0.15983082, -0.85477847), vec3(0.056099437, 0.006954967, -0.1843352),
        vec3(-0.014653638, 0.14027752, 0.0762037), vec3(0.010019933, -0.1924225, -0.034443386),
        vec3(-0.35775623, -0.5301969, -0.43581226), vec3(-0.3169221, 0.106360726, 0.015860917),
        vec3(0.010350345, -0.58698344, 0.0046293875), vec3(-0.08972908, -0.49408212, 0.3287904),
        vec3(0.7119986, -0.0154690035, -0.09183723), vec3(-0.053382345, 0.059675813, -0.5411899),
        vec3(0.035267662, -0.063188605, 0.54602677), vec3(-0.47761092, 0.2847911, -0.0271716));

    // alternative kernels with 8, 12 and 10 vectors (set SAMPLES to match)
    //const vec3 pSphere[8] = vec3[](
    //    vec3(0.24710192, 0.6445882, 0.033550154), vec3(0.00991752, -0.21947019, 0.7196721),
    //    vec3(0.25109035, -0.1787317, -0.011580509), vec3(-0.08781511, 0.44514698, 0.56647956),
    //    vec3(-0.011737816, -0.0643377, 0.16030222), vec3(0.035941467, 0.04990871, -0.46533614),
    //    vec3(-0.058801126, 0.7347013, -0.25399926), vec3(-0.24799341, -0.022052078, -0.13399573));
    //const vec3 pSphere[12] = vec3[](
    //    vec3(-0.13657719, 0.30651027, 0.16118456), vec3(-0.14714938, 0.33245975, -0.113095455),
    //    vec3(0.030659059, 0.27887347, -0.7332209), vec3(0.009913514, -0.89884496, 0.07381549),
    //    vec3(0.040318526, 0.40091, 0.6847858), vec3(0.22311053, -0.3039437, -0.19340435),
    //    vec3(0.36235332, 0.21894878, -0.05407306), vec3(-0.15198798, -0.38409665, -0.46785462),
    //    vec3(-0.013492276, -0.5345803, 0.11307949), vec3(-0.4972847, 0.037064247, -0.4381323),
    //    vec3(-0.024175806, -0.008928787, 0.17719103), vec3(0.694014, -0.122672155, 0.33098832));
    //const vec3 pSphere[10] = vec3[](
    //    vec3(-0.010735935, 0.01647018, 0.0062425877), vec3(-0.06533369, 0.3647007, -0.13746321),
    //    vec3(-0.6539235, -0.016726388, -0.53000957), vec3(0.40958285, 0.0052428036, -0.5591124),
    //    vec3(-0.1465366, 0.09899267, 0.15571679), vec3(-0.44122112, -0.5458797, 0.04912532),
    //    vec3(0.03755566, -0.10961345, -0.33040273), vec3(0.019100213, 0.29652783, 0.066237666),
    //    vec3(0.8765323, 0.011236004, 0.28265962), vec3(0.29264435, -0.40794238, 0.15964167));

    // grab a random normal (unpacked from [0,1] to [-1,1]) for reflecting the sample rays later on
    vec3 fres = normalize((texture2D(rnm, uv * offset).xyz * 2.0) - vec3(1.0));

    vec4 currentPixelSample = texture2D(normalMap, uv);
    float currentPixelDepth = currentPixelSample.a;

    // current fragment coords in screen space
    vec3 ep = vec3(uv.xy, currentPixelDepth);
    // get the normal of current fragment
    vec3 norm = currentPixelSample.xyz;

    float bl = 0.0;
    // adjust the radius for depth (not sure if this is good..)
    float radD = rad / currentPixelDepth;

    vec3 ray, se, occNorm;
    float depthDifference, normDiff;

    for (int i = 0; i < SAMPLES; ++i)
    {
        // take a kernel vector (inside a unit sphere), reflect it about the random normal, scale by radius
        ray = radD * reflect(pSphere[i], fres);

        // if the ray is outside the hemisphere then change direction
        se = ep + sign(dot(ray, norm)) * ray;

        // get the normal (xyz) and depth (a) of the occluder fragment
        vec4 occluderFragment = texture2D(normalMap, se.xy);
        occNorm = occluderFragment.xyz;

        // if depthDifference is negative = occluder is behind current fragment
        depthDifference = currentPixelDepth - occluderFragment.a;

        // calculate the difference between the normals as a weight
        normDiff = (1.0 - dot(occNorm, norm));

        // the falloff: zero below falloff, then fades out smoothly as depthDifference approaches strength
        bl += step(falloff, depthDifference) * normDiff * (1.0 - smoothstep(falloff, strength, depthDifference));
    }

    // output the result
    float ao = 1.0 - totStrength * bl * invSamples;
    gl_FragColor.r = ao;
}
```
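The Python sketch below re-implements the shader's ray setup on the CPU, to show geometrically what the random reflection and the hemisphere flip do. `reflect` and `sign` mimic the GLSL built-ins. The name `sample_position` and its argument order are hypothetical. The radius is divided by the fragment's depth, so farther fragments get smaller screen-space offsets. Any ray pointing away from the normal is flipped into the normal's hemisphere.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(i, n):
    """GLSL reflect(): i - 2 * dot(n, i) * n, with n normalized."""
    d = dot(n, i)
    return tuple(ic - 2.0 * d * nc for ic, nc in zip(i, n))


def sign(x):
    """GLSL sign(): -1.0, 0.0 or 1.0."""
    return float((x > 0) - (x < 0))


def sample_position(ep, kernel_vec, random_normal, normal, rad):
    """Screen-space sample position for one kernel vector, mirroring the shader.

    ep: (u, v, depth) of the current fragment
    """
    rad_d = rad / ep[2]                                   # radius shrinks with depth
    ray = tuple(rad_d * c for c in reflect(kernel_vec, random_normal))
    s = sign(dot(ray, normal))                            # flip rays leaving the hemisphere
    return tuple(e + s * r for e, r in zip(ep, ray))
```

Reflecting a fixed kernel about a per-pixel random normal turns a small set of sample vectors into a different pattern at every pixel. That trades banding for noise, which the later blur removes.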
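This is an optimized version of the same shader, but it is a little harder to read. The parameters are baked in as constants and it uses 10 samples. Most intermediate variables are folded into the expressions, and totStrength is folded into the normalization constant.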
```glsl
#version 120 // needed for the vec3[](...) array constructor below

uniform sampler2D rnm;
uniform sampler2D normalMap;
varying vec2 uv;

const float totStrength = 1.38;
const float strength = 0.07;
const float offset = 18.0;
const float falloff = 0.000002;
const float rad = 0.006;

#define SAMPLES 10 // 10 is good
// totStrength is folded (negated) into the normalization constant
const float invSamples = -totStrength / float(SAMPLES);

void main(void)
{
    // these are the random vectors inside a unit sphere
    vec3 pSphere[10] = vec3[](
        vec3(-0.010735935, 0.01647018, 0.0062425877), vec3(-0.06533369, 0.3647007, -0.13746321),
        vec3(-0.6539235, -0.016726388, -0.53000957), vec3(0.40958285, 0.0052428036, -0.5591124),
        vec3(-0.1465366, 0.09899267, 0.15571679), vec3(-0.44122112, -0.5458797, 0.04912532),
        vec3(0.03755566, -0.10961345, -0.33040273), vec3(0.019100213, 0.29652783, 0.066237666),
        vec3(0.8765323, 0.011236004, 0.28265962), vec3(0.29264435, -0.40794238, 0.15964167));

    // grab a random normal for reflecting the sample rays later on
    vec3 fres = normalize((texture2D(rnm, uv * offset).xyz * 2.0) - vec3(1.0));

    vec4 currentPixelSample = texture2D(normalMap, uv);
    float currentPixelDepth = currentPixelSample.a;

    // current fragment coords in screen space
    vec3 ep = vec3(uv.xy, currentPixelDepth);
    // get the normal of current fragment
    vec3 norm = currentPixelSample.xyz;

    float bl = 0.0;
    // adjust the radius for depth (not sure if this is good..)
    float radD = rad / currentPixelDepth;

    float depthDifference;
    vec4 occluderFragment;
    vec3 ray;

    for (int i = 0; i < SAMPLES; ++i)
    {
        // reflect a kernel vector about the random normal and scale it by the radius
        ray = radD * reflect(pSphere[i], fres);

        // fetch the occluder's normal and depth, flipping the ray into the normal's hemisphere
        occluderFragment = texture2D(normalMap, ep.xy + sign(dot(ray, norm)) * ray.xy);

        // if depthDifference is negative = occluder is behind current fragment
        depthDifference = currentPixelDepth - occluderFragment.a;

        // normal-difference weight times the smoothstep falloff
        bl += step(falloff, depthDifference) * (1.0 - dot(occluderFragment.xyz, norm))
            * (1.0 - smoothstep(falloff, strength, depthDifference));
    }

    // output the result: equals 1.0 - totStrength * bl / SAMPLES
    gl_FragColor.r = 1.0 + bl * invSamples;
}
```
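The least obvious change in the optimized version is the output line. Folding `totStrength` into `invSamples` lets the final value be computed with what can map to a single multiply-add, instead of a subtraction and two multiplies. The quick Python check below (names hypothetical) confirms that both forms give the same AO value, up to floating-point rounding.

```python
TOT_STRENGTH = 1.38
SAMPLES = 10


def ao_readable(bl):
    """First shader: 1 - totStrength * bl * (1 / SAMPLES)."""
    inv_samples = 1.0 / SAMPLES
    return 1.0 - TOT_STRENGTH * bl * inv_samples


def ao_optimized(bl):
    """Optimized shader: totStrength folded into one negative constant."""
    inv_samples = -TOT_STRENGTH / SAMPLES
    return 1.0 + bl * inv_samples
```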
To use the SSAO effect, render a fullscreen quad with the following vertex shader and one of the SSAO fragment shaders above.
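```glsl
varying vec2 uv;

void main(void)
{
    gl_Position = ftransform();
    // snap the quad's corners to exactly -1/+1 in clip space
    gl_Position = sign(gl_Position);

    // Texture coordinate for screen aligned (in correct range):
    // maps clip space [-1,1] to [0,1]; the y flip suits this setup's texture origin,
    // depending on your render-target convention you may not need it
    uv = (vec2(gl_Position.x, -gl_Position.y) + vec2(1.0)) * 0.5;
}
```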
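The small Python sketch below computes what the vertex shader produces for each quad corner. The function name and the `flip_y` switch are hypothetical. `sign()` snaps the transformed corners to exactly ±1, and `(p + 1) * 0.5` maps clip space [-1, 1] to texture space [0, 1]. Because of `-gl_Position.y`, the top of the screen (clip y = +1) gets v = 0. Whether your render textures need that flip depends on their origin convention, so `flip_y=False` shows the unflipped alternative.

```python
def sign(x):
    """GLSL sign(): -1.0, 0.0 or 1.0."""
    return float((x > 0) - (x < 0))


def quad_uv(clip_x, clip_y, flip_y=True):
    """uv for a fullscreen-quad corner, mirroring the vertex shader.

    flip_y=True reproduces the shader's -gl_Position.y.
    """
    x, y = sign(clip_x), sign(clip_y)
    if flip_y:
        y = -y
    return ((x + 1.0) * 0.5, (y + 1.0) * 0.5)
```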
Here’s the source for a school project we did in six weeks (half-time work). It’s a simple FPS game, but it does demonstrate the SSAO. Press the “O” key to toggle between the different SSAO modes. Unfortunately, the source is hard to build since it uses a lot of third-party libraries (Ogre3D, boost, fmod, ode …).
http://sourceforge.net/svn/?group_id=244295
And here’s the compiled version. Just unzip it and run the exe.
18 Comments
Fantastic! I was looking forward to when you would do this one.
Thanks for all the good comments and hard work. I hope I can learn from this.
Yes, I may put up a demo. I have one in C++ and one in Java.
Really good implementation!
Yes, a release of a C++ demo with source would be superb; I would love to see this running on my computer. Excellent work and website!
I can’t get it to work correctly. It’s possible my normals are transformed to the wrong space. Also, I’m not sure what it means to scale the linear depth from 0 to 1.
Thanks
If you use RenderMonkey, you can see what the different passes look like. It’s very good for debugging.
The only problem with it is that it doesn’t handle multiple render targets or float textures very well.
Hi! Your work looks really amazing. I’m starting to read the same documents to implement SSAO too. I have some doubts about your code. First of all, let me say I’m used to working with HLSL. I have never done anything with GLSL (just to give some background).
About the normals: when you say you render them in view space, do you mean multiplied by the WorldView or by the WorldViewProjection? Then, when you read the normals from your normal map, why you don’t resize them from (0,1) to (-1,1) (That’s maybe some trick with GLSL?)
I guess you’re working in screen space all the time, never in view space. Correct me if I’m wrong. Then, what does this mean (varying vec2 uv)? Is it the interpolated view direction vector?
And here I get lost about reading directly from screen space.
// current fragment coords in screen space
vec3 ep = vec3(uv.xy,currentPixelDepth);
// get the depth of the occluder fragment
vec4 occluderFragment = texture2D(normalMap,se.xy);
Sorry if my post is so long, but this info will be helpful. Thanks in advance!
About the normals, I read the post again and I see you said screen space.
Oh, sorry I didn’t add the vertex shader to this post. That will show you what you asked for.
Hi! Thanks for sharing these helpful codes.
The second shader you posted is the optimized version, but the only difference from the first version I can see is that you squeezed some math expressions together into function parameters, so saving the use of several variables. Is there other optimizations I missed?
I translated both versions into HLSL and found out the optimization I mentioned above does not decrease instruction slot number in the assembly codes, and both version renders very slowly (~12fps on my nVidia Quadro NVS 210S). Is it because my card is really that bad, or because I didn’t translate the codes correctly? Some performance benchmark information would be great. Thanks again!
PS.
My implementation in HLSL ended up in an assembly shader of as many as 346 instructions when sample ray number is 16. What’s your number?
– Then, when you read the normals from your normal map, why you don’t resize them from (0,1) to (-1,1) (That’s maybe some trick with GLSL?)
I don’t need to resize them since I save the normals in a float texture.
I haven’t checked since I was more concerned about the real performance instead of number of instructions when doing this shader.
I’m not sure I understand how “the combination with the original screen can be as simple as just multiplying this AO term with the already rendered screen”. Presumably this applies only if the scene uses no direct lighting?
First of all, SSAO is a crude approximation, so you’re free to do whatever you want. But in real life, there can be ambient occlusion even on a directly lit surface.
The SSAO is not working for me when I press “O”, also I can’t find this shader in SVN revision 120
Hi, I don’t understand this line:
for(int i=0; i<SAMPLES;++i)
What does i< mean?
Thanks
You do convert normals from [0, 1] range to [-1, 1] range (both fres and norm) but you do not convert occNorm. This causes some strange artifacts. Possible fix:
occNorm = (occluderFragment.xyz * 2.0) - vec3(1.0);
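The normals discussion in this thread comes down to how the normals are stored. The Python sketch below illustrates the usual packing; the function names are hypothetical. When normals are stored in an 8-bit texture, they are usually packed with `n * 0.5 + 0.5`. In that case every normal read from that texture, `norm` and `occNorm` alike, must be unpacked with `p * 2.0 - 1.0`, just as the shader already does for the random normals in `rnm`. When the normals are stored in a float texture, as the author does, no conversion is needed. Unpacking only some of them corrupts the `dot()` weights.

```python
def pack_normal(n):
    """Map a unit normal from [-1, 1] to [0, 1] for storage in an 8-bit texture."""
    return tuple(c * 0.5 + 0.5 for c in n)


def unpack_normal(p):
    """Inverse mapping, the same as the suggested fix: p * 2.0 - 1.0."""
    return tuple(c * 2.0 - 1.0 for c in p)


def quantize8(p):
    """Simulate storing a packed value in an 8-bit-per-channel texture."""
    return tuple(round(c * 255.0) / 255.0 for c in p)
```

With 8-bit storage, each component comes back within 1/255 of the original. That precision loss is one reason to prefer a float texture for the normal and depth buffer.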