This is my SSAO (Screen Space Ambient Occlusion) implementation. It is fast and gives good results. It’s inspired by the Crysis SSAO algorithm, by the Starcraft II implementation, and a little by Nvidia’s implementation.

To use the SSAO shader, render a fullscreen quad with the SSAO shader applied to it. The shader traces 10-16 rays for every fragment on the screen (a fragment is a pixel when there is no multisampling). The rays are shot in random directions within a hemisphere around the normal of the current fragment. Using texture lookups, the depth and normal of the current fragment are compared with those at each sampled position. The comparison is a simple step formula, and the SSAO term is evaluated from the result. This term is written to the red channel of the output texture. The result must be blurred before it is combined with the original scene render; a bilateral blur is one option. Both the SSAO pass and the blur can run at a much lower resolution than the screen (half or a quarter of the size) to save some clock cycles. Upsampling the blurred result to full screen size then gives some extra blur for free.
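The post does not include a blur shader, so here is a minimal sketch of one direction of a separable, depth-aware (bilateral-style) blur. The names aoMap, texelStep and depthSharpness are assumptions made for illustration, as are the 9-tap kernel and its Gaussian width. It assumes the depth is stored in the alpha channel of normalMap, as in the SSAO shader in this post.

```glsl
// Hypothetical depth-aware blur pass. Run it twice: once with a horizontal
// texelStep and once with a vertical one.
uniform sampler2D aoMap;      // assumed: raw SSAO term in the red channel
uniform sampler2D normalMap;  // same layout as the SSAO pass: depth in alpha
uniform vec2 texelStep;       // assumed: (1.0/width, 0.0) or (0.0, 1.0/height)
uniform float depthSharpness; // assumed tuning value; needs adjusting per scene

varying vec2 uv;

void main(void)
{
    float centerDepth = texture2D(normalMap, uv).a;
    float sum = 0.0;
    float weightSum = 0.0;

    for (int i = -4; i <= 4; ++i)
    {
        vec2 sampleUv = uv + float(i) * texelStep;
        float ao = texture2D(aoMap, sampleUv).r;
        float depth = texture2D(normalMap, sampleUv).a;

        // spatial weight: Gaussian falloff with distance in texels (sigma = 2)
        float spatial = exp(-float(i * i) / 8.0);

        // range weight: samples at a different depth contribute less,
        // which keeps the occlusion from bleeding across edges
        float diff = (depth - centerDepth) * depthSharpness;
        float range = exp(-diff * diff);

        sum += ao * spatial * range;
        weightSum += spatial * range;
    }

    // the center tap always has weight 1.0, so weightSum is never zero
    gl_FragColor = vec4(sum / weightSum);
}
```

Splitting the blur into a horizontal and a vertical pass keeps the cost at 2 x 9 lookups per texture instead of 81. A depth-weighted blur is not exactly separable, so this is an approximation.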

The combination with the original screen can be as simple as just multiplying this AO term with the already rendered screen.
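As a sketch of that multiplication, assuming the rendered scene and the blurred AO term are available as two textures (sceneMap and aoMap are hypothetical names, not taken from the original project):

```glsl
// Hypothetical compositing pass drawn on a fullscreen quad.
uniform sampler2D sceneMap; // assumed: the already rendered scene colour
uniform sampler2D aoMap;    // assumed: blurred SSAO term in the red channel

varying vec2 uv;

void main(void)
{
    vec4 scene = texture2D(sceneMap, uv);
    float ao = texture2D(aoMap, uv).r;

    // 1.0 means unoccluded, smaller values darken the scene colour
    gl_FragColor = vec4(scene.rgb * ao, scene.a);
}
```

If aoMap was rendered at a lower resolution and is sampled with linear filtering, this lookup also performs the upsampling to full screen size.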

These are the parameter values that worked well for me (what gives the best result differs from setup to setup):

    uniform float totStrength = 1.38;
    uniform float strength = 0.07;
    uniform float offset = 18.0;
    uniform float falloff = 0.000002;
    uniform float rad = 0.006;

Here’s the SSAO GLSL fragment shader:

    uniform sampler2D rnm;
    uniform sampler2D normalMap;
    varying vec2 uv;
    uniform float totStrength;
    uniform float strength;
    uniform float offset;
    uniform float falloff;
    uniform float rad;

    #define SAMPLES 16 // 10 is good; must match the size of pSphere below
    const float invSamples = 1.0/float(SAMPLES);

    void main(void)
    {
        // these are the random vectors inside a unit sphere
        vec3 pSphere[16] = vec3[](
            vec3(0.53812504, 0.18565957, -0.43192),
            vec3(0.13790712, 0.24864247, 0.44301823),
            vec3(0.33715037, 0.56794053, -0.005789503),
            vec3(-0.6999805, -0.04511441, -0.0019965635),
            vec3(0.06896307, -0.15983082, -0.85477847),
            vec3(0.056099437, 0.006954967, -0.1843352),
            vec3(-0.014653638, 0.14027752, 0.0762037),
            vec3(0.010019933, -0.1924225, -0.034443386),
            vec3(-0.35775623, -0.5301969, -0.43581226),
            vec3(-0.3169221, 0.106360726, 0.015860917),
            vec3(0.010350345, -0.58698344, 0.0046293875),
            vec3(-0.08972908, -0.49408212, 0.3287904),
            vec3(0.7119986, -0.0154690035, -0.09183723),
            vec3(-0.053382345, 0.059675813, -0.5411899),
            vec3(0.035267662, -0.063188605, 0.54602677),
            vec3(-0.47761092, 0.2847911, -0.0271716));

        // alternative sample sets for 8, 12 and 10 samples:
        //const vec3 pSphere[8] = vec3[](
        //    vec3(0.24710192, 0.6445882, 0.033550154),
        //    vec3(0.00991752, -0.21947019, 0.7196721),
        //    vec3(0.25109035, -0.1787317, -0.011580509),
        //    vec3(-0.08781511, 0.44514698, 0.56647956),
        //    vec3(-0.011737816, -0.0643377, 0.16030222),
        //    vec3(0.035941467, 0.04990871, -0.46533614),
        //    vec3(-0.058801126, 0.7347013, -0.25399926),
        //    vec3(-0.24799341, -0.022052078, -0.13399573));
        //const vec3 pSphere[12] = vec3[](
        //    vec3(-0.13657719, 0.30651027, 0.16118456),
        //    vec3(-0.14714938, 0.33245975, -0.113095455),
        //    vec3(0.030659059, 0.27887347, -0.7332209),
        //    vec3(0.009913514, -0.89884496, 0.07381549),
        //    vec3(0.040318526, 0.40091, 0.6847858),
        //    vec3(0.22311053, -0.3039437, -0.19340435),
        //    vec3(0.36235332, 0.21894878, -0.05407306),
        //    vec3(-0.15198798, -0.38409665, -0.46785462),
        //    vec3(-0.013492276, -0.5345803, 0.11307949),
        //    vec3(-0.4972847, 0.037064247, -0.4381323),
        //    vec3(-0.024175806, -0.008928787, 0.17719103),
        //    vec3(0.694014, -0.122672155, 0.33098832));
        //const vec3 pSphere[10] = vec3[](
        //    vec3(-0.010735935, 0.01647018, 0.0062425877),
        //    vec3(-0.06533369, 0.3647007, -0.13746321),
        //    vec3(-0.6539235, -0.016726388, -0.53000957),
        //    vec3(0.40958285, 0.0052428036, -0.5591124),
        //    vec3(-0.1465366, 0.09899267, 0.15571679),
        //    vec3(-0.44122112, -0.5458797, 0.04912532),
        //    vec3(0.03755566, -0.10961345, -0.33040273),
        //    vec3(0.019100213, 0.29652783, 0.066237666),
        //    vec3(0.8765323, 0.011236004, 0.28265962),
        //    vec3(0.29264435, -0.40794238, 0.15964167));

        // grab a normal for reflecting the sample rays later on
        vec3 fres = normalize((texture2D(rnm,uv*offset).xyz*2.0) - vec3(1.0));

        vec4 currentPixelSample = texture2D(normalMap,uv);
        float currentPixelDepth = currentPixelSample.a;

        // current fragment coords in screen space
        vec3 ep = vec3(uv.xy,currentPixelDepth);
        // get the normal of current fragment
        vec3 norm = currentPixelSample.xyz;

        float bl = 0.0;
        // adjust for the depth (not sure if this is good..)
        float radD = rad/currentPixelDepth;

        vec3 ray, se, occNorm;
        float depthDifference, normDiff;

        for(int i=0; i<SAMPLES;++i)
        {
            // get a vector (randomized inside of a sphere with radius 1.0) from a texture and reflect it
            ray = radD*reflect(pSphere[i],fres);

            // if the ray is outside the hemisphere then change direction
            se = ep + sign(dot(ray,norm) )*ray;

            // get the depth of the occluder fragment
            vec4 occluderFragment = texture2D(normalMap,se.xy);

            // get the normal of the occluder fragment
            occNorm = occluderFragment.xyz;

            // if depthDifference is negative = occluder is behind current fragment
            depthDifference = currentPixelDepth-occluderFragment.a;

            // calculate the difference between the normals as a weight
            normDiff = (1.0-dot(occNorm,norm));
            // the falloff equation, starts at falloff and is kind of 1/x^2 falling
            bl += step(falloff,depthDifference)*normDiff*(1.0-smoothstep(falloff,strength,depthDifference));
        }

        // output the result
        float ao = 1.0-totStrength*bl*invSamples;
        gl_FragColor.r = ao;
    }

This is an optimized version of the same shader. It is a little harder to read and understand:

    uniform sampler2D rnm;
    uniform sampler2D normalMap;
    varying vec2 uv;
    const float totStrength = 1.38;
    const float strength = 0.07;
    const float offset = 18.0;
    const float falloff = 0.000002;
    const float rad = 0.006;

    #define SAMPLES 10 // 10 is good
    const float invSamples = -1.38/10.0;

    void main(void)
    {
        // these are the random vectors inside a unit sphere
        vec3 pSphere[10] = vec3[](
            vec3(-0.010735935, 0.01647018, 0.0062425877),
            vec3(-0.06533369, 0.3647007, -0.13746321),
            vec3(-0.6539235, -0.016726388, -0.53000957),
            vec3(0.40958285, 0.0052428036, -0.5591124),
            vec3(-0.1465366, 0.09899267, 0.15571679),
            vec3(-0.44122112, -0.5458797, 0.04912532),
            vec3(0.03755566, -0.10961345, -0.33040273),
            vec3(0.019100213, 0.29652783, 0.066237666),
            vec3(0.8765323, 0.011236004, 0.28265962),
            vec3(0.29264435, -0.40794238, 0.15964167));

        // grab a normal for reflecting the sample rays later on
        vec3 fres = normalize((texture2D(rnm,uv*offset).xyz*2.0) - vec3(1.0));

        vec4 currentPixelSample = texture2D(normalMap,uv);
        float currentPixelDepth = currentPixelSample.a;

        // current fragment coords in screen space
        vec3 ep = vec3(uv.xy,currentPixelDepth);
        // get the normal of current fragment
        vec3 norm = currentPixelSample.xyz;

        float bl = 0.0;
        // adjust for the depth (not sure if this is good..)
        float radD = rad/currentPixelDepth;

        //vec3 ray, se, occNorm;
        float occluderDepth, depthDifference;
        vec4 occluderFragment;
        vec3 ray;

        for(int i=0; i<SAMPLES;++i)
        {
            // get a vector (randomized inside of a sphere with radius 1.0) from a texture and reflect it
            ray = radD*reflect(pSphere[i],fres);

            // get the depth of the occluder fragment
            occluderFragment = texture2D(normalMap,ep.xy + sign(dot(ray,norm) )*ray.xy);

            // if depthDifference is negative = occluder is behind current fragment
            depthDifference = currentPixelDepth-occluderFragment.a;

            // calculate the difference between the normals as a weight
            // the falloff equation, starts at falloff and is kind of 1/x^2 falling
            bl += step(falloff,depthDifference)*(1.0-dot(occluderFragment.xyz,norm))*(1.0-smoothstep(falloff,strength,depthDifference));
        }

        // output the result
        gl_FragColor.r = 1.0+bl*invSamples;
    }

To use the SSAO effect, render a fullscreen quad over the screen with the following vertex shader and the previous SSAO fragment shader:

    varying vec2 uv;

    void main(void)
    {
        gl_Position = ftransform();
        gl_Position = sign( gl_Position );

        // Texture coordinate for screen aligned (in correct range):
        uv = (vec2( gl_Position.x, - gl_Position.y ) + vec2( 1.0 ) ) * 0.5;
    }
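The SSAO fragment shader reads a normal from normalMap.xyz and a depth from normalMap.a, but the pass that fills that texture is not shown in this post. A minimal sketch of such a pass follows. It assumes view-space normals, a linear depth scaled to 0..1 by a hypothetical farClip uniform, and a float render target so the normals can be stored without remapping. The varying names are illustrative only.

Vertex shader:

```glsl
// Hypothetical geometry pass (vertex stage) that feeds normalMap.
uniform float farClip;     // assumed: distance to the far clip plane

varying vec3 viewNormal;   // normal in view space
varying float linearDepth; // view-space depth scaled to 0..1

void main(void)
{
    vec4 viewPos = gl_ModelViewMatrix * gl_Vertex;
    viewNormal = gl_NormalMatrix * gl_Normal;

    // the OpenGL camera looks down the negative Z axis
    linearDepth = -viewPos.z / farClip;

    gl_Position = gl_ProjectionMatrix * viewPos;
}
```

Fragment shader:

```glsl
// Hypothetical geometry pass (fragment stage): normal in rgb, depth in alpha.
varying vec3 viewNormal;
varying float linearDepth;

void main(void)
{
    // interpolation denormalizes the normal, so normalize it again
    gl_FragColor = vec4(normalize(viewNormal), linearDepth);
}
```

With an 8-bit render target the normals would have to be packed into the 0..1 range here and unpacked again in the SSAO shader.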

Here’s the source for a school project we did in six weeks (working half-time). It’s a simple FPS game, but it does demonstrate the SSAO. Press the “O” key to toggle between the different SSAO modes. Unfortunately, this source is hard to build since it uses a lot of third-party libraries (Ogre3D, boost, fmod, ode …)

http://sourceforge.net/svn/?group_id=244295

And here’s the compiled version. Just unzip it and run the exe.

Paul Usul Fluegel: Fantastic! I was looking forward to when you would do this one

Thanks for all the good comments and hard work, hope i can learn from this.

admin (post author): Yes, I will maybe put up a demo. Got one in C++ and one in Java.

Seb: Really good implementation!

gcard28: Yes, a release of a C++ demo with source would be superb. I would love to see this running on my computer. Excellent work and website!

Alex: I can’t get it to work correctly. It’s possible my normals are transformed to the wrong space. Also, I’m not sure what it means to scale the linear depth from 0 to 1.

Thanks

admin (post author): If you use RenderMonkey, you can see what the different passes look like. Very good for debugging.

The only problem with it is that it doesn’t handle multiple render targets or float textures very well.

Oscar: Hi! Your work looks really amazing. I’m starting to read the same documents to implement SSAO too. I have some doubts about your code. First of all, let me say I’m used to working with HLSL. I never did anything with GLSL (just to give some background).

About the normals: when you say you render them in view space, do you mean multiplied by the WorldView or by the WorldViewProjection? Then, when you read the normals from your normal map, why don’t you resize them from (0,1) to (-1,1)? (Maybe that’s some trick with GLSL?)

I guess you’re working all the time in screen space, never in view space. Correct me if I’m wrong. Then, what does this mean (varying vec2 uv)? Is that the interpolated view direction vector?

And here I get lost about reading directly from screen space.

// current fragment coords in screen space

vec3 ep = vec3(uv.xy,currentPixelDepth);

// get the depth of the occluder fragment

vec4 occluderFragment = texture2D(normalMap,se.xy);

Sorry if my post is so long, but this info will be helpful. Thanks in advance!

Oscar: About the normals, I read the post another time and I see you said in screen space.

admin (post author): Oh, sorry, I didn’t add the vertex shader to this post. That will show you what you asked for.

fang: Hi! Thanks for sharing this helpful code.

The second shader you posted is the optimized version, but the only difference from the first version I can see is that you squeezed some math expressions together into function parameters, saving the use of several variables. Are there other optimizations I missed?

I translated both versions into HLSL and found that the optimization I mentioned above does not decrease the instruction slot count in the assembly code, and both versions render very slowly (~12 fps on my nVidia Quadro NVS 210S). Is it because my card is really that bad, or because I didn’t translate the code correctly? Some performance benchmark information would be great. Thanks again!

fang: PS.

My implementation in HLSL ended up as an assembly shader of as many as 346 instructions when the number of sample rays is 16. What’s your number?

admin (post author): I haven’t checked, since I was more concerned about the real performance than the number of instructions when doing this shader.

admin (post author): – Then, when you read the normals from your normal map, why you don’t resize them from (0,1) to (-1,1) (That’s maybe some trick with GLSL?)

I don’t need to resize them since I save the normals in a float texture.

keaton: I’m not sure I understand how “the combination with the original screen can be as simple as just multiplying this AO term with the already rendered screen”. Presumably this applies only if the scene uses no direct lighting?

admin (post author): First of all, SSAO is a crude approximation, so you’re free to do whatever you want. But in real life, there can be ambient occlusion even on a directly lit surface.

david: The SSAO is not working for me when I press “O”. Also, I can’t find this shader in SVN revision 120.

Edgar: Hi, I don’t understand this line?

for(int i=0; i<SAMPLES;++i)

whats means i<?

Thanks

xahir: You do convert normals from [0, 1] range to [-1, 1] range (both fres and norm) but you do not convert occNorm. This causes some strange artifacts. Possible fix:

occNorm = (occluderFragment.xyz * 2.0) - vec3(1.0);

Rambler.JW: Er, I wanna say many words. However, my English is very poor. So, I just could say thanks for your sharing. I’ll often go here, I need learn from you.

Anthony: Really great help! Thanks for sharing your code. I have implemented it in a school project I am working on, but I get quite a bit of Moire patterning as opposed to the noise I have seen in Starcraft 2’s implementation. Any ideas where I should be looking to try to remedy this so that I can get a smoother blur? I have been reading some of the papers but have had no luck figuring it out yet.

Thanks again!

Felix: To the poster with the Moire problems: that is most likely a precision problem. Make sure you are using at least 16-bit textures when you are rendering the SSAO.


Gio: How is the optimised version different to the first one? They look the same to me… harder to read for sure, but it seems to boil down to exactly the same instructions. Unless I’m missing something?


Aydin Berger: Really appreciate you sharing this blog post. Thanks again. Much obliged.

sh code: you’ve got some pretty weak FPS there, matey

(sorry, couldn’t help it)

very good tutorial nevertheless, thank you.

mincomp: First of all, thanks for your code, but sampling in world space and then just using the xy components to sample again in image space looks weird to me. Though the 3D vector is in the hemisphere, the xy components might mean nothing, and can be randomly distributed.

Ben: All I’m seeing is red, how did you set up your framebuffer?

Alex: I really enjoyed the fact that you named a vector “se” and then later get the “xy” values from it.