While playing Stalker, I wondered how they managed to render so many light sources casting soft shadows at a decent framerate. Then I read their article on deferred shading in GPU Gems 2 and thought: I can do that, too. The implementation was surprisingly easy, but the results were rather disappointing: it ran only about 5% faster than the classic multi-pass algorithm I was using before.
I soon realized that the problem was the layout of my g-buffer. My naive approach was this:
3 half-float RGBA textures storing diffuse color & specular intensity, normal & height, and worldspace position, for a total of 16 bits x 4 channels x 3 textures = 192 bpp.
I could drop height, since it's only needed in the geometry pass (parallax mapping etc.). I then read that it's possible to reconstruct worldspace position from depth alone, using the pixel's screen position and the viewport information. Not only would this save two more scalars, I could also store depth in a depth buffer, so the remaining g-buffer components would fit into 8-bit fixed-point textures. Unfortunately, no paper I found explained this in detail :/ In the end, a professor from Erlangen University told me how to do it using a scalar stored in a floating-point texture. I modified the method to work with a depth buffer, but that resulted in poor accuracy.
On top of that, normals lose accuracy when stored in 8-bit textures, and reconstructing their z component from x and y proved to be unreliable, so I decided to stick with floating-point textures *sigh*
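The normal problem can be illustrated with a minimal Python sketch. It assumes a plain linear mapping of each component from [-1, 1] to 8 bits and z rebuilt as +sqrt(1 - x² - y²); the function names are hypothetical. It shows two failure modes: the sign of z is lost (which can bite in view space, where visible surfaces under a perspective projection may have normals pointing slightly away from the view axis), and near grazing angles small quantization errors in x and y turn into large errors in z.

```python
import math

Normal = tuple[float, float, float]

def encode_snorm_to_u8(v: float) -> int:
    """Map a component in [-1, 1] to an 8-bit fixed-point value in [0, 255]."""
    v = max(-1.0, min(1.0, v))
    return round((v * 0.5 + 0.5) * 255)

def decode_u8_to_snorm(b: int) -> float:
    """Map an 8-bit value in [0, 255] back to [-1, 1]."""
    return b / 255 * 2.0 - 1.0

def pack_normal_xy(n: Normal) -> tuple[int, int]:
    """Keep only x and y of a unit normal; z is discarded."""
    return encode_snorm_to_u8(n[0]), encode_snorm_to_u8(n[1])

def unpack_normal_xy(bx: int, by: int) -> Normal:
    """Rebuild z from x and y. Only |z| can be recovered, so this assumes
    every stored normal has z >= 0."""
    x, y = decode_u8_to_snorm(bx), decode_u8_to_snorm(by)
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))  # clamp guards against rounding
    return x, y, z

# Frontal normal: reconstructed almost exactly.
print(unpack_normal_xy(*pack_normal_xy((0.0, 0.0, 1.0))))
# Normal facing away (z < 0): the sign of z is silently flipped.
print(unpack_normal_xy(*pack_normal_xy((0.0, 0.0, -1.0))))
# Grazing normal: small 8-bit errors in x are amplified in z.
zg = math.sqrt(1.0 - 0.99 ** 2)
print(zg, unpack_normal_xy(*pack_normal_xy((0.99, 0.0, zg)))[2])
```

In the grazing case z comes back as roughly 0.125 instead of roughly 0.141, and the back-facing normal returns with z close to +1.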
My current setup: 2 half-float RGBA textures storing diffuse color & specular intensity, and normal & worldspace position z, for a total of 16 bits x 4 channels x 2 textures = 128 bpp.
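Assuming the stored scalar really is the world-space z coordinate, one way to get the full position back is to intersect the pixel's view ray with the plane z = const. A minimal Python sketch; the function name and the `ray_dir` input are hypothetical (in a shader the per-pixel ray would typically be interpolated from the frustum corners):

```python
Vec3 = tuple[float, float, float]

def position_from_world_z(camera_pos: Vec3, ray_dir: Vec3, world_z: float,
                          eps: float = 1e-6) -> Vec3:
    """Reconstruct a world-space position from its stored world-space z.

    camera_pos: world-space camera position.
    ray_dir:    world-space direction from the camera through the pixel
                (need not be normalized).
    world_z:    the scalar read back from the g-buffer.
    """
    if abs(ray_dir[2]) < eps:
        # The ray never (or barely) crosses the plane z = world_z.
        raise ValueError("view ray is (nearly) parallel to the plane z = const")
    # Solve camera_pos.z + t * ray_dir.z == world_z for the ray parameter t.
    t = (world_z - camera_pos[2]) / ray_dir[2]
    return tuple(c + t * d for c, d in zip(camera_pos, ray_dir))

print(position_from_world_z((0.0, 0.0, 0.0), (1.0, 2.0, 4.0), 8.0))  # (2.0, 4.0, 8.0)
```

Because t divides by the ray's z component, errors in the stored value are amplified for rays running nearly parallel to the plane, and the reconstruction breaks down entirely when they are parallel. Many implementations instead store linear view-space depth (distance along the camera's forward axis), which stays well-conditioned for every pixel on screen.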
The performance boost is huge. But what I actually wanted to achieve is this:
2 fixed-point RGB textures storing diffuse color, and normal xy & specular intensity, plus 1 24-bit depth buffer to reconstruct worldspace position, for a total of 8 bits x 3 channels x 2 textures + 24 bits = 72 bpp!
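Reconstructing worldspace position from a 24-bit depth buffer can be sketched in Python as follows. This is an assumption-laden sketch, not a drop-in shader: it assumes a Direct3D-style perspective projection (depth 0 at the near plane, 1 at the far plane; OpenGL's default [-1, 1] depth range would need a different inverse), a symmetric frustum described by a vertical field of view and aspect ratio, and a camera given by its position and right/up/forward basis vectors. All function and parameter names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]

DEPTH_MAX = 2 ** 24 - 1  # largest value of a 24-bit unsigned normalized depth

def view_z_from_depth(d: float, near: float, far: float) -> float:
    """Invert a D3D-style perspective depth mapping (d = 0 at the near plane,
    d = 1 at the far plane) to get linear view-space z."""
    return far * near / (far - d * (far - near))

def world_pos_from_depth(px: int, py: int, depth_bits: int,
                         width: int, height: int,
                         fov_y: float, aspect: float, near: float, far: float,
                         cam_pos: Vec3, right: Vec3, up: Vec3,
                         forward: Vec3) -> Vec3:
    # 24-bit fixed-point depth -> [0, 1] -> linear view-space z.
    z = view_z_from_depth(depth_bits / DEPTH_MAX, near, far)
    # Pixel centre -> normalized device coordinates in [-1, 1], y pointing up.
    x_ndc = (px + 0.5) / width * 2.0 - 1.0
    y_ndc = 1.0 - (py + 0.5) / height * 2.0
    # Undo the projection scale and the perspective divide (w == view z).
    tan_half = math.tan(fov_y / 2.0)
    x_view = x_ndc * tan_half * aspect * z
    y_view = y_ndc * tan_half * z
    # View space -> world space using the camera's position and basis vectors.
    return tuple(c + x_view * r + y_view * u + z * f
                 for c, r, u, f in zip(cam_pos, right, up, forward))

# Depth precision near the far plane: distance between adjacent 24-bit values.
near, far = 0.1, 1000.0
step = (view_z_from_depth(1.0, near, far)
        - view_z_from_depth((DEPTH_MAX - 1) / DEPTH_MAX, near, far))
print(f"far-plane depth step: {step:.3f} units")
```

With near = 0.1 and far = 1000, adjacent 24-bit depth values at the far plane are roughly 0.6 units apart, while close to the camera they are a tiny fraction of a unit. This nonlinear distribution is inherent to perspective depth and is one plausible reason why reconstruction from a depth buffer can look less accurate than from a linear floating-point scalar.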
While my current setup is still far from perfect, the results are satisfying.
Besides the higher framerate, hardware skinning and effects like parallax mapping now only have to be computed once per frame, and my rendering pipeline is much cleaner.
Any suggestions on how to improve my setup would be highly appreciated :)