Page 4 of 6 - Compute shader
By now, I'm sure I don't need to explain the concept of GPGPU (General Purpose GPU) computing to any of you, as we've been over it plenty of times before. Indeed, in recent times NVIDIA have been at the forefront of pushing the use of the GPU for more than just graphics rendering, from the acquisition of AGEIA to perform physics on the GPU through to their loud and hearty evangelism of their CUDA initiative. On the other side of the coin you have AMD and their Brook+ language for GPGPU, and several companies have put their weight behind the OpenCL project to bring a common standard to GPGPU programming.
With all of this going on, it's probably obvious to say that Microsoft doesn't want to miss out on the action, and thus DirectX 11 is the first step in a big investment by Microsoft in GPGPU, introducing the concept of the 'Compute Shader' to Direct3D. However, as a first step, it appears that Microsoft aren't targeting this particular iteration of DirectX as a way to take GPGPU by storm in the way that CUDA is, but rather have continued to focus on the gaming side of things. It could just be the audience that they were speaking to at Gamefest (predominantly game developers, obviously), but most of the examples given by Microsoft were of ways to use the Compute Shader to improve the gaming experience, from GPU-accelerated physics and AI to image post-processing.
The reasons for using the GPU for more than graphics rendering are simple enough: the massive parallelism of a current GPU is perfect for certain computational tasks. The trick is, of course, unlocking these abilities and doing so in a manner that scales as well as possible with the number of processing cores available on any GPU, be it low-end or high-end.
Naturally, this is where the Compute Shader comes in, integrating with the Direct3D API while also adding features that benefit GPGPU scenarios. Perhaps the most notable of these is the ability to share data between threads, something that isn't normally high on the priority list when it comes to looking at a modern graphics architecture from a purely rendering-centric perspective, but is absolutely vital here.
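To make the idea of threads sharing data a little more concrete, here is a minimal CPU-side sketch in Python of how one group of threads might cooperate through a shared scratch array to sum a set of values. It is an illustration of the general technique (a tree reduction) only, not Direct3D or HLSL code; the function name, the `shared` array and the `group_size` parameter are all assumptions made for this example, and the loops stand in for threads that would run concurrently on real hardware.

```python
def group_sum(values, group_size=8):
    """Sum `values` the way a single thread group might, via shared memory.

    Hypothetical sketch: each loop over `tid` stands in for a set of
    threads running in parallel, with a barrier between each phase.
    """
    if group_size <= 0 or group_size & (group_size - 1) != 0:
        raise ValueError("group_size must be a power of two")
    if len(values) > group_size:
        raise ValueError("one value per thread at most")

    # Scratch space visible to every thread in the group.
    shared = [0.0] * group_size

    # Phase 1: each thread loads its own element (or zero as padding).
    for tid in range(group_size):
        shared[tid] = values[tid] if tid < len(values) else 0.0
    # (barrier: all loads must finish before anyone reads a neighbour's slot)

    # Phase 2: tree reduction. Each pass halves the number of active threads,
    # and each active thread adds in a value written by a different thread.
    stride = group_size // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # (barrier: finish this pass before starting the next)
        stride //= 2

    return shared[0]
```

Without shared data, each thread could only ever see its own input, and a result that depends on every element would need several separate passes over memory.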
Also important is the ability to code GPGPU applications without having to do so as though the hardware were handling graphics data. In the early days of GPGPU, coding was quite a fine art, as data had to be presented as textures and applications as quads of pixels to get things to run. As with the likes of CUDA, the Compute Shader in DirectX 11 will do away with this, allowing you to thread and code your application in a more conventional way.
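As a rough picture of what that more conventional model looks like, the Python sketch below runs a small kernel once per element of an ordinary array, with the work divided into fixed-size groups of threads and each thread identified by a plain index. The names `dispatch` and `integrate_positions`, the group size of 64 and the physics step itself are all hypothetical choices for illustration and are not part of any DirectX API.

```python
def dispatch(kernel, num_items, group_size=64):
    """Run `kernel(global_id)` once per item, in groups of `group_size`.

    Hypothetical stand-in for launching a grid of thread groups; on a GPU
    the groups would be spread across however many cores are available.
    Returns the number of groups launched.
    """
    num_groups = (num_items + group_size - 1) // group_size  # round up
    for group_id in range(num_groups):
        for local_id in range(group_size):
            global_id = group_id * group_size + local_id
            if global_id < num_items:  # the last group may be only partly full
                kernel(global_id)
    return num_groups


def integrate_positions(positions, velocities, dt):
    """Advance each particle by one time step, one 'thread' per particle."""
    def kernel(i):
        positions[i] += velocities[i] * dt

    return dispatch(kernel, len(positions))
```

The data here is just a pair of arrays indexed by thread number, with no textures or quads in sight, and the same code works unchanged whether the hardware can run two groups at once or two hundred.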
These coding changes will also be reflected in an updated version of HLSL (High Level Shading Language), the language used to write shaders for DirectX. As the slide above shows, it appears that the general target for DirectX 11's Compute Shader is still going to be the manipulation of graphics and media data, which, as I mentioned previously, suggests that it won't quite be encroaching on CUDA's territory this time around by providing a complete coding structure geared towards creating an application of any kind in HLSL.
This slide also gives us a clue as to how the Compute Shader will work: the normal graphics pipeline is given the ability to pump out general data structures, which the Compute Shader can work on as required before handing the results back to the standard Direct3D pipeline.
This kind of ability fits perfectly into the example above, where the Compute Shader is used to perform post-processing on a completed frame of a scene in a game. In this example, the scene is rendered as normal by Direct3D, but rather than being written to the frame buffer, the finished scene is passed to the Compute Shader, which performs all of its post-processing (perhaps some kind of film grain, or adjusting the exposure of an HDR image for each frame) before the final image is passed to the frame buffer. In this kind of scenario, post-processing normally takes up between 10 and 20% of rendering time, or up to 70% of rendering time if your game engine makes use of deferred shading, so handling all of this in a more efficient Compute Shader could bring some large benefits.
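The exposure adjustment mentioned above can be sketched in a few lines of Python. This is a CPU-side illustration under several assumptions: the frame is a simple grid of HDR brightness values, exposure is a plain multiplier clamped to the displayable 0 to 1 range, and the image is walked in 8x8 tiles to mimic one thread group per tile. The function name and parameters are hypothetical, and a real post-processing pass would be written in HLSL and would likely use a more sophisticated tone-mapping curve.

```python
def adjust_exposure(frame, exposure, tile=8):
    """Return a copy of `frame` with exposure applied and clamped to [0, 1].

    `frame` is a list of rows of HDR brightness values. Each tile stands in
    for one thread group, and each pixel within it for one thread.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    out = [[0.0] * width for _ in range(height)]

    for tile_y in range(0, height, tile):        # one "group" per tile
        for tile_x in range(0, width, tile):
            # Pixels inside the tile; edge tiles are clipped to the image.
            for y in range(tile_y, min(tile_y + tile, height)):
                for x in range(tile_x, min(tile_x + tile, width)):
                    scaled = frame[y][x] * exposure
                    out[y][x] = min(1.0, max(0.0, scaled))
    return out
```

Every pixel is processed independently of its neighbours here, which is exactly the kind of workload that maps well onto a large number of GPU threads.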
Of course, the whole point of the Compute Shader is flexibility, and these new capabilities could be the point at which GPU-based physics becomes the norm. Any DirectX 11 hardware could potentially run such a solution, in contrast to the current state of affairs where only a single graphics IHV has GPU-accelerated physics. All this is before we start talking about the ability to do ray tracing via the Compute Shader and the like...
As with tessellation, the Compute Shader is going to be a DirectX 11-only feature, meaning that hardware support is required to make the most of it. That leaves DirectX 10 and 10.1 parts incapable of using it, due to some of the architectural changes required.