It’s been a long time coming and now it has finally arrived: Shader Model 3.0 support from ATi. It has been a major selling point for nVidia, one that ATi now equals or betters. In ATi’s words, their Shader Model 3.0 implementation is “Done Right”. By “Done Right” they stress two features that perform well on their pixel shader architecture: flow control and 128-bit (FP32) rendering.
The key to the X1000’s pixel shader is the new Ultra-Threaded Pixel Shader Engine. The engine is another component that stresses efficiency in ATi’s architecture by hiding latency and avoiding wasted cycles. The Ultra-Threaded Pixel Shader Engine is an intelligent scheduler which breaks the pixel workload down into a large number of tasks or threads to be worked on by either the Texture Address Unit/Texture … Really, what they’ve done is take a page out of the Xenos engineering team’s playbook and apply it to the Pixel Shader units.
One key source of wasted clock cycles is when a pixel shader requires a texture value which is not readily available (not in the texture cache) or the texture result hasn’t been calculated yet. This produces a stall which can introduce hundreds of cycles of latency.
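To get a feel for why breaking the pixel workload into many threads hides texture latency, here is a toy model in Python. It is only a sketch under made-up assumptions, not ATi’s actual scheduler: the names (`Thread`, `cycles_without_switching`, `cycles_with_switching`) are hypothetical, there is a single ALU, every thread repeats “a few ALU cycles, then one texture fetch”, and the thread counts and latencies are illustrative rather than measured X1000 figures.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    alu_left: int       # ALU cycles remaining before the next texture fetch
    fetches_left: int   # texture fetches this thread still has to issue
    ready_at: int = 0   # cycle at which its pending texture data arrives


def cycles_without_switching(n_threads, fetches, alu_cycles, fetch_latency):
    """Run threads one after another; the ALU stalls during every fetch."""
    return n_threads * fetches * (alu_cycles + fetch_latency)


def cycles_with_switching(n_threads, fetches, alu_cycles, fetch_latency):
    """Each cycle, give the ALU to a thread that is not waiting on a texture.

    Assumes n_threads >= 1 and alu_cycles >= 1.
    """
    threads = [Thread(alu_cycles, fetches) for _ in range(n_threads)]
    cycle = 0
    while any(t.fetches_left > 0 for t in threads):
        ready = [t for t in threads
                 if t.fetches_left > 0 and t.ready_at <= cycle]
        if ready:
            # Pick the thread that has been runnable the longest.
            t = min(ready, key=lambda th: th.ready_at)
            t.alu_left -= 1
            if t.alu_left == 0:
                # ALU work done: issue the texture fetch and park the thread.
                t.fetches_left -= 1
                t.alu_left = alu_cycles
                t.ready_at = cycle + 1 + fetch_latency
        # If no thread is ready, this cycle is a stall (wasted).
        cycle += 1
    # Wait for the last outstanding texture fetch to return.
    return max(cycle, max(t.ready_at for t in threads))


if __name__ == "__main__":
    args = dict(n_threads=64, fetches=4, alu_cycles=4, fetch_latency=200)
    print("no switching:  ", cycles_without_switching(**args))
    print("with switching:", cycles_with_switching(**args))
```

With a single thread the two functions give the same cycle count, since there is nothing else to switch to. Once there are enough threads that their combined ALU work covers the fetch latency, the ALU in this model almost never sits idle, which is the effect the scheduler is after.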