Etnaviv on GC2000

I finally got around to playing a bit with the GPU on my GK802. By comparing the command streams of GLES2 demos I've tried to write down the largest differences between the GC2000 and GC800, from the viewpoint of driver implementation.

  • Hierarchical depth: GC2000 supports a new mode for the depth buffer that represents it hierarchically. In this mode multiple buffers are allocated for one depth surface. In general, the purpose of hierarchical depth is to reject fragments quickly by aggregating the minimum value of a tile of depth buffer values at incrementally lower resolutions. Various mechanisms for hierarchical Z have been described in the literature (see for example greene93). Which one Vivante implements is an open question.
    • Another open question is whether non-hierarchical depth is still supported. I assume so, but we will only know for sure after some experimentation.
  • OpenCL: Overall, the shader ISA looks completely backwards compatible (the shader code generated for the same GLSL code is nearly the same). However, GC2000 has some extra instructions for integer arithmetic, bitwise ops, synchronization and memory loads/stores.
  • Clip-space z: GC800 uses the DX9 convention of having clip-space z coordinates range from 0..1, whereas GC2000 uses the GL convention of -1..1. This makes the vertex shaders slightly different: on GC800 the shader compiler always appends two instructions (from "Using the etnaviv shader assembler"):
    ; Vivante specific transform at the end of every vertex shader
    ; position_out.z = (position_out.z + position_out.w) / 2.0
    ADD t4.__z_, t4.zzzz, void, t4.wwww
    MUL t4.__z_, t4.zzzz, u11.yyyy, void 
    On GC2000 these instructions are no longer emitted. Which feature bit or other criterion marks this property is still an open question. Another possibility is that the (much) newer driver performs this conversion in some other way, without adding instructions to the VS. It will be interesting to see what code it generates for a GC800.
  • Pixel pipes: The GC2000 has two pixel pipes instead of one. This is visible to the outside world because PE and RS expose address registers per pixel pipe. The pipes both get their half of the render buffer to work with, assigned by the driver. Combining these halves into a linear or tiled buffer can be done with the RS. From the command stream:
        0x358a0000, /*   [01460] PE.PIPE[0].COLOR_ADDR := ADDR_G */
        0x358db000, /*   [01464] PE.PIPE[1].COLOR_ADDR := ADDR_Z */
        0x35845000, /*   [01480] PE.PIPE[0].DEPTH_ADDR := ADDR_I */
        0x3586e000, /*   [01484] PE.PIPE[1].DEPTH_ADDR := ADDR_26 */
        0x351d0000, /*   [016C0] RS.PIPE[0].SOURCE_ADDR := ADDR_T */
        0x351db000, /*   [016C4] RS.PIPE[1].SOURCE_ADDR := ADDR_V */
        0x351d0000, /*   [016E0] RS.PIPE[0].DEST_ADDR := ADDR_T */
        0x351db000, /*   [016E4] RS.PIPE[1].DEST_ADDR := ADDR_V */
        0x00000000, /*   [01700] RS.PIPE[0].OFFSET := X=0,Y=0 */
        0x00080000, /*   [01704] RS.PIPE[1].OFFSET := X=0,Y=8 */
    Each pixel pipe copies its own region of the size specified in register RS.WINDOWSIZE.
  • Shader instruction memory has moved to state address 0x0C000, and is shared between the vertex and fragment shader units. Registers 0x0101C PS.RANGE and 0x0085C VS.RANGE specify the range within this memory used by each unit. Uniforms still live in the same, separate address ranges, which makes me wonder how GLES3 uniform buffer objects will be implemented (the current driver does not support them yet). Maybe through memory load instructions?
  • Multiple vertex streams: the FE grew some new registers to be able to fetch from multiple streams at the same time while rendering. The old stream address and control registers appear to be no longer used.
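
To make the clip-space z difference above concrete, here is a minimal sketch in Python of what the two extra GC800 vertex shader instructions compute. The function names are made up for illustration, and it assumes that the uniform u11.y holds 0.5, which is what the "/ 2.0" in the shader comment suggests but is not confirmed here.

```python
def gl_to_dx_clip_z(z, w):
    """Convert a GL-convention clip-space z (-w..w) to the DX9 convention (0..w).

    Mirrors the two instructions appended on GC800:
        position_out.z = (position_out.z + position_out.w) / 2.0
    """
    # ADD t4.__z_, t4.zzzz, void, t4.wwww   ->  z = z + w
    z = z + w
    # MUL t4.__z_, t4.zzzz, u11.yyyy, void  ->  z = z * u11.y
    # Assumption: u11.y is the constant 0.5.
    return z * 0.5


def ndc_z(z, w):
    """Perspective divide: clip-space z to normalized device z."""
    return z / w


if __name__ == "__main__":
    w = 2.0
    for z_gl in (-2.0, 0.0, 2.0):  # near plane, midpoint, far plane in GL terms
        z_dx = gl_to_dx_clip_z(z_gl, w)
        print(z_gl, "->", z_dx, "ndc:", ndc_z(z_dx, w))
```

After the perspective divide, the GL near plane (z = -w) lands at 0 and the far plane (z = w) at 1, which is the range the GC800 expects.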

There are also some more subtle differences. A few new registers were added to PA for the viewport and line properties, and to SE for clipping.

All in all there are quite a few differences, some of them significant, but nothing that cannot be handled with fallbacks. Luckily they do not add up to enough to require a completely separate driver for GC2000.

Written on March 3, 2013