Etna utility update: viv_gpu_top, viv_throughput

I've just pushed an update to the etna utilities. viv_gpu_top gained two new modes: one to watch the occupancy (non-idle state) of the various GPU modules, and one to watch the status of the DMA hardware. I also added a new utility, viv_throughput, to benchmark the raw fillrate of the GPU.

viv_gpu_top

The first new mode, viv_gpu_top -md, samples the state of the DMA engine a given number of times per second and displays statistics. (Showterm seems to have some problems with the screen updates of this mode; I filed a GitHub issue for it.)
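
The sampling idea behind -md can be sketched as a histogram over polled states. This is a minimal Python sketch, not the tool's code: read_state is a hypothetical callback standing in for whatever register read the real tool performs, and the state names in any usage are placeholders.

```python
import collections
import time
from typing import Callable, Dict, Hashable


def sample_state_histogram(
    read_state: Callable[[], Hashable],
    samples_per_second: int,
    duration_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[Hashable, float]:
    """Poll read_state() at a fixed rate; return the percentage of samples per state."""
    n_samples = max(1, int(samples_per_second * duration_s))
    interval = 1.0 / samples_per_second
    counts: collections.Counter = collections.Counter()
    for _ in range(n_samples):
        counts[read_state()] += 1  # one snapshot of the (hypothetical) DMA state
        sleep(interval)            # fixed sampling period; drift is ignored
    return {state: 100.0 * count / n_samples for state, count in counts.items()}
```

Sampling only approximates the time spent in each state; a higher sample rate gives finer statistics at the cost of more CPU time spent polling.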

The second new mode, viv_gpu_top -mo, watches module occupancy. Running it alongside glquake makes it clear that none of the modules (except FE, which is always at 100% unless power saving kicks in) is fully occupied while the game runs, which means the GPU is not the bottleneck and the CPU side needs optimization.
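
Occupancy can be derived from repeated samples of an idle-state word. A minimal Python sketch, assuming one bit per module that is set when the module is idle; the module-to-bit mapping in IDLE_BITS is a hypothetical placeholder, not taken from the etna sources or hardware documentation.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical mapping: bit set in the sampled word means "module idle".
IDLE_BITS: Mapping[str, int] = {"FE": 0, "DE": 1, "PE": 2, "SH": 3, "PA": 4, "SE": 5, "RA": 6, "TX": 7}


def occupancy(idle_samples: Iterable[int], idle_bits: Mapping[str, int] = IDLE_BITS) -> Dict[str, float]:
    """Percentage of samples in which each module was NOT idle."""
    busy = {name: 0 for name in idle_bits}
    total = 0
    for word in idle_samples:
        total += 1
        for name, bit in idle_bits.items():
            if not (word >> bit) & 1:  # idle bit clear -> module busy in this sample
                busy[name] += 1
    if total == 0:
        return {name: 0.0 for name in idle_bits}
    return {name: 100.0 * count / total for name, count in busy.items()}
```

A front end that is always fetching or waiting on commands shows up as 100% here, which matches the FE behavior described above.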

viv_throughput

This one is pretty straightforward: it renders off-screen quads of a specified size with specified settings to determine the fillrate. It records the time spent rendering as well as various performance counters, such as the number of stalls.
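
The summary numbers can be reproduced from the settings and the elapsed time. This Python sketch assumes that "Frame size" is width × height × (color + depth/stencil bytes per pixel) and that "Fillrate" is frame size × FPS; that is inferred from the example outputs being consistent with it, not read from the tool's source.

```python
from typing import NamedTuple


class FrameStats(NamedTuple):
    frame_size_mb: float  # color + depth/stencil bytes per frame, in 10^6 bytes
    fps: float
    fillrate_mb_s: float  # frame size times frames per second
    pixels_rendered: int


def frame_stats(width: int, height: int, color_bits: int, depth_bits: int,
                frames: int, elapsed_s: float) -> FrameStats:
    """Derive the summary numbers from the render settings and the measured time."""
    frame_bytes = width * height * (color_bits + depth_bits) // 8
    fps = frames / elapsed_s
    return FrameStats(
        frame_size_mb=frame_bytes / 1e6,
        fps=fps,
        fillrate_mb_s=frame_bytes * fps / 1e6,
        pixels_rendered=width * height * frames,
    )
```

For example, frame_stats(1920, 1080, 32, 0, 150, 1.26) gives an 8.3 MB frame and about 987 MB/s; the small gap to the printed 988.9 MB/s is presumably because the printed elapsed time is rounded.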

Usage:
  ./viv_throughput [-w ] [-h ] [-l <0/1>] [-s <0/1>] [-t <0/1>] [-e <0/1>] [-f ] [-d <0/16/32>] [-c <16/32>]

  -w            Width of surface (default is 1920)
  -h            Height of surface (default is 1080)
  -l <0/1>      Clear surface every frame (0=no, 1=yes, default is 0)
  -s <0/1>      Use supertile layout (0=no, 1=yes, default is 0)
  -t <0/1>      Enable TS (0=no, 1=yes, default is 1)
  -e <0/1>      Enable early Z (0=no, 1=yes, default is 0)
  -f            Number of frames to render (default is 2000)
  -d <0/16/32>  Depth/stencil surface depth
  -c <16/32>    Color surface depth
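
To compare several configurations, the runs can be scripted. A hedged Python sketch using only the -c, -d and -f options listed above: it assumes the binary is invoked as ./viv_throughput on the target and that the statistics are printed to stdout in the "Statistics:" format shown in the examples that follow. build_commands, parse_stats and run_sweep are hypothetical helper names.

```python
import itertools
import re
import subprocess
from typing import Dict, List

# Matches lines such as "  FPS: 119.2" or "  Elapsed time: 1.26s"; the unit suffix is dropped.
_STAT_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s*([-+]?[0-9]*\.?[0-9]+)")


def build_commands(binary: str = "./viv_throughput", frames: int = 150) -> List[List[str]]:
    """One command line per color/depth combination accepted by -c and -d."""
    return [
        [binary, "-c", str(color), "-d", str(depth), "-f", str(frames)]
        for color, depth in itertools.product((16, 32), (0, 16, 32))
    ]


def parse_stats(output: str) -> Dict[str, float]:
    """Collect the numeric values printed after the 'Statistics:' header."""
    stats: Dict[str, float] = {}
    in_stats = False
    for line in output.splitlines():
        if line.strip() == "Statistics:":
            in_stats = True
        elif in_stats:
            match = _STAT_LINE.match(line)
            if match:
                stats[match.group(1)] = float(match.group(2))
    return stats


def run_sweep(binary: str = "./viv_throughput", frames: int = 150) -> Dict[str, Dict[str, float]]:
    """Run every combination on the target and return the parsed statistics per command."""
    results = {}
    for cmd in build_commands(binary, frames):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        results[" ".join(cmd[1:])] = parse_stats(proc.stdout)
    return results
```

Units are stripped, so "Stalls on read: 2.0M/frame" is stored as 2.0 (millions per frame) and "Fillrate" is in MB/s.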

For example, to benchmark with 32-bit color and no depth/stencil surface:

# ./viv_throughput -c 32 -d 0 -f 150
...
Input
  Frame: 1920 x 1080
  Color format: PIPE_FORMAT_B8G8R8X8_UNORM
  Depth format: PIPE_FORMAT_NONE
  Supertiled: 0
  Enable TS: 1
  Early z: 0
  Do clear: 0
  Num frames: 150
  Frame size: 8.3 MB
Statistics:
  Elapsed time: 1.26s
  FPS: 119.2
  Fillrate: 988.9 MB/s
  Vertices rendered: 600
  Pixels rendered: 311040000
  VS instructions: 1200
  PS instructions: 311472000
  Read: 0.1 MB/frame
  Written: 8.4 MB/frame
  Stalls on read: 0.0M/frame
  Stalls on write request: 0.0M/frame
  Stalls on write data: 0.0M/frame

And to benchmark with 32-bit color and a 32-bit depth/stencil surface:

# ./viv_throughput -c 32 -d 32 -f 150
...
Input
  Frame: 1920 x 1080
  Color format: PIPE_FORMAT_B8G8R8X8_UNORM
  Depth format: PIPE_FORMAT_S8_UINT_Z24_UNORM
  Supertiled: 0
  Enable TS: 1
  Early z: 0
  Do clear: 0
  Num frames: 150
  Frame size: 16.6 MB
Statistics:
  Elapsed time: 5.67s
  FPS: 26.5
  Fillrate: 438.9 MB/s
  Vertices rendered: 600
  Pixels rendered: 311040000
  VS instructions: 1200
  PS instructions: 311472000
  Read: 8.5 MB/frame
  Written: 16.8 MB/frame
  Stalls on read: 2.0M/frame
  Stalls on write request: 3.8M/frame
  Stalls on write data: 1.6M/frame

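Clearly, enabling depth on the GC860 in the JZ4770 generates a lot of stalls. The additional memory bandwidth cannot fully explain the drop in fillrate: reads go from 0.1 to 8.5 MB per frame (roughly one extra depth/stencil surface) and the frame size only doubles, yet FPS falls by a factor of about 4.5, from 119.2 to 26.5.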
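
To put a rough number on this, here is the arithmetic on the reported Read and Written figures as a small Python sketch. It assumes that Read plus Written approximates the total memory traffic per frame, and that the bandwidth reached without depth would also be reachable with depth; both are simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    label: str
    read_mb_per_frame: float     # "Read" line of the statistics
    written_mb_per_frame: float  # "Written" line of the statistics
    fps: float

    @property
    def traffic_mb_per_frame(self) -> float:
        return self.read_mb_per_frame + self.written_mb_per_frame

    @property
    def bandwidth_mb_s(self) -> float:
        return self.traffic_mb_per_frame * self.fps


def bandwidth_bound_fps(reference: Run, target: Run) -> float:
    """FPS the target would reach if it sustained the reference run's memory bandwidth."""
    return reference.bandwidth_mb_s / target.traffic_mb_per_frame


no_depth = Run("-c 32 -d 0", read_mb_per_frame=0.1, written_mb_per_frame=8.4, fps=119.2)
with_depth = Run("-c 32 -d 32", read_mb_per_frame=8.5, written_mb_per_frame=16.8, fps=26.5)

for run in (no_depth, with_depth):
    print(f"{run.label}: {run.traffic_mb_per_frame:.1f} MB/frame, {run.bandwidth_mb_s:.0f} MB/s")
print(f"bandwidth-bound estimate for {with_depth.label}: "
      f"{bandwidth_bound_fps(no_depth, with_depth):.1f} FPS (measured {with_depth.fps})")
```

With these inputs the run without depth moves about 8.5 MB per frame at roughly 1013 MB/s, while the run with depth moves about 25.3 MB per frame at only about 670 MB/s. If it could sustain 1013 MB/s it would reach roughly 40 FPS instead of the measured 26.5, so raw bandwidth accounts for only part of the slowdown, consistent with the high stall counts.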

Written on September 19, 2013