Visucore Blog

Python, Parallel Processing, Graphics, Open Source, Embedded Systems, Game Development

Disabling comments

written by wladimir, on Nov 18, 2014 12:06:00 PM.

I'm sorry it had to come to this. I'm disabling comments on this blog. I value your input; however, the sheer volume of spam (I woke up to 5281 spam comments) would turn moderating it into a day job. If you have comments on any of my blog entries, feel free to contact me by mail.

(I suppose some proof-of-work or even Bitcoin-based scheme could work to weed out scammers, but alas I don't have time to work on that.)

Back online

written by wladimir, on Nov 18, 2014 11:56:34 AM.

There were some issues as the VPS host for this server went out of business - and with them, the server just disappeared. My backup discipline saved the day. Just a reminder: if you store your information in the cloud, don't rely on the cloud backing it up.

Etnaviv on GC2000

written by wladimir, on Nov 20, 2013 7:53:00 AM.

Thanks to austriancoder we now have something showing up on GC2000. There are still some visual corruption issues, but something is showing up!

Etnaviv: Help needed

written by wladimir, on Oct 8, 2013 8:29:00 AM.

Nearly all of the figuring out has been done, and we have an OpenGL ES driver that works well on fbdev on GC1000 and below. It is being used in the GCW Zero handheld game console, successfully rendering many games (although I'm still in the process of squashing rendering bugs all over the place :-). However, I have neither the time nor the will to do everything myself. This project needs developers to help with:

  • GC2000 support in the gallium driver: multiple pixel pipes support is the only thing holding back basic GC2000 support
  • Integrating the Mesa stuff into DRI/DRM
  • Preparations for upstreaming the Mesa driver
  • X11 2D driver

I read "I can't use etnaviv because it doesn't ..." all the time. People who say this should not forget that this is an open source collaborative project. I did my thing, now do yours. There is no point in waiting, because whatever you want just won't happen by itself.

Not knowing anything about low level graphics programming is not an excuse, because the most difficult part (figuring out the HW and writing the 3D driver) has been done. It's a good learning experience, too, and lots of fun.

FOSS Vivante driver support is looking forward to your help! Join #etnaviv on IRC or mail me if you have questions.

VPU proof of concept Ingenic JZ4770

written by wladimir, on Sep 27, 2013 5:22:00 PM.

Lately I've tried to get to the second (AUX) core of the Ingenic JZ4770 in the GCW Zero. This is part of the VPU (Video Processing Unit) and not really documented, so this was the result of quite some trial and error. But after clocking down the AHB1 bus to 166MHz I was suddenly able to reliably run code on the extra core. The interesting thing about the VPU in the JZ4770 is that it simply runs MIPS code like the main core (albeit at half clock rate) and not another "secret" ISA.

The repository contains source for the tests in src/tests; the code for the AUX core is in src/firmware. Build instructions are provided in the repository.

Here comes an overview of the current test cases.


The first proof of concept for running code on the AUX core. Firmware is loaded and executed, and it updates values at specified locations in memory. The main core polls for this change and displays the result.

Loading code...
Firmware size: 112
Executing code...
Result: 87654321
TCSM0 87654321 87654322 87654323 87654324 87654325 87654326 87654327 87654328 87654329 8765432a 8765432b 8765432c 8765432d 8765432e 8765432f 87654330 
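The polling handshake described above can be imitated on a desktop machine. The following is only a sketch: a Python thread stands in for the AUX core, a bytearray stands in for the memory both cores can see, and names such as `fake_firmware`, `poll_result` and `RESULT_OFFSET` are made up for illustration and do not come from the repository.

```python
import struct
import threading
import time

WORDS = 16                              # number of 32-bit words the fake firmware fills in
RESULT_OFFSET = WORDS * 4               # hypothetical location of the result word
shared = bytearray(RESULT_OFFSET + 4)   # stands in for memory visible to both cores


def fake_firmware(mem):
    """Stand-in for the AUX core: write a counting pattern, then the result word."""
    for i in range(WORDS):
        struct.pack_into("<I", mem, i * 4, 0x87654321 + i)
    # The result word is written last, so seeing it means the pattern is complete.
    struct.pack_into("<I", mem, RESULT_OFFSET, 0x87654321)


def poll_result(mem, timeout=2.0):
    """Poll the result word until it becomes non-zero or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        (value,) = struct.unpack_from("<I", mem, RESULT_OFFSET)
        if value != 0:
            return value
        time.sleep(0.001)
    raise TimeoutError("firmware did not report a result")


worker = threading.Thread(target=fake_firmware, args=(shared,))
worker.start()
result = poll_result(shared)
worker.join()

words = struct.unpack_from("<16I", shared, 0)
print("Result: %08x" % result)
print("TCSM0 " + " ".join("%08x" % w for w in words))
```

On real hardware the two sides would also have to agree on cache handling and write ordering, which this sketch ignores.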


Shows a moving pattern written by VPU directly to the framebuffer. It appears that the VPU can read and write directly from and to any physical address.


Shows using an interrupt for completion notification. This test does a memory benchmark (details of the memory area can be configured in test4_p1.h).

Allocated physical memory buffer at 0eca0000
Loading code...
Firmware size: 124
Executing code...
Completion token: deadcafe
Elapsed time: 4.00s
Total writes: 400.00MB
Write rate: 99.92MB/s


The VPU has three DMA engines: GP0, GP1 and GP2. In my experiments it appears that only the third (GP2) can be used to copy to main memory. The DMA engines can copy 2D surfaces, which means that the length of a row and the source and destination stride can be supplied. They take a task list, so multiple commands can be queued at the same time.

This test case generates a 64x64 pattern in SRAM, then creates a task list that tiles it to the framebuffer with a single DMA invocation. Output:

Allocated physical memory buffer at 0eca0000
Loading code...
Firmware size: 496
Executing code...
Bytes 0e70 Task 11 End=0
Bytes 0f88 Task 14 End=0
Bytes 0f60 Task 15 End=0
Bytes 0e98 Task 16 End=0
Bytes 0e20 Task 17 End=0
Bytes 0d30 Task 18 End=0
Bytes 0c40 Task 19 End=0
Bytes 0000 Task 19 End=1
Completion token: b01dface
Elapsed time: 0.00s
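To make the 2D copy idea more concrete, here is a software sketch of what such a task list expresses. It is not the firmware code: the task fields (`src_off`, `src_stride`, `dst_off`, `dst_stride`, `row_len`, `rows`) are hypothetical names, and the framebuffer is assumed to be 320x240 with one byte per pixel to keep the arithmetic simple.

```python
def copy_2d(src, src_off, src_stride, dst, dst_off, dst_stride, row_len, rows):
    """Copy `rows` rows of `row_len` bytes, stepping each side by its own stride."""
    for y in range(rows):
        s = src_off + y * src_stride
        d = dst_off + y * dst_stride
        dst[d:d + row_len] = src[s:s + row_len]


TILE = 64                  # the pattern is 64x64
FB_W, FB_H = 320, 240      # assumed framebuffer size, one byte per pixel

# The 64x64 pattern that would live in SRAM (row-major, x varies fastest).
pattern = bytearray((x ^ y) & 0xFF for y in range(TILE) for x in range(TILE))
framebuffer = bytearray(FB_W * FB_H)

# Build the task list: one 2D copy per tile, clipped at the framebuffer edges.
tasks = []
for ty in range(0, FB_H, TILE):
    for tx in range(0, FB_W, TILE):
        tasks.append({
            "src_off": 0, "src_stride": TILE,
            "dst_off": ty * FB_W + tx, "dst_stride": FB_W,
            "row_len": min(TILE, FB_W - tx),
            "rows": min(TILE, FB_H - ty),
        })

# "Run" the whole list in one go, as a single DMA invocation would.
for task in tasks:
    copy_2d(pattern, task["src_off"], task["src_stride"],
            framebuffer, task["dst_off"], task["dst_stride"],
            task["row_len"], task["rows"])

print(len(tasks), "tasks queued")
```

The point of the differing strides is that the source rows are packed 64 bytes apart while the destination rows are a full framebuffer line apart, so one task description covers a whole rectangle.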

With these capabilities it looks like the VPU could be used for offloading some tasks in games, even ignoring the video-specific hardware blocks. A developer-friendly framework around it would be useful, but for now I don't have time to do anything beyond these basic proofs of concept. If you'd like to use this or pick up from here and have any questions, let me know.

Etna utility update: viv_gpu_top, viv_throughput

written by wladimir, on Sep 19, 2013 3:09:00 PM.

I've just pushed an update for the etna utilities. viv_gpu_top was extended with two new modes: one to watch occupancy (non-idle state) of the various modules, and one to watch the DMA hardware status. I also added a utility, viv_throughput, to benchmark the raw fillrate of the GPU.


New mode viv_gpu_top -md (looks like showterm has some problems with the screen updates; I filed a GitHub issue for it). This samples the state of the DMA engine a certain number of times per second and displays statistics:

And new mode viv_gpu_top -mo while running glquake. This makes it clear that none of the modules (except FE, which is always at 100% unless power saving kicks in) is fully occupied while running the game, which means that there is a need for CPU optimization:
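As a rough illustration of what an occupancy view of this kind computes, the sketch below samples an idle bitmask repeatedly and reports, per module, the percentage of samples in which the module was not idle. Everything about the register is an assumption made for the example: the bit layout, the convention that a set bit means idle, and the `read_idle_mask` callback, which here replays a canned trace instead of touching hardware. Only the module name FE is taken from the text above.

```python
from collections import Counter
from itertools import cycle


def sample_occupancy(read_idle_mask, modules, samples):
    """Return the percentage of samples in which each module was busy (not idle)."""
    busy = Counter()
    for _ in range(samples):
        idle = read_idle_mask()
        for bit, name in enumerate(modules):
            if not (idle >> bit) & 1:    # assumed convention: bit set means idle
                busy[name] += 1
    return {name: 100.0 * busy[name] / samples for name in modules}


# Hypothetical bit assignment: bit 0 = FE, bit 1 = PE, bit 2 = SH.
modules = ["FE", "PE", "SH"]

# Canned trace standing in for register reads; FE is never idle in it.
trace = cycle([0b110, 0b100, 0b110, 0b010])
occupancy = sample_occupancy(lambda: next(trace), modules, samples=4)

for name in modules:
    print("%-3s %5.1f%%" % (name, occupancy[name]))
```

A real tool would take the samples at a fixed rate and refresh the display once per interval; the counting itself stays the same.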


This one is pretty straightforward and renders off-screen quads of a specified size and with specified settings to determine the fillrate. It records the time spent rendering as well as various performance counters such as the number of stalls.

  ./viv_throughput [-w <width>] [-h <height>] [-l <0/1>] [-s <0/1>] [-t <0/1>] [-e <0/1>] [-f <frames>] [-d <0/16/32>] [-c <16/32>]

  -w <width>    Width of surface (default is 1920)
  -h <height>   Height of surface (default is 1080)
  -l <0/1>      Clear surface every frame (0=no, 1=yes, default is 0)
  -s <0/1>      Use supertile layout (0=no, 1=yes, default is 0)
  -t <0/1>      Enable TS (0=no, 1=yes, default is 1)
  -e <0/1>      Enable early Z (0=no, 1=yes, default is 0)
  -f <frames>   Number of frames to render (default is 2000)
  -d <0/16/32>  Depth/stencil surface depth
  -c <16/32>    Color surface depth

For example, to benchmark with 32 bit color and no depth/stencil:

# ./viv_throughput -c 32 -d 0 -f 150
  Frame: 1920 x 1080
  Color format: PIPE_FORMAT_B8G8R8X8_UNORM
  Depth format: PIPE_FORMAT_NONE
  Supertiled: 0
  Enable TS: 1
  Early z: 0
  Do clear: 0
  Num frames: 150
  Frame size: 8.3 MB
  Elapsed time: 1.26s
  FPS: 119.2
  Fillrate: 988.9 MB/s
  Vertices rendered: 600
  Pixels rendered: 311040000
  VS instructions: 1200
  PS instructions: 311472000
  Read: 0.1 MB/frame
  Written: 8.4 MB/frame
  Stalls on read: 0.0M/frame
  Stalls on write request: 0.0M/frame
  Stalls on write data: 0.0M/frame

And to benchmark with 32 bit color and 32 bit depth/stencil:

# ./viv_throughput -c 32 -d 32 -f 150
  Frame: 1920 x 1080
  Color format: PIPE_FORMAT_B8G8R8X8_UNORM
  Depth format: PIPE_FORMAT_S8_UINT_Z24_UNORM
  Supertiled: 0
  Enable TS: 1
  Early z: 0
  Do clear: 0
  Num frames: 150
  Frame size: 16.6 MB
  Elapsed time: 5.67s
  FPS: 26.5
  Fillrate: 438.9 MB/s
  Vertices rendered: 600
  Pixels rendered: 311040000
  VS instructions: 1200
  PS instructions: 311472000
  Read: 8.5 MB/frame
  Written: 16.8 MB/frame
  Stalls on read: 2.0M/frame
  Stalls on write request: 3.8M/frame
  Stalls on write data: 1.6M/frame

It's clear that a lot of stalls are being generated when depth is enabled on the GC860 in JZ4770. The additional memory bandwidth for reads cannot fully explain the drop in fillrate.
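The numbers in the two runs can be cross-checked with a bit of arithmetic. This sketch recomputes frame size, FPS and fillrate from the printed values, and compares total memory traffic per second between the runs. It assumes that the tool's MB means 10^6 bytes (which matches the printed frame sizes) and that the Read and Written counters cover all memory traffic; small differences from the printed figures come from the rounded elapsed times.

```python
MB = 1_000_000  # assumption: decimal megabytes, consistent with "Frame size: 8.3 MB"


def frame_bytes(width, height, color_bits, depth_bits):
    """Bytes touched per frame if every pixel of color and depth is written once."""
    return width * height * (color_bits + depth_bits) // 8


# Values copied from the two benchmark outputs above.
runs = {
    "color only": dict(color_bits=32, depth_bits=0, frames=150,
                       elapsed=1.26, read_mb=0.1, written_mb=8.4),
    "color + depth": dict(color_bits=32, depth_bits=32, frames=150,
                          elapsed=5.67, read_mb=8.5, written_mb=16.8),
}

summary = {}
for name, r in runs.items():
    size_mb = frame_bytes(1920, 1080, r["color_bits"], r["depth_bits"]) / MB
    fps = r["frames"] / r["elapsed"]
    fillrate = size_mb * fps                              # MB/s of surface data
    traffic = (r["read_mb"] + r["written_mb"]) * fps      # MB/s of counted traffic
    summary[name] = dict(size_mb=size_mb, fps=fps, fillrate=fillrate, traffic=traffic)
    print("%-14s frame %.1f MB  %.1f fps  fillrate %.1f MB/s  traffic %.1f MB/s"
          % (name, size_mb, fps, fillrate, traffic))

# If bandwidth were the only limit, the depth run could sustain the same traffic
# as the color-only run, which would give this frame rate:
per_frame_depth = runs["color + depth"]["read_mb"] + runs["color + depth"]["written_mb"]
bandwidth_limited_fps = summary["color only"]["traffic"] / per_frame_depth
print("bandwidth-limited estimate for depth run: %.1f fps" % bandwidth_limited_fps)
```

Under these assumptions the color-only run moves roughly 1010 MB/s while the depth run moves only about 670 MB/s, so the depth run is not even using the bandwidth that the first run demonstrates is available.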

Q3A with Etna OpenGL ES driver

written by wladimir, on Sep 13, 2013 12:24:00 PM.

Quake 3 Arena rendered with Etna OpenGL ES driver on the GCW Zero. Nice video. Props to qbertaddict.

Etna utilities

written by wladimir, on Sep 11, 2013 6:00:00 PM.

As you may have noticed I recently pushed a new directory utils to the etna_viv source repository. This directory contains various utilities related to the GPU and driver.

Some of these utilities are mostly useful for debugging the driver itself, others are also useful for optimization of applications using the driver. An overview follows.


This utility provides a live view of the rate of change of the performance counters (profiling information). This is arguably the most useful of the bunch.

Here is an example while getting my ass kicked by the AIs in Quake 3 (update frequency: 1/s, action starts after some time due to loading delays):

The exact meanings of the various performance counters are not publicly documented AFAIK, although a lot can be guessed from the names alone.


viv_info shows the feature bits of all the cores of the GPU. These bits signify to the driver which rendering features are available. This is used to fill in the feature matrix.

Example output for GC860 (terminal can be scrolled with mouse wheel):


This utility shows a live view of all GPU debug registers.

Important: Needs kernel driver compiled with user space register access (gcdREGISTER_ACCESS_FROM_USER=1). This is the case by default with most kernel drivers I've encountered in the wild.

Here is an example while bunny-hopping through E1M1 of Quake 1 (update frequency: 1/s).

In general viv_gpu_top provides a more useful overview. However, the difference is that this tool shows all the debug registers, not just the performance counters returned from the kernel.


viv_registers shows the current state of the GPU mmio registers.

Important: Needs kernel driver compiled with user space register access (gcdREGISTER_ACCESS_FROM_USER=1).

Warning: this utility can result in crashes inside the kernel, such as (on ARM):

Unhandled fault: external abort on non-linefetch (0x1028) at 0xfe641000
Internal error: : 1028 [#1] PREEMPT ARM

It appears that the actually accessible registers differ per SoC. When a non-accessible register is loaded, a fault happens. So expect crashes when using this utility.

Example (terminal can be scrolled using mouse wheel):


As you may have guessed from the name, this command resets the GPU. This should be useful if some erroneous input managed to hang it.

Note: this is known to be unreliable with many kernel drivers and can put the GPU into a state that can only be recovered from with a device reboot.

No screencast for this one as it has no output.

MSAA working!

written by wladimir, on Sep 10, 2013 2:04:00 PM.

All the MSAA (Multi-sampling anti-aliasing) crash bugs appear to have been resolved, and every game I tried it with works! As the GCW Zero has a 320 by 240 screen, it is a prime candidate for anti-aliasing.

quake with MSAA 1x

quake with MSAA 2x

quake with MSAA 4x

In case a game doesn't have a configuration option to set MSAA, it can be forced through debug flags in the following way:

# Force MSAA 2x
export ETNA_DEBUG="msaa2x"
# Force MSAA 4x
export ETNA_DEBUG="msaa4x"

There is a small penalty to the frame rate when enabling MSAA, but as most of the (older) games on GCW appear to be vertex or CPU limited, it is acceptable.
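If you start games from a launcher script rather than a shell, the same flag can be set per process. The helper below is a small sketch: `env_with_msaa` is a made-up name, and it only knows the two values shown above. It overwrites any existing ETNA_DEBUG value, since how several flags would be combined is not covered here.

```python
import os

# The two MSAA debug flag values shown above, keyed by sample count.
MSAA_FLAGS = {2: "msaa2x", 4: "msaa4x"}


def env_with_msaa(samples, base_env=None):
    """Return a copy of the environment with ETNA_DEBUG forcing the given MSAA mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["ETNA_DEBUG"] = MSAA_FLAGS[samples]   # replaces any previous ETNA_DEBUG value
    return env


env = env_with_msaa(4, base_env={"HOME": "/tmp"})
print(env["ETNA_DEBUG"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the game binary.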

More etna_viv news

written by wladimir, on Aug 28, 2013 8:09:00 PM.

Another update on the Etna 3D driver for Vivante GPU cores, along with screenshots of games that now render successfully :0)

Currently supported GPUs: GC600, GC800, GC860, GC880 (others may be supported but these have been tested). GC2000 is currently not supported because it has multiple pixel pipes (see this irc log for details).


What has been done

  • GLES1 (for the most part) and GLES2 support
  • Shader compiler, with support for fixed pipeline emulation shaders from GLES1
  • Buffer management, 2D and cubemap textures, mipmap generation
  • Fallbacks in Mesa for devices that only support a single vertex buffer or no 32-bit indices, and lowering for TGSI instructions LRP and POW
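For readers unfamiliar with instruction lowering, the idea is to rewrite an instruction the hardware lacks in terms of ones it has. The sketch below shows the usual mathematical identities for LRP and POW in plain Python. It illustrates the technique only and is not a transcription of the Mesa code; the function names are invented.

```python
import math


def lrp(a, x, y):
    """Reference semantics of TGSI LRP: a*x + (1 - a)*y."""
    return a * x + (1.0 - a) * y


def lrp_lowered(a, x, y):
    """LRP rewritten as one subtract plus one multiply-add: a*(x - y) + y."""
    return a * (x - y) + y


def pow_lowered(x, y):
    """POW rewritten as EX2(y * LG2(x)); only valid for x > 0."""
    return 2.0 ** (y * math.log2(x))


print(lrp(0.25, 2.0, 10.0), lrp_lowered(0.25, 2.0, 10.0))
print(pow_lowered(9.0, 0.5), math.pow(9.0, 0.5))
```

The lowered forms are algebraically equal to the originals, although floating-point rounding can differ slightly between them.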

D2X (Descent 2 rebirth)

What has to be done (in no particular order)

  • Bugfixes for remaining corruption issues
  • Optimization: mainly a smarter shader compiler, and implementing performance features present in the blob driver but currently disabled in etnaviv
  • An Xorg EXA (2D) driver
  • Interaction with dma buffers and drm (kernel code for interaction between vivante kernel driver and dma buffers is supposed to exist somewhere, at least for Marvell Dove)
  • Get stuff merged upstream


Any help is welcome (props to Zear again for the screenshots).