Trusting hardware, particularly the registers that describe its functionality, is fundamentally risky.

tl;dr

The etnaviv GPU stack is continuously improving and becoming more robust. This time, a hardware database was incorporated into Mesa, utilizing header files provided by the SoC vendors.

If you are interested in the implementation details, I recommend checking out this Mesa MR.

Are you employed at VeriSilicon and want to help? You could greatly simplify our work by supplying the community with a comprehensive header that includes all the models you offer.

Last but not least: I deeply appreciate Igalia’s passion for open source GPU driver development, and I am grateful to be a part of the team. Their enthusiasm for open source work not only pushes the boundaries of technology but also builds a strong, collaborative community around it.

The good old days

Years ago, when I began dedicating time to hacking on etnaviv, the kernel driver in use would read a handful of registers and relay the gathered information to the user space blob. This blob driver was then capable of identifying the GPU (including model, revision, etc.), supported features (such as DXT texture compression, seamless cubemaps, etc.), and crucial limits (like the number of registers, number of varyings, and so on).

For reverse engineering purposes, this interface is super useful. Imagine if you could change one of these feature bits on a target running the binary blob.

With libvivhook it is possible to do exactly this. From time to time, I run such an old vendor driver stack on an i.MX 6QuadPlus SBC, which features a Vivante GC3000 as its GPU.

Somewhere, I have a collection of scripts that I used to learn more about the unknown GPU states that get activated when a specific feature bit is set.

To explore a simple example, let’s consider the case of misrepresenting a GPU’s identity as a GC2000. This involves modifying the information provided by the kernel driver to the user space, making the user space driver believe it is interacting with a GC2000 GPU. This scenario could be used for testing, debugging, or understanding how specific features or optimizations are handled differently across GPU models.

export ETNAVIV_CHIP_MODEL="0x2000"
export ETNAVIV_CHIP_REVISION="0x5108"
export ETNAVIV_FEATURES0_CLEAR="0xFFFFFFFF"
export ETNAVIV_FEATURES1_CLEAR="0xFFFFFFFF"
export ETNAVIV_FEATURES2_CLEAR="0xFFFFFFFF"
export ETNAVIV_FEATURES0_SET="0xe0296cad"
export ETNAVIV_FEATURES1_SET="0xc9799eff"
export ETNAVIV_FEATURES2_SET="0x2efbf2d9"
LD_PRELOAD="/lib/viv_interpose.so" ./test-case

If you capture the generated command stream and compare it with the one produced under the correct identity, you’ll observe many differences. This is super useful - I love it.

Changing Tides: The Shift in ioctl() Interface

At some point in time, Vivante changed their ioctl() interface and modified the gcvHAL_QUERY_CHIP_IDENTITY command. Instead of providing a very detailed chip identity, they reduced the data set to the following values:

  • model
  • revision
  • product id
  • eco id
  • customer id

This shift hinders reverse engineering efforts significantly. At first glance, it becomes impossible to alter any feature value, and understanding how the vendor driver processes these values is out of reach. Determining the function or impact of an unknown feature bit now seems unattainable.

However, the kernel driver still needs a way to determine which features a given GPU supports, as it has to accommodate a wide variety of GPUs. So there must be some mechanism in place that tells the kernel driver what each GPU can and cannot do.

A New Approach: The Hardware Database Dilemma

Let’s welcome: gc_feature_database.h, or hwdb for short.

Vivante transitioned to using a database that stores entries for limit values and feature bits. This database is accessed by querying with model, revision, product id, eco id and customer id.
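
To give a rough idea of what such an entry carries, here is a heavily trimmed sketch. The struct and member names below are paraphrased for illustration; the real vendor headers define hundreds of members per entry.

/* Heavily trimmed, illustrative sketch - not the literal vendor header. */
struct gcsFEATURE_DATABASE {
   /* lookup keys */
   uint32_t chipID;        /* model, e.g. 0x3000 */
   uint32_t chipVersion;   /* revision */
   uint32_t productID;
   uint32_t ecoID;
   uint32_t customerID;

   /* limit values */
   uint32_t RegisterMax;
   uint32_t VaryingCount;

   /* feature bits */
   uint32_t REG_DXTTextureCompression : 1;
   uint32_t REG_SeamlessCubeMap : 1;
   /* ...hundreds more feature bits and limits... */
};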

There is some speculation about why this move was made. My theory is that they became frustrated with the recurring cycle of introducing a feature bit to indicate that a feature is implemented, later discovering problems with said feature, and then having to introduce yet another feature bit to signal that the feature now truly works as intended. It became far more straightforward to deactivate a malfunctioning feature by modifying information in the hardware database (hwdb). After they began utilizing the hwdb within the driver, updates to the feature registers in the hardware ceased.

Here is a concrete example of such a case that can be found in the etnaviv gallium driver:

screen->specs.tex_astc = VIV_FEATURE(screen, chipMinorFeatures4, TEXTURE_ASTC) &&
                         !VIV_FEATURE(screen, chipMinorFeatures6, NO_ASTC);

Meanwhile, in the etnaviv world there was a hybrid in the making. We stuck with the detailed feature words and found a smart way to convert from Vivante’s hwdb entries to our own in-kernel database. There is even a full-blown Vivante -> etnaviv hwdb converter.

At that time, I did not fully understand all the consequences this approach would bring - more on that later. So, I dedicated my free time to reverse engineering and tweaking the user space driver, while letting the kernel developers do their thing.

About a year after the initial hwdb landed in the kernel, I thought it might be a good idea to read out the extra id values and provide them to user space via sysfs. At that time, I already had the idea of moving the hardware database to user space in mind. However, I was preoccupied with other priorities that were higher on my to-do list, and I ended up forgetting about it.

Challenge accepted

Tomeu Vizoso began to work on teflon and a Neural Processing Unit (NPU) driver within Mesa, leveraging a significant amount of the existing codebase and concepts, including the same kernel driver for the GPU. During this process, he encountered a need for some NPU-specific limit values. To address this, he added an in-kernel hwdb entry and made the limit values accessible to user space.

That’s it — the kernel supplies all the values the NPU driver requires. We’re finished, aren’t we?

It turns out that there are many more NPU-related values that need to be exposed in the same manner, with seemingly no end in sight.

One of the major drawbacks when the hardware database (hwdb) resides in the kernel is the considerable amount of time it takes for hwdb patches to be written, reviewed, and eventually merged into Linus’s git tree. This significantly slows down the development of user space drivers. For end users, this means they must either run a bleeding-edge kernel or backport the necessary changes on their own.

For me personally, the in-kernel hardware database should never have been implemented in its current form. If I could go back in time, I would have voiced my concerns.

As a result, moving the hardware database (hwdb) to user space quickly became a top priority on my to-do list, and I began working on it. However, during the testing phase of my proof of concept (PoC), I had to pause my work due to a kernel issue that made it unreliable for user space to trust the ID values provided by the kernel. Once my fix for this issue began to be incorporated into stable kernel versions, it was time to finalize the user space hwdb.

There is only one little but important detail we have not talked about yet. There are vendor-specific versions of gc_feature_database.h, based on different versions of the binary blob. For instance, there are versions from NXP, ST, Amlogic, and a few more.

Here is a brief look at the differences:

nxp/gc_feature_database.h (autogenerated at 2023-10-24 16:06:00, 861 struct members, 27 entries)
stm/gc_feature_database.h (autogenerated at 2022-12-29 11:13:00, 833 struct members, 4 entries)
amlogic/gc_feature_database.h (autogenerated at 2021-04-12 17:20:00, 733 struct members, 8 entries)

We understand that these header files are generated and adhere to a specific structure. Therefore, all we need to do is write an intelligent Python script capable of merging the struct members into a single consolidated struct. This script will also convert the old struct entries to the new format and generate a header file that we can use.
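
To make the merge rule concrete, here is a tiny example with made-up member names (the real structs are far larger): a member that only shows up in a newer vendor header simply gets added to the consolidated struct, and converted entries from older headers leave it at zero.

/* Illustrative only - made-up members to show the merge rule. */

/* from an older vendor header */
struct db_entry {
   uint32_t chipID;
   uint32_t REG_FeatureA : 1;
};

/* from a newer vendor header */
struct db_entry {
   uint32_t chipID;
   uint32_t REG_FeatureA : 1;
   uint32_t REG_FeatureB : 1;
};

/* generated hwdb.h: the union of all members; entries converted from the
 * older header simply leave REG_FeatureB at 0 */
struct db_entry {
   uint32_t chipID;
   uint32_t REG_FeatureA : 1;
   uint32_t REG_FeatureB : 1;
};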

I’m consistently amazed by how swiftly and effortlessly Python can be used for such tasks. Ninety-nine percent of the time, there’s a ready-to-use Python module available, complete with examples and some documentation. To address the C header parsing challenge, I opted for pycparser.

The final outcome is a generated hwdb.h file that looks and feels similar to those generated from the binary blob.

Future proof

This header merging approach offers several advantages:

  • It simplifies adding support for another SoC vendor.
  • There’s no need to comprehend the significance of each feature bit.
  • The source header files are supplied by VeriSilicon or the SoC vendor, ensuring accuracy.
  • Updating the hwdb is straightforward — simply replace the files and rebuild Mesa.
  • It allows for much quicker deployment of new features and hwdb updates since no kernel update is required.
  • This method accelerates the development of user space drivers.

While working on this topic, I decided to do a bigger refactoring with the end goal of providing a struct etna_core_info that lives outside of the gallium driver.

This makes the code future-proof and moves the filling of struct etna_core_info directly into the lowest layer - libetnaviv_drm (src/etnaviv/drm).
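
Roughly speaking, the struct is shaped like this; consider it a trimmed sketch with abridged field names, not the literal definition in Mesa.

/* Trimmed, illustrative sketch - not the literal Mesa definition. */
enum etna_core_type {
   ETNA_CORE_GPU,
   ETNA_CORE_NPU,
};

struct etna_core_info {
   uint32_t model;
   uint32_t revision;
   uint32_t product_id;
   uint32_t eco_id;
   uint32_t customer_id;

   enum etna_core_type type;

   /* plus feature bits and GPU/NPU specific limit values */
};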

We have not yet talked about one important detail.

What happens if there is no entry in the user space hwdb?

The solution is straightforward: we fall back to the previous method and request all feature words from the kernel driver. However, in an ideal scenario, our user space hardware database should supply all necessary entries. If you find that an entry for your GPU/NPU is missing, please get in touch with me.
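
In pseudo-C, the lookup order boils down to the following; the function names are illustrative, not the actual Mesa symbols.

/* Illustrative sketch of the lookup order - function names are made up. */
static void fill_core_info(struct etna_core_info *info)
{
   /* Try the user space hwdb first, keyed by model, revision, product id,
    * eco id and customer id - all provided by the kernel. */
   if (query_user_space_hwdb(info))
      return;

   /* No entry found: fall back to the old path and read the raw feature
    * words from the kernel driver. */
   query_kernel_feature_words(info);
}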

What about the in-kernel hwdb?

The existing system, despite its limitations, is set to remain indefinitely, with new entries being added to accommodate new GPUs. Although it will never contain as much information as the user space counterpart, this isn’t necessarily a drawback. For the purposes at hand, only a handful of feature bits are required.