Igalia on Christian Gmeiner

PanVK Extension Sprint: Mesa 26.1

Mon, 20 Apr 2026 00:00:00 +0000

Last week marked the Mesa 26.1 branch point, and I wanted to take a moment to look back at what happened on the PanVK front.

Spoiler: it was a busy one.

The landscape

PanVK - the Vulkan driver for Arm Mali GPUs (Valhall and newer) - is a collaborative effort. Collabora has been doing incredible work on the compiler backend and the foundational infrastructure. Arm themselves are actively contributing to the open source Mali GPU stack as well, reviewing patches and pushing driver quality forward. On the Igalia side, my focus this cycle was Vulkan extension coverage. The kind of work that doesn’t make for flashy demos but is absolutely critical for real-world application compatibility - especially for things like DXVK.

Why extensions matter

A Vulkan driver without extensions is like a car without wheels - technically complete, practically useless. Applications (and translation layers like DXVK, vkd3d-proton, and Zink) probe for specific extensions and adjust their behavior accordingly. Missing even one can mean falling back to a slower path or refusing to run entirely.

Three different things drove the extension work this cycle:

The Proton stack - extensions consumed by DXVK and vkd3d-proton, the translation layers that make D3D9–12 games run on Vulkan.
DDK feature parity - extensions Arm’s binary Mali driver exposes that PanVK didn’t yet, tracked in the DDK feature parity ticket.
Catching up on mesamatrix.net - closing the visible gap with the other Mesa Vulkan drivers (RADV, ANV, Turnip).

So I set out to close gaps. Lots of them.

The Proton stack essentials

These are extensions DXVK and vkd3d-proton actually require - not just nice-to-haves on a recommendation list. Each one unblocks something concrete in the D3D-to-Vulkan translation path.

VK_EXT_conditional_rendering (!40452) was probably the most involved piece of work. D3D12 has predicated rendering (SetPredication), and vkd3d-proton uses this extension to implement it efficiently. It wasn’t a simple “flip a bit” situation - I had to add the core state tracking, wrap all draw and dispatch calls with conditional checks, handle inherited state in secondary command buffers, and make sure meta operations (like internal clears and resolves) properly disable conditional rendering so they don’t get accidentally skipped. That ended up being five patches touching draw paths, dispatch, and the secondary command buffer inheritance logic.
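To make the moving parts concrete, here is a toy model of that state tracking in plain C. It is a sketch under heavy assumptions, not PanVK's code: every struct and function name is hypothetical, and the predicate is checked on the CPU here, whereas a real driver has the GPU read it at execution time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified command-buffer state; not PanVK's real structs. */
struct toy_cmdbuf {
   bool cond_render_enabled;  /* between Begin/EndConditionalRendering? */
   bool cond_inverted;        /* models the "inverted" flag of the extension */
   const uint32_t *predicate; /* 32-bit value the condition is read from */
   unsigned draws_executed;
};

/* A draw or dispatch runs unless conditional rendering says to skip it:
 * normally it runs when the predicate is non-zero, inverted when it is zero. */
static bool
toy_should_execute(const struct toy_cmdbuf *cmd)
{
   if (!cmd->cond_render_enabled)
      return true;

   bool nonzero = *cmd->predicate != 0;
   return cmd->cond_inverted ? !nonzero : nonzero;
}

static void
toy_draw(struct toy_cmdbuf *cmd)
{
   if (toy_should_execute(cmd))
      cmd->draws_executed++;
}

/* Meta operations (internal clears, resolves) must never be predicated:
 * save the state, disable it, do the work, then restore it. */
static void
toy_meta_clear(struct toy_cmdbuf *cmd)
{
   bool saved = cmd->cond_render_enabled;

   cmd->cond_render_enabled = false;
   toy_draw(cmd);
   cmd->cond_render_enabled = saved;
}

/* A secondary command buffer picks up the primary's conditional state. */
static void
toy_inherit(struct toy_cmdbuf *secondary, const struct toy_cmdbuf *primary)
{
   secondary->cond_render_enabled = primary->cond_render_enabled;
   secondary->cond_inverted = primary->cond_inverted;
   secondary->predicate = primary->predicate;
}
```

With a zero predicate, toy_draw() is skipped while toy_meta_clear() still runs, which is exactly the property the meta-operation handling has to preserve.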

VK_VALVE_mutable_descriptor_type (!40254) is one of those extensions that exists purely because Valve needed it. In D3D, descriptor types are more fluid than in Vulkan - a descriptor slot might hold a sampler one frame and a storage buffer the next. vkd3d-proton enables this to avoid expensive descriptor set re-creation when types change. It’s a trivial alias of the already-supported VK_EXT_mutable_descriptor_type, so enabling it was a one-liner.
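The idea of a slot changing type in place can be pictured with a small C sketch. This is only an illustration with made-up names (toy_mutable_slot, toy_write_descriptor); it does not reflect how Vulkan or PanVK lay out descriptors in memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor kinds for the illustration. */
enum toy_desc_type {
   TOY_DESC_SAMPLED_IMAGE,
   TOY_DESC_STORAGE_BUFFER,
   TOY_DESC_UNIFORM_BUFFER,
};

/* A mutable slot declares up front which types it may hold. */
struct toy_mutable_slot {
   uint32_t allowed_mask;      /* bit N set: type N is allowed */
   enum toy_desc_type current; /* type held right now */
   uint64_t payload;           /* stand-in for the descriptor contents */
};

/* Rewrite the slot in place; no new descriptor set is created.
 * Returns false if the slot was not declared to accept this type. */
static bool
toy_write_descriptor(struct toy_mutable_slot *slot,
                     enum toy_desc_type type, uint64_t payload)
{
   if (!(slot->allowed_mask & (1u << type)))
      return false;

   slot->current = type;
   slot->payload = payload;
   return true;
}
```

The point of the sketch is the in-place update: the slot survives across frames, and only its current type and contents change.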

VK_EXT_memory_budget (!40246) lets applications (and both DXVK and vkd3d-proton) query how much GPU memory is actually available versus how much is in use. Without it, apps are flying blind on memory management, which can lead to over-allocation and stuttering. Getting the heap budget reporting right required hooking into the kernel memory accounting.
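On the application side, the extension reports a budget and a current usage per memory heap. The sketch below shows how an app might turn those two numbers into a headroom check. To stay self-contained it uses a hypothetical stand-in struct rather than the Vulkan headers, and the helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_HEAPS 16 /* same size as Vulkan's VK_MAX_MEMORY_HEAPS */

/* Stand-in for the per-heap budget/usage arrays the extension returns. */
struct toy_memory_budget {
   uint64_t heap_budget[TOY_MAX_HEAPS]; /* bytes the app may use */
   uint64_t heap_usage[TOY_MAX_HEAPS];  /* bytes the app uses now */
};

/* Bytes left before the app exceeds its budget on this heap. */
static uint64_t
toy_heap_headroom(const struct toy_memory_budget *b, unsigned heap)
{
   if (b->heap_usage[heap] >= b->heap_budget[heap])
      return 0; /* already at or over budget */

   return b->heap_budget[heap] - b->heap_usage[heap];
}

/* Would an allocation of `size` bytes still fit within the budget? */
static bool
toy_allocation_fits(const struct toy_memory_budget *b, unsigned heap,
                    uint64_t size)
{
   return size <= toy_heap_headroom(b, heap);
}
```

A translation layer can use a check like this to decide whether to evict or downscale resources before allocating, instead of discovering the problem through a failed allocation.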

VK_EXT_attachment_feedback_loop_layout (!40498) - feedback loops let you read from an attachment that you’re simultaneously rendering to (think screen-space effects that sample the current framebuffer). DXVK uses this in its D3D9 hazard layout path to avoid artifacts in certain games.

VK_EXT_shader_stencil_export (!39944) - allows fragment shaders to write stencil values directly, rather than relying on the fixed-function stencil path. DXVK leans on this in its meta-copy and meta-resolve paths, and vkd3d-proton enables it too. The Panfrost stack already supported everything needed; literally a one-line advertisement in physical_device.c.

VK_KHR_shader_untyped_pointers (!40457, v9+) - a newer KHR extension that relaxes pointer type requirements in SPIR-V. DXVK calls this out as a dependency for descriptor heaps. Restricted to v9+ because Bifrost has issues with 8-bit vector loads through untyped pointers combined with 16-bit storage. Also needed to lower memcpy derefs before explicit IO lowering.

Catching up to the DDK

The Panfrost project keeps a DDK feature parity ticket tracking everything Arm’s binary Mali driver exposes that PanVK doesn’t yet. Four of those got crossed off this cycle:

VK_ARM_scheduling_controls (!40063, CSF only) - an Arm-specific extension for controlling shader core scheduling on Command Stream Frontend (CSF) hardware. I also fixed the per-queue shader core count so CSF group creation uses the right values.
VK_EXT_legacy_dithering (!39781) - implements ordered dithering in the blending stage, which some applications expect from legacy APIs. Wired up the existing Panfrost dithering infrastructure (pan_dithered_format_from_pipe_format()) — just plumbing the VK_RENDERING_ENABLE_LEGACY_DITHERING_BIT_EXT flag through the blend descriptor and color attachment internal conversion paths.
VK_EXT_rgba10x6_formats (!40653) - a last-minute addition that just squeezed in before the branch point. This required adding the PIPE_FORMAT_X6R10X6G10X6B10X6A10_UNORM format to Mesa’s gallium format table first, then wiring it up in PanVK. Used for 10-bit per channel content in video and HDR scenarios.
VK_EXT_astc_decode_mode (!39799) - controls the format used when decoding ASTC compressed textures, allowing apps to choose lower-precision decoding for performance. The Panfrost hardware already supports controlling ASTC decode precision via the Decode Wide plane descriptor field; just needed to parse VkImageViewASTCDecodeModeEXT from the image view pNext chain and set astc.narrow accordingly. v9+ only because the relevant ASTC plane descriptor fields only exist from Valhall onward.
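The "parse a struct from the pNext chain" step in the ASTC item follows a pattern common to most Vulkan extensions. Here is a self-contained sketch of it. The types are hypothetical stand-ins for the Vulkan ones, and the rule that only an explicit 8-bit UNORM request selects narrow decoding is an assumption made for the example, not something taken from the PanVK source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VkStructureType and VkFormat values. */
enum toy_stype {
   TOY_STYPE_IMAGE_VIEW_CREATE_INFO,
   TOY_STYPE_IMAGE_VIEW_ASTC_DECODE_MODE,
   TOY_STYPE_SOMETHING_ELSE,
};

enum toy_format {
   TOY_FORMAT_R16G16B16A16_SFLOAT,
   TOY_FORMAT_R8G8B8A8_UNORM,
};

/* Every chained struct starts with the same two members. */
struct toy_base_in {
   enum toy_stype sType;
   const void *pNext;
};

struct toy_astc_decode_mode {
   enum toy_stype sType;
   const void *pNext;
   enum toy_format decodeMode;
};

/* Walk the chain and return the first struct of the wanted type. */
static const void *
toy_find_in_chain(const void *pNext, enum toy_stype wanted)
{
   for (const struct toy_base_in *s = pNext; s != NULL; s = s->pNext) {
      if (s->sType == wanted)
         return s;
   }
   return NULL;
}

/* Assumption: narrow decode only when the app explicitly asks for UNORM8. */
static bool
toy_astc_narrow(const void *pNext)
{
   const struct toy_astc_decode_mode *mode =
      toy_find_in_chain(pNext, TOY_STYPE_IMAGE_VIEW_ASTC_DECODE_MODE);

   return mode != NULL && mode->decodeMode == TOY_FORMAT_R8G8B8A8_UNORM;
}
```

If the struct is absent from the chain, the helper falls back to the default (wide) behavior, which mirrors how optional extension structs are normally treated.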

Catching up on mesamatrix

mesamatrix.net tracks Vulkan extension support across the Mesa drivers. The remaining extensions this cycle were about closing the visible gap with RADV, ANV, and Turnip — extensions that don’t have a single big consumer driving them, but whose absence shows up as red squares on the matrix and as silent fallbacks in apps that probe for them.

VK_EXT_color_write_enable (!39913) - per-attachment control over which color channels actually get written. The common Vulkan runtime already handled all the pipeline state and dynamic command plumbing, and panvk’s blend descriptor emission was already consuming color_write_enables, so this was effectively an “advertise the feature” change.
VK_EXT_depth_clamp_control (!39925) - lets applications specify a custom depth clamp range instead of always clamping to the viewport’s minDepth/maxDepth. Mali GPUs have native LOW_DEPTH_CLAMP/HIGH_DEPTH_CLAMP registers, so it was a matter of wiring the existing runtime state through to those.
VK_EXT_attachment_feedback_loop_dynamic_state (!40498) - the dynamic-state companion to VK_EXT_attachment_feedback_loop_layout above; lets you toggle feedback-loop state per draw call without pipeline rebuilds.
VK_EXT_map_memory_placed (!40315) - lets applications control where in their virtual address space GPU memory gets mapped. This simplified pan_kmod_bo_mmap() to always map the whole BO, cleaning up the kernel module interface.
VK_EXT_shader_atomic_float (!40506) - atomic operations on float values in shaders. The existing axchg instruction is type-agnostic, so no compiler changes were needed; image atomics are already lowered to global atomics. Just had to add R32_FLOAT to the storage-image-atomic format flag.
VK_EXT_nested_command_buffer (!40120, v10+) - allows secondary command buffers to call other secondary command buffers. The CSF backend’s cs_call() is a hardware call/return instruction that nests naturally, and the existing CmdExecuteCommands already does the caller/callee state merging. The 8-level hardware call stack, minus one for the kernel ringbuffer call and two reserved for future driver use, leaves maxCommandBufferNestingLevel at 5.
VK_EXT_image_view_min_lod (!39938) - allows clamping the minimum LOD at the image view level rather than just the sampler. Mali v6+ has per-texture-descriptor LOD clamp fields independent from the sampler’s, so this just plumbs vk_image_view::min_lod through pan_image_view into the texture descriptor — no shader lowering or descriptor merging needed.
VK_EXT_zero_initialize_device_memory (!39658) - guarantees that newly allocated device memory is zeroed. The kernel side already does the heavy lifting — panfrost/panthor use drm_gem_shmem, which serves zeroed pages from the shmem subsystem. And since panvk treats layout transitions as no-ops, VK_IMAGE_LAYOUT_ZERO_INITIALIZED_EXT falls out for free. (Did need one format-table fix: dropping STORAGE_IMAGE support from compressed formats to avoid crashes in the new dEQP tests.)
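The nesting limit quoted for VK_EXT_nested_command_buffer is plain bookkeeping on the hardware call stack. Spelled out in C, with the numbers taken from the text above and macro names that are made up for the example:

```c
/* Numbers come from the description above; the names are hypothetical. */
#define TOY_HW_CALL_STACK_DEPTH  8u /* levels the hardware call stack offers */
#define TOY_KERNEL_RINGBUF_CALLS 1u /* used by the kernel ringbuffer call */
#define TOY_RESERVED_LEVELS      2u /* kept back for future driver use */

/* Levels left over for nested secondary command buffers. */
static unsigned
toy_max_nesting_level(void)
{
   return TOY_HW_CALL_STACK_DEPTH - TOY_KERNEL_RINGBUF_CALLS -
          TOY_RESERVED_LEVELS;
}
```

That is 8 - 1 - 2 = 5, the value advertised as maxCommandBufferNestingLevel.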

By the numbers

That’s 18 extensions across roughly a dozen merge requests - ranging from single-patch additions to multi-patch series like conditional rendering. Collectively they represent a meaningful shift in what PanVK can claim to support: more of the Proton stack working out of the box, four more checkboxes against the DDK, and fewer red squares on the mesamatrix.

What’s next

The extension sprint isn’t over - there are still gaps to fill, and each one removed makes PanVK more viable for real workloads. But 26.1 was a good milestone. The driver is getting to the point where you can throw a DXVK game at it and have a reasonable expectation that it just works.

Back to it. ⚡

GLES3 on etnaviv: Fixing the Hard Parts

Fri, 20 Feb 2026 00:00:00 +0000

This is the start of a series about getting OpenGL ES 3.0 conformance on Vivante GC7000 hardware using the open-source etnaviv driver in Mesa. Thanks to Igalia for giving me the opportunity to spend some time on these topics.

Where We Are

etnaviv has supported GLES2 on Vivante GPUs for a long time. GLES3 support has been progressing steadily, but the remaining dEQP failures are the stubborn ones - the cases where the hardware doesn’t quite do what the spec says, and the driver has to get creative.

These aren’t missing feature bits or unimplemented extensions. These are the problems where you stare at a command stream trace from the proprietary blob driver and realize they’re doing something completely different from what you’d expect, because the hardware has a quirk that nobody documented.

The Approach

My workflow for each fix follows a pattern:

Run the failing dEQP test, note the failure mode (wrong pixels, crash, GPU hang)
Capture command stream traces from both the blob (proprietary driver) and etnaviv for the same test
Compare the traces - what states differ? What draw calls differ? Is the blob doing extra work?
Understand why - read the spec, reason about the hardware behavior, figure out what the blob knows that we don’t
Implement the fix in Mesa, test, iterate
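The "what states differ?" step can be pictured as a diff over register writes. The sketch below is a toy model only: it assumes a trace has already been decoded into (address, value) pairs, and the struct and function names are hypothetical rather than part of any existing etnaviv tooling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded state write from a command stream trace (hypothetical). */
struct toy_state_write {
   uint32_t addr;
   uint32_t value;
};

/* Last value written to `addr` in a trace; false if it was never written. */
static bool
toy_final_value(const struct toy_state_write *trace, size_t count,
                uint32_t addr, uint32_t *out)
{
   bool found = false;

   for (size_t i = 0; i < count; i++) {
      if (trace[i].addr == addr) {
         *out = trace[i].value;
         found = true;
      }
   }
   return found;
}

/* Count registers whose final value in the blob trace differs from ours,
 * or that the blob programs and we never touch. */
static size_t
toy_count_differences(const struct toy_state_write *blob, size_t blob_count,
                      const struct toy_state_write *ours, size_t ours_count)
{
   size_t diffs = 0;

   for (size_t i = 0; i < blob_count; i++) {
      /* Only the last write to each register matters. */
      bool written_later = false;
      for (size_t j = i + 1; j < blob_count; j++) {
         if (blob[j].addr == blob[i].addr)
            written_later = true;
      }
      if (written_later)
         continue;

      uint32_t value;
      if (!toy_final_value(ours, ours_count, blob[i].addr, &value) ||
          value != blob[i].value)
         diffs++;
   }
   return diffs;
}
```

A real comparison also has to look at ordering and at the draw calls themselves, but a final-state diff like this is often enough to point at the register worth investigating.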

The blob traces are invaluable. Vivante’s proprietary driver has years of hardware workarounds baked in. When something doesn’t work, the answer is usually hiding in the trace.

The Hardware

The primary test target is a GC7000 rev 6214 (HALTI5 generation). This is a capable GPU found in the NXP i.MX8MQ SoC. It has a BLT engine, texture descriptors, and most of the features needed for GLES3 - but also its own set of rasterization quirks and interpolation behaviors that need workarounds.

In the future, I plan to expand the focus to the broader GC7000 GPU family.

Up Next

The first post will tackle the R/B swap problem - the PE always writes pixels in BGRA byte order, and we’ve been fixing it in the shader. The blob has a simpler answer. Stay tuned.

Following Along

All of this work happens upstream in Mesa.

If you’re interested in GPU driver development, these posts aim to show what the work actually looks like – not just the final patch, but the debugging, the trace analysis, and the reasoning that gets you there.

My first Vulkan extension

Fri, 13 Feb 2026 00:00:00 +0000

After years of working on etnaviv - a Gallium/OpenGL driver for Vivante GPUs - I’ve been wanting to get into Vulkan. As part of my work at Igalia, the goal was to bring VK_EXT_blend_operation_advanced to lavapipe. But rather than going straight there, I started with Honeykrisp - the Vulkan driver for Apple Silicon - as a first target: a real hardware driver to validate the implementation against before wiring it up in a software renderer. My first Vulkan extension, and my first real contribution to Honeykrisp.

Why this extension?

A customer needed advanced blending support in lavapipe, so the extension choice was made for me. But it turned out to be a great fit for a first extension - useful, self-contained, and not a multi-month rabbit hole. Standard Vulkan blending is limited to basic operations like add and subtract with blend factors. That’s fine for most rendering, but if you want Photoshop-style effects - multiply, screen, overlay, color dodge, color burn - you’re stuck doing it manually in shaders or with extra render passes.

The extension adds 19 blend operations that handle all of this in the fixed-function pipeline. Useful for UI toolkits, image editors, and anywhere you need creative compositing.

The journey

What started as “just wire up an extension” turned into a proper refactoring adventure. The existing blend infrastructure in Mesa was scattered - OpenGL had its own enum definitions, Vulkan had separate conversions, and the actual NIR blend math lived in glsl-specific code.

So I took a step back and cleaned things up. Moved the blend mode enums into a shared util/blend.h header. Added proper helpers in the Vulkan runtime for converting between API types. Then came the fun part: implementing the actual blend equations in nir/lower_blend.

Each of those 19 blend modes has its own formula from the spec. Some are simple (multiply is just src * dst), others get hairy with conditionals and special cases for luminosity and saturation. About 570 lines of NIR code later, I had a lowering pass that any Mesa driver can use.

For example, here’s how the spec defines advanced blending. Each mode plugs into a general equation:

RGB = f(Cs,Cd) * X * p0 + Cs * Y * p1 + Cd * Z * p2
A   =            X * p0 +      Y * p1 +      Z * p2

Where p0, p1, p2 are weighting factors based on source/destination alpha coverage, and f(Cs,Cd) is the per-mode blend function. For overlay - probably the most recognizable blend mode from Photoshop - the spec defines:

f(Cs,Cd) = 2*Cs*Cd,              if Cd <= 0.5
           1 - 2*(1-Cs)*(1-Cd),  otherwise

And here’s that same formula expressed as NIR - Mesa’s intermediate representation:

static inline nir_def *
blend_overlay(nir_builder *b, nir_def *src, nir_def *dst)
{
   /* f(Cs,Cd) = 2*Cs*Cd, if Cd <= 0.5
    *            1-2*(1-Cs)*(1-Cd), otherwise
    */
   nir_def *rule_1 = nir_fmul(b, nir_fmul(b, src, dst), imm3(b, 2.0));
   nir_def *rule_2 =
      nir_fsub(b, imm3(b, 1.0),
               nir_fmul(b, nir_fmul(b, nir_fsub(b, imm3(b, 1.0), src),
                                        nir_fsub(b, imm3(b, 1.0), dst)),
                         imm3(b, 2.0)));
   return nir_bcsel(b, nir_fge(b, imm3(b, 0.5f), dst), rule_1, rule_2);
}

nir_fmul, nir_fsub, nir_bcsel - multiply, subtract, conditional select. Each call builds a node in the shader’s IR graph. This is what “lowering” looks like: translating a high-level blend mode into operations the GPU’s shader core can execute. The outer framework - the p0/p1/p2 weighting - is handled by the caller; each blend function just implements its f(Cs,Cd).
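For checking results by hand, the same math can be written as scalar C on the CPU. This is a reference sketch, not code from Mesa. It makes two assumptions that are not stated above: the weights are the ones for uncorrelated source/destination coverage (p0 = As*Ad, p1 = As*(1-Ad), p2 = Ad*(1-As)), and X = Y = Z = 1 for overlay. All toy_ names are made up.

```c
/* Per-channel overlay, following the f(Cs,Cd) definition above. */
static float
toy_overlay_f(float cs, float cd)
{
   return cd <= 0.5f ? 2.0f * cs * cd
                     : 1.0f - 2.0f * (1.0f - cs) * (1.0f - cd);
}

struct toy_weights {
   float p0, p1, p2;
};

/* Assumed weights for uncorrelated source/destination coverage. */
static struct toy_weights
toy_weights_uncorrelated(float as, float ad)
{
   struct toy_weights w = {
      .p0 = as * ad,          /* both source and destination cover */
      .p1 = as * (1.0f - ad), /* source only */
      .p2 = ad * (1.0f - as), /* destination only */
   };
   return w;
}

/* One color channel of the general equation, with X = Y = Z = 1 (assumed). */
static float
toy_blend_overlay_channel(float cs, float cd, float as, float ad)
{
   struct toy_weights w = toy_weights_uncorrelated(as, ad);

   return toy_overlay_f(cs, cd) * w.p0 + cs * w.p1 + cd * w.p2;
}

/* Alpha of the general equation, again with X = Y = Z = 1. */
static float
toy_blend_alpha(float as, float ad)
{
   struct toy_weights w = toy_weights_uncorrelated(as, ad);

   return w.p0 + w.p1 + w.p2;
}
```

With fully opaque source and destination (As = Ad = 1), p1 and p2 vanish and the result is just f(Cs,Cd), which makes that case a convenient sanity check against a driver's output.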

The Turnip surprise

Everything was working on Honeykrisp, tests were passing, life was good. Then the merge pipeline started failing - on Turnip (the Adreno Vulkan driver). Not my code, not my hardware, but my changes were breaking it.

I reached out for help, and Zan Dobersek stepped up. After some digging, he found the culprit: I was violating a subtle corner of the spec around attachmentCount. Turns out, when certain dynamic states are set and advancedBlendCoherentOperations isn’t enabled, attachmentCount gets ignored entirely. My state tracking code wasn’t accounting for that.

One fixup commit later, Turnip was happy again. This is the part they don’t tell you about Vulkan extensions - you’re not just implementing for your driver, you’re touching shared infrastructure that every driver depends on. And the Mesa CI will absolutely let you know if you break something.

lavapipe landed

One week later, lavapipe has it too. This was the original goal, and the shared infrastructure did exactly what it was supposed to - the lavapipe MR is mostly just flipping the extension on. The lowering pass, the enum plumbing, the runtime helpers - all reused as-is. The full dEQP-VK.pipeline.*.blend_operation_advanced.* test suite passes on both drivers.

Two drivers in two weeks. That’s what building the right abstractions gets you.

What’s next

The shared NIR lowering pass is there for any Mesa Vulkan driver to use. If your hardware doesn’t have native advanced blending support, enabling the extension is now mostly plumbing. I’m curious to see if other drivers pick it up.

For me, this was a good first step into Vulkan - and into working on Honeykrisp. I’m looking forward to what comes next.

The Honeykrisp/NIR MR and the lavapipe MR are both merged if you want to look at the code. Thanks to Alyssa Rosenzweig for the review and guidance, and to Zan Dobersek for debugging the Turnip regression with me.