Centricular

Expertise, Straight from the Source



Devlog

Read about our latest work!

Over the past few years, we've been slowly working on improving the platform-specific plugins for Windows, macOS, iOS, and Android, and making them work as well as the equivalent plugins on Linux. In this episode, we will look at audio device switching in the source and sink elements on macOS and Windows.

On Linux, if you're using the PulseAudio elements (with either the PulseAudio daemon or PipeWire), you get perfect device switching: quick, seamless, easy, and reliable. Simply set the device property whenever you want and you're off to the races. If the device gets unplugged, the pipeline will continue running, and you will be notified of the unplug via the GST_MESSAGE_DEVICE_REMOVED bus message from GstDeviceMonitor so you can switch to another device.

As of a few weeks ago, the Windows Audio plugin wasapi2 implements the same behaviour. All you have to do is set the device property to whatever device you want (fetched using the GstDeviceMonitor API), at any time.

A merge request is open for adding the same feature to the macOS audio plugin, and is expected to be merged soon.

For graceful error handling, such as accidental device unplug or other unexpected errors, there's a new continue-on-error property. Setting that will cause the source to output silence after unplug, whereas the sink will simply discard the buffers. An element warning will be emitted to notify the app (alongside the GST_MESSAGE_DEVICE_REMOVED bus message if there was a hardware unplug), and the app can switch the device by setting the device property.

Thanks to Seungha and Piotr for working on this!



HIP (formerly known as Heterogeneous-computing Interface for Portability) is AMD’s GPU programming API that enables portable, CUDA-like development across both AMD and NVIDIA platforms.

  • On AMD GPUs, HIP runs natively via the ROCm stack.
  • On NVIDIA GPUs, HIP operates as a thin translation layer over the CUDA runtime and driver APIs.

This allows developers to maintain a single codebase that can target multiple GPU vendors with minimal effort.

Where HIP Is Used

HIP has seen adoption in AMD-focused GPU computing workflows, particularly in environments that require CUDA-like programmability. Examples include:

  • PyTorch ROCm backend for deep learning workloads
  • Scientific applications such as LAMMPS and GROMACS, which have experimented with HIP backends for AMD GPU support
  • GPU-accelerated media processing on systems that leverage AMD hardware

While HIP adoption has been more limited compared to CUDA, its reach continues to expand as support for AMD GPUs grows across a broader range of use cases.

The Challenge: Compile-Time Platform Lock-in

Despite its cross-vendor goal, HIP still has a fundamental constraint at the build level: as of HIP 6.3, developers must statically define their target platform at compile time via macros like:

#define __HIP_PLATFORM_AMD__    // for AMD ROCm
#define __HIP_PLATFORM_NVIDIA__ // for CUDA backend

This leads to three key limitations:

  • You must compile separate binaries for AMD and NVIDIA
  • A single binary cannot support both platforms simultaneously
  • HIP does not support runtime backend switching natively

GstHip’s Solution

To overcome this limitation, GstHip uses runtime backend dispatch through:

  • dlopen() on Linux
  • LoadLibrary() on Windows

Instead of statically linking against a single HIP backend, GstHip loads both the ROCm HIP runtime and the CUDA driver/runtime API at runtime. This makes it possible to:

  • Detect available GPUs dynamically
  • Choose the appropriate backend per device
  • Even support simultaneous use of AMD and NVIDIA GPUs in the same process
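The loading pattern can be illustrated with a small, runnable Python ctypes sketch. The library names in the backends table are the usual ROCm/CUDA ones (whether they load depends on what is installed), and libm is used as a stand-in so the symbol-resolution step actually runs anywhere; the real GstHip code does the equivalent in C with dlopen()/LoadLibrary().

```python
import ctypes
import ctypes.util

def try_load(names):
    """Return the first library that can be loaded, or None."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

# GstHip attempts to load both runtimes; either (or both) may be absent.
backends = {
    "amd": try_load(["libamdhip64.so"]),  # ROCm HIP runtime
    "nvidia": try_load(["libcuda.so"]),   # CUDA driver API
}

# Stand-in demonstrating runtime symbol resolution from a loaded library,
# just like resolving hip*/cuda* entry points:
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))
```

Because loading failures are caught rather than fatal, a missing backend simply means that vendor's GPUs are not offered, while the other backend keeps working.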

Unified Wrapper API

GstHip provides a clean wrapper layer that abstracts backend-specific APIs via a consistent naming scheme:

hipError_t HipFooBar(GstHipVendor vendor, ...);

The Hip prefix (capital H) clearly distinguishes the wrapper from native hipFooBar(...) functions. The GstHipVendor enum indicates which backend to target:

  • GST_HIP_VENDOR_AMD
  • GST_HIP_VENDOR_NVIDIA

Internally, each HipFooBar(...) function dispatches to the correct backend by calling either:

  • hipFooBar(...) for AMD ROCm
  • cudaFooBar(...) for NVIDIA CUDA

These symbols are dynamically resolved via dlopen() / LoadLibrary(), enabling runtime backend selection without GPU vendor-specific builds.
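The dispatch scheme can be sketched as a per-entry-point table keyed by vendor. The wrapper and enum names below mirror the naming scheme described above, but the function bodies are hypothetical stand-ins; the real wrappers call the dynamically resolved hip*/cuda* symbols.

```python
from enum import Enum, auto

class GstHipVendor(Enum):
    AMD = auto()
    NVIDIA = auto()

def hip_malloc_amd(size):      # would call hipMalloc() from ROCm
    return ("hip-ptr", size)

def cuda_malloc_nvidia(size):  # would call cudaMalloc() from CUDA
    return ("cuda-ptr", size)

# One dispatch table per wrapped entry point
_MALLOC = {
    GstHipVendor.AMD: hip_malloc_amd,
    GstHipVendor.NVIDIA: cuda_malloc_nvidia,
}

def HipMalloc(vendor, size):
    """Backend-neutral wrapper: dispatches to the vendor's allocator."""
    return _MALLOC[vendor](size)

print(HipMalloc(GstHipVendor.AMD, 1024))
print(HipMalloc(GstHipVendor.NVIDIA, 1024))
```

Since the vendor is a runtime argument rather than a compile-time macro, one binary can serve AMD and NVIDIA devices in the same process.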

Memory Interop

All memory interop in GstHip is handled through the hipupload and hipdownload elements. While zero-copy is not supported due to backend-specific resource management and ownership ambiguity, GstHip provides optimized memory transfers between systems:

  • System Memory ↔ HIP Memory: Utilizes HIP pinned memory to achieve fast upload/download operations between host and device memory
  • GstGL ↔ GstHip: Uses HIP resource interop APIs to perform GPU-to-GPU memory copies between OpenGL and HIP memory
  • GstCUDA ↔ GstHip (on NVIDIA platforms): Since both sides use CUDA memory, direct GPU-to-GPU memory copies are performed using CUDA APIs.

GPU-Accelerated Filter Elements

GstHip includes GPU-accelerated filters optimized for real-time media processing:

  • hipconvertscale/hipconvert/hipscale: image format conversion and image scaling
  • hipcompositor: composing multiple video streams into a single video stream

These filters use the same unified dispatch system and are compatible with both AMD and NVIDIA platforms.

Application Integration Support

As of Merge Request !9340, GstHip exposes public APIs that allow applications to access HIP resources managed by GStreamer. This also enables applications to implement custom GstHip-based plugins using the same underlying infrastructure without duplicating resource management.

Summary of GstHip Advantages

  • Single plugin/library binary supports both AMD and NVIDIA GPUs
  • Compatible with Linux and Windows
  • Supports multi-GPU systems, including hybrid AMD + NVIDIA configurations
  • Seamless memory interop with System Memory, GstGL, and GstCUDA
  • Provides high-performance GPU filters for video processing
  • Maintains a clean API layer via HipFooBar(...) wrappers, enabling backend-neutral development


For a recent project, it was necessary to collect video frames of multiple streams during a specific interval, and in the future also audio, to pass them through an inference framework for extracting additional metadata from the media and attaching it to the frames.

While GStreamer has gained quite a bit of infrastructure in the past years for machine learning use-cases in the analytics library, there was nothing for this specific use-case yet.

As part of solving this, I proposed a design for a generic interface that allows combining and batching multiple streams into a single one by using empty buffers with a GstMeta that contains the buffers of the original streams, and caps that include the caps of the original streams and allow format negotiation in the pipeline to work as usual.
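As a conceptual model of that design: the batch buffer itself is empty, and the payload lives entirely in an attached meta holding the original streams' buffers. The class names below are hypothetical illustrations of the idea, not the actual GStreamer analytics API.

```python
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """Stand-in for a GstBuffer from one of the original streams."""
    stream_id: int
    pts: int
    data: bytes = b""

@dataclass
class BatchMeta:
    """Stand-in for the GstMeta carrying the original streams' buffers."""
    buffers: list = field(default_factory=list)

@dataclass
class BatchBuffer:
    """The 'empty' batch buffer: no payload of its own, only the meta."""
    meta: BatchMeta = field(default_factory=BatchMeta)

batch = BatchBuffer()
batch.meta.buffers.append(Buffer(stream_id=1, pts=0))
batch.meta.buffers.append(Buffer(stream_id=2, pts=0))
print(len(batch.meta.buffers))
```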

While this covers my specific use case of combining multiple streams, it should be generic enough to also handle other cases that came up during the discussions.

In addition I wrote two new elements, analyticscombiner and analyticssplitter, that make use of this new API for combining and batching multiple streams in a generic, media-agnostic way over specific time intervals, and later splitting it out again into the original streams. The combiner can be configured to collect all media in the time interval, or only the first or last.
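The interval-based collection modes can be sketched as follows. The element and property behaviour (keep all buffers per interval, or only the first or last per stream) is as described above, but the code is a simplified toy model, not the actual analyticscombiner implementation.

```python
def combine(buffers, interval, mode="all"):
    """buffers: list of (stream_id, timestamp) tuples.
    Groups buffers into time intervals; per interval and stream, keeps
    all buffers, only the first, or only the last."""
    batches = {}
    for stream_id, ts in buffers:
        slot = ts // interval
        batches.setdefault(slot, {}).setdefault(stream_id, []).append((stream_id, ts))
    out = []
    for slot in sorted(batches):
        batch = []
        for stream_id in sorted(batches[slot]):
            bufs = batches[slot][stream_id]
            if mode == "first":
                batch.append(bufs[0])
            elif mode == "last":
                batch.append(bufs[-1])
            else:
                batch.extend(bufs)
        out.append(batch)
    return out

bufs = [(1, 0), (1, 40), (2, 10), (1, 80), (2, 90)]
print(combine(bufs, interval=100, mode="last"))  # one batch, last buffer per stream
```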

Conceptually the combiner element is similar to NVIDIA's DeepStream nvstreammux element, and in the future it should be possible to write a translation layer between the GStreamer analytics library and DeepStream.

The basic idea for the usage of these elements is to have a pipeline like

-- stream 1 --\                                                                  / -- stream 1 with metadata --
               -- analyticscombiner -- inference elements -- analyticssplitter --
-- stream 2 --/                                                                  \ -- stream 2 with metadata --
   ........                                                                           ......................
-- stream N -/                                                                     \- stream N with metadata --

The inference elements would only add additional metadata to each of the buffers, which can then be made use of further downstream in the pipeline for operations like overlays or blurring specific areas of the frames.

In the future there are likely going to be more batching elements for specific stream types, operating on multiple or a single stream, or making use of completely different batching strategies.

Special thanks also to Olivier and Daniel who provided very useful feedback during the review of the two merge requests.



With GStreamer 1.26, a new D3D12 backend GstD3D12 public library was introduced in gst-plugins-bad.

Now, with the new gstreamer-d3d12 Rust crate, Rust applications can finally access GStreamer's Windows-native GPU features in a safe and idiomatic way.

What You Get with GStreamer D3D12 Support in Rust

  • Pass D3D12 textures created by your Rust application directly into GStreamer pipelines without data copying
  • Likewise, GStreamer-generated GPU resources (such as frames decoded by D3D12 decoders) can be accessed directly in your Rust app
  • GStreamer elements based on GstD3D12 can be written in Rust

Beyond Pipelines: General D3D12 Utility Layer

GstD3D12 is not limited to multimedia pipelines. It also acts as a convenient D3D12 runtime utility, providing:

  • GPU resource pooling such as command allocator and descriptor heap, to reduce overhead and improve reuse
  • Abstractions for creating and recycling GPU textures with consistent lifetime tracking
  • Command queue and fence management helpers, greatly simplifying GPU/CPU sync
  • A foundation for building custom GPU workflows in Rust, with or without the full GStreamer pipeline


As part of the GStreamer Hackfest in Nice, France I had some time to go through some outstanding GStreamer issues. One such issue that has been on my mind was this GStreamer OpenGL Wayland issue.

Now, the issue is that OpenGL is an old API and did not originally have the platform extensions it does today. Most windowing system APIs therefore allow creating an output surface (or a window) without ever showing it. This works just fine when you are creating an OpenGL context but not actually rendering anything to the screen, and it is the approach used by all of the other major OpenGL platforms (Windows, macOS, X11, etc.) supported by GStreamer.

When Wayland initially arrived, this was not the case. A Wayland surface could be the back buffer (an OpenGL term for a surface being rendered to) but could not be hidden. This is very different from how other windowing APIs worked at the time. As a result, the initial Wayland implementation in GStreamer OpenGL used a heuristic for determining when a Wayland surface should be created and used, which basically boiled down to: if there is no shared OpenGL context, then create a window.

This heuristic obviously breaks in multiple different ways, the two most obvious being:

  1. gltestsrc ! gldownload ! some-non-gl-sink - there should be no surface used here.
  2. gltestsrc ! glimagesink gltestsrc ! glimagesink - there should be two output surfaces used here.

The good news is that issue is now fixed by adding some API that glimagesink can use to notify that it would like an output surface. This has been implemented in this merge request and will be part of GStreamer 1.28.



JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

A few months ago we added support for JPEG XS encoding and decoding in GStreamer, alongside MPEG-TS container support.

This initially covered progressive scan only though.

Unfortunately interlaced scan, which harks back to the days when TVs had cathode ray tube displays, is still quite common, especially in the broadcasting industry, so it was only a matter of time until support for that would be needed as well.

Long story short, GStreamer can now (with this pending Merge Request) also encode and decode interlaced video into/from JPEG XS.

When putting JPEG XS into MPEG-TS, the individual fields are actually coded separately, so there are two JPEG XS code streams per frame. Inside GStreamer pipelines, interlaced raw video can be carried in multiple ways, but the most common one is an "interleaved" image, where the two fields are interleaved row by row; this is also what capture cards such as AJA or Blackmagic Decklink produce in GStreamer.

When encoding interlaced video in this representation, we need to go over each frame twice and feed every second row of pixels to the underlying SVT JPEG XS encoder, which itself is not aware of the interlaced nature of the video content. We do this by specifying double the usual stride as the rowstride. This works fine, but unearthed some minor issues with the size checks on the codec side, for which we filed a pull request.
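The double-stride trick can be shown with a toy model where a frame is a list of rows: stepping through the rows with a stride of two yields each field without any copying. The real code does the same thing with byte strides on raw video frames.

```python
def fields_from_interleaved(frame):
    """Split a row-interleaved frame into (top, bottom) fields by
    stepping through the rows with a doubled stride."""
    top = frame[0::2]     # rows 0, 2, 4, ... (top field)
    bottom = frame[1::2]  # rows 1, 3, 5, ... (bottom field)
    return top, bottom

# 6 rows, top and bottom field lines interleaved row by row
frame = ["t0", "b0", "t1", "b1", "t2", "b2"]
top, bottom = fields_from_interleaved(frame)
print(top, bottom)
```

Each field is then handed to the field-unaware encoder as if it were a standalone frame of half the height.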

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



Some time ago, Edward and I wrote a new element that allows clocking a GStreamer pipeline from an MPEG-TS stream, for example received via SRT.

This new element, mpegtslivesrc, wraps around any existing live source element, e.g. udpsrc or srtsrc, and provides a GStreamer clock that approximates the sender's clock. By using this clock as the pipeline clock, it is possible to run the whole pipeline at the same speed as the sender is producing the stream, without having to implement any kind of clock drift mechanism like skewing or resampling. Without this, it is currently necessary to adjust the timestamps of media coming out of GStreamer's tsdemux element, which is problematic if accurate timestamps are needed or the stream is to be stored in a file: e.g. a 25fps stream wouldn't have exactly 40ms inter-frame timestamp differences anymore.

The clock is approximated by making use of the in-stream MPEG-TS PCR, which basically gives the sender's clock time at specific points inside the stream, and correlating that together with the local receive times via a linear regression to calculate the relative rate between the sender's clock and the local system clock.
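The core of that correlation is an ordinary least-squares fit: given pairs of local receive time and PCR value, the slope of the fitted line is the relative rate between the sender's clock and the local system clock. The element maintains this regression incrementally over a window of samples; the standalone sketch below just illustrates the math.

```python
def clock_rate(samples):
    """samples: list of (local_time, pcr_time) pairs.
    Returns the slope of the least-squares line
    pcr_time ~= rate * local_time + offset."""
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(p for _, p in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * p for t, p in samples)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# Sender clock running 0.01% fast relative to the local clock:
samples = [(t, t * 1.0001) for t in range(0, 1000, 40)]
print(clock_rate(samples))
```

A rate slightly above 1.0 means the sender's clock runs faster than the local one, so the provided pipeline clock is sped up accordingly.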

Usage of the element is as simple as

$ gst-launch-1.0 mpegtslivesrc source='srtsrc location=srt://1.2.3.4:5678?latency=150&mode=caller' ! tsdemux skew-corrections=false ! ...
$ gst-launch-1.0 mpegtslivesrc source='udpsrc address=1.2.3.4 port=5678' ! tsdemux skew-corrections=false ! ...

Addition 2025-06-28: If you're using an older (< 1.28) version of GStreamer, you'll have to use the ignore-pcr=true property on tsdemux instead. skew-corrections=false was only added recently and allows for more reliable handling of MPEG-TS timestamp discontinuities.

A similar approach for clocking is implemented in the AJA source element and the NDI source element when the clocked timestamp mode is configured.



Thanks to the newly added atenc element, you can now use Apple's well-known AAC encoder directly in GStreamer!

gst-launch-1.0 -e audiotestsrc ! audio/x-raw,channels=2,rate=48000 ! atenc ! mp4mux ! filesink location=output.m4a

It supports all the usual rate control modes (CBR/LTA/VBR/CVBR), as well as settings relevant for each of them (target bitrate for CBR, perceived quality for VBR).

For now you can encode AAC-LC with up to 7.1 channels. Support for more AAC profiles and different output formats will be added in the future.

If you need decoding too, atdec is there to help and has supported AAC alongside a few other formats for a long time now.