Centricular

Expertise, Straight from the Source




Devlog


For a recent project it was necessary to collect video frames (and in the future also audio) from multiple streams during a specific time interval, and to pass them through an inference framework to extract additional metadata from the media and attach it to the frames.

While GStreamer's analytics library has gained quite a bit of infrastructure for machine learning use cases over the past years, there was nothing for this specific use case yet.

As part of solving this, I proposed a design for a generic interface that allows combining and batching multiple streams into a single one by using empty buffers with a GstMeta that contains the buffers of the original streams, and caps that include the caps of the original streams so that format negotiation in the pipeline works as usual.

While this covers my specific use case of combining multiple streams, it should be generic enough to also handle other cases that came up during the discussions.

In addition, I wrote two new elements, analyticscombiner and analyticssplitter, that make use of this new API for combining and batching multiple streams in a generic, media-agnostic way over specific time intervals, and for later splitting the batch up again into the original streams. The combiner can be configured to collect all media in the time interval, or only the first or last.

Conceptually the combiner element is similar to NVIDIA's DeepStream nvstreammux element, and in the future it should be possible to write a translation layer between the GStreamer analytics library and DeepStream.

The basic idea for the usage of these elements is to have a pipeline like

-- stream 1 --\                                                                  / -- stream 1 with metadata --
               -- analyticscombiner -- inference elements -- analyticssplitter --
-- stream 2 --/                                                                  \ -- stream 2 with metadata --
   ........                                                                           ......................
-- stream N -/                                                                     \- stream N with metadata --

The inference elements would only add additional metadata to each of the buffers, which can then be used further downstream in the pipeline for operations like overlays or blurring specific areas of the frames.
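As a rough sketch on the command line, such a pipeline could look like the following. This is only an illustration: identity stands in for the actual inference elements, and the comb.sink_0 / split.src_0 pad names are an assumption about the usual sink_%u / src_%u naming convention, so check gst-inspect-1.0 on analyticscombiner and analyticssplitter for the actual pad templates and properties.

gst-launch-1.0 \
  analyticscombiner name=comb ! identity ! analyticssplitter name=split \
  videotestsrc is-live=true ! video/x-raw,framerate=25/1 ! comb.sink_0 \
  videotestsrc is-live=true pattern=ball ! video/x-raw,framerate=25/1 ! comb.sink_1 \
  split.src_0 ! queue ! fakesink \
  split.src_1 ! queue ! fakesink

In a real deployment the fakesinks would be replaced by whatever consumes the annotated streams, e.g. an overlay followed by an encoder.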

In the future there are likely going to be more batching elements for specific stream types, operating on multiple or a single stream, or making use of completely different batching strategies.

Special thanks also to Olivier and Daniel who provided very useful feedback during the review of the two merge requests.



With GStreamer 1.26, a new public library, GstD3D12, providing a Direct3D 12 backend, was introduced in gst-plugins-bad.

Now, with the new gstreamer-d3d12 Rust crate, Rust applications can finally access GStreamer's Windows-native GPU functionality in a safe and idiomatic way.

What You Get with GStreamer D3D12 Support in Rust

  • Pass D3D12 textures created by your Rust application directly into GStreamer pipelines without data copying
  • Likewise, GStreamer-generated GPU resources (such as frames decoded by D3D12 decoders) can be accessed directly in your Rust app
  • GStreamer elements based on GstD3D12 can be written in Rust
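To get a feel for the GstD3D12 elements that this library (and therefore the crate) underpins, a purely D3D12-based decode-and-display pipeline could look something like the sketch below. It assumes an H.264 stream in an MP4 file (video.mp4 is a hypothetical file name) and that your gst-plugins-bad build ships the d3d12h264dec, d3d12convert and d3d12videosink elements.

gst-launch-1.0 filesrc location=video.mp4 ! qtdemux ! h264parse ! d3d12h264dec ! d3d12convert ! d3d12videosink

The same pipeline can of course also be assembled programmatically from Rust, where the new crate gives safe access to the D3D12 devices, textures and fences flowing through it.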

Beyond Pipelines: General D3D12 Utility Layer

GstD3D12 is not limited to multimedia pipelines. It also acts as a convenient D3D12 runtime utility, providing:

  • GPU resource pooling, such as command allocators and descriptor heaps, to reduce overhead and improve reuse
  • Abstractions for creating and recycling GPU textures with consistent lifetime tracking
  • Command queue and fence management helpers, greatly simplifying GPU/CPU sync
  • A foundation for building custom GPU workflows in Rust, with or without the full GStreamer pipeline


As part of the GStreamer Hackfest in Nice, France, I had some time to go through some outstanding GStreamer issues. One such issue that has been on my mind for a while was this GStreamer OpenGL Wayland issue.

Now, the issue is that OpenGL is an old API and did not originally have some of the platform extensions it does today. As a result, most windowing system APIs allow creating an output surface (or a window) without ever showing it. This works just fine when you are creating an OpenGL context but not actually rendering anything to the screen, and this approach is what is used by all of the other major OpenGL platforms (Windows, macOS, X11, etc.) supported by GStreamer.

When Wayland initially arrived, this was not the case. A Wayland surface could be used as the back buffer (an OpenGL term for the surface being rendered to) but could not be hidden. This is very different from how other windowing APIs worked at the time. As a result, the initial Wayland implementation in GStreamer OpenGL used a heuristic for determining when a Wayland surface would be created and used, which basically boiled down to: if there is no shared OpenGL context, then create a window.

This heuristic breaks in multiple ways, the two most obvious being:

  1. gltestsrc ! gldownload ! some-non-gl-sink - there should be no surface used here.
  2. gltestsrc ! glimagesink gltestsrc ! glimagesink - there should be two output surfaces used here.

The good news is that this issue is now fixed by adding some API that glimagesink can use to signal that it would like an output surface. This has been implemented in this merge request and will be part of GStreamer 1.28.



JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

A few months ago we added support for JPEG XS encoding and decoding in GStreamer, alongside MPEG-TS container support.

This initially covered progressive scan only though.

Unfortunately interlaced scan, which harks back to the days when TVs had cathode ray tube displays, is still quite common, especially in the broadcasting industry, so it was only a matter of time until support for that would be needed as well.

Long story short, GStreamer can now (with this pending Merge Request) also encode and decode interlaced video into/from JPEG XS.

When putting JPEG XS into MPEG-TS, the individual fields are coded separately, so there are two JPEG XS code streams per frame. Inside GStreamer pipelines interlaced raw video can be carried in multiple ways, but the most common one is an "interleaved" image, where the two fields are interleaved row by row; this is also what capture cards such as AJA or Blackmagic Decklink devices produce in GStreamer.

When encoding interlaced video in this representation, we need to go over each frame twice and feed every second row of pixels to the underlying SVT JPEG XS encoder, which itself is not aware of the interlaced nature of the video content. We do this by specifying double the usual stride as the row stride. This works fine, but it unearthed some minor issues with the size checks on the codec side, for which we filed a pull request.
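For example, capturing an interlaced signal from a Decklink card and writing it out as JPEG XS in MPEG-TS could look roughly like the sketch below. This is only an illustration: svtjpegxsenc is an assumed name for the SVT-based encoder element (check gst-inspect-1.0 for the actual name on your system), and it assumes a card with a 1080i50 input signal.

# "svtjpegxsenc" is an assumed element name, out.ts a placeholder file name
gst-launch-1.0 decklinkvideosrc mode=1080i50 ! videoconvert ! \
  svtjpegxsenc ! mpegtsmux ! filesink location=out.ts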

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



Some time ago, Edward and I wrote a new element that allows clocking a GStreamer pipeline from an MPEG-TS stream, for example received via SRT.

This new element, mpegtslivesrc, wraps around any existing live source element, e.g. udpsrc or srtsrc, and provides a GStreamer clock that approximates the sender's clock. By using this clock as the pipeline clock, it is possible to run the whole pipeline at the same speed at which the sender is producing the stream, without having to implement any kind of clock drift mechanism like skewing or resampling. Without this, it is currently necessary to adjust the timestamps of the media coming out of GStreamer's tsdemux element, which is problematic if accurate timestamps are needed or the stream is to be stored in a file: a 25fps stream, for example, would no longer have exactly 40ms inter-frame timestamp differences.

The clock is approximated by making use of the in-stream MPEG-TS PCR, which essentially gives the sender's clock time at specific points inside the stream, and correlating it with the local receive times via a linear regression to calculate the relative rate between the sender's clock and the local system clock.
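In simplified form the idea is ordinary least-squares regression; a real implementation additionally has to deal with jitter, discontinuities and numerical range, so this is just a sketch of the principle. Given observed pairs of PCR values $p_i$ and local receive times $t_i$ with means $\bar p$ and $\bar t$,

$$\hat a = \frac{\sum_i (t_i - \bar t)(p_i - \bar p)}{\sum_i (t_i - \bar t)^2}, \qquad \hat b = \bar p - \hat a\,\bar t,$$

and the sender's clock at local time $t$ is then approximated as $c(t) = \hat a\,t + \hat b$, with $\hat a$ being the relative rate between the two clocks.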

Usage of the element is as simple as

$ gst-launch-1.0 mpegtslivesrc source='srtsrc location=srt://1.2.3.4:5678?latency=150&mode=caller' ! tsdemux skew-corrections=false ! ...
$ gst-launch-1.0 mpegtslivesrc source='udpsrc address=1.2.3.4 port=5678' ! tsdemux skew-corrections=false ! ...

Addition 2025-06-28: If you're using an older (< 1.28) version of GStreamer, you'll have to use the ignore-pcr=true property on tsdemux instead. skew-corrections=false was only added recently and allows for more reliable handling of MPEG-TS timestamp discontinuities.

A similar approach for clocking is implemented in the AJA source element and the NDI source element when the clocked timestamp mode is configured.



If you've ever seen a news or sports channel playing without sound in the background of a hotel lobby, bar, or airport, you've probably seen closed captions in action.

These TV-style captions are alphabet/character-based, with some very basic commands to control the positioning and layout of the text on the screen.

They are very low bitrate and were transmitted in the invisible part of TV images during the vertical blanking interval (VBI) back in those good old analogue days ("line 21 captions").

Nowadays they are usually carried as part of the MPEG-2 or H.264/H.265 video bitstream, unlike, say, text subtitles in a Matroska file, which are their own separate stream in the container.

In GStreamer closed captions can be carried in different ways: either implicitly as part of a video bitstream, or explicitly as part of a video bitstream with video caption metas on the buffers passing through the pipeline. Captions can also travel through a pipeline stand-alone, in the form of one of multiple raw caption bitstream formats.

To make handling these different options easier for applications, there are elements that can extract captions from the video bitstream into metas, split captions from metas off into their own stand-alone stream, and do the reverse, i.e. combine and reinject them again.

SMPTE 2038 Ancillary Data

SMPTE 2038 (pdf) is a generic system to put VBI-style ancillary data into an MPEG-TS container. This could include all kinds of metadata such as scoreboard data or game clocks, and of course also closed captions, in this case in the form of a distinct stream completely separate from any video bitstream.

We've recently added support for SMPTE 2038 ancillary data in GStreamer. This comes in the form of a number of new elements in the GStreamer Rust closedcaption plugin and mappings for it in the MPEG-TS muxer and demuxer.

The new elements are:

  • st2038ancdemux: splits SMPTE ST-2038 ancillary metadata (as received from tsdemux) into separate streams per DID/SDID and line/horizontal_offset. Will add a sometimes pad with details for each ancillary stream. Also has an always source pad that just outputs all ancillary streams for easy forwarding or remuxing, in case none of the ancillary streams need to be modified or dropped.

  • st2038ancmux: muxes SMPTE ST-2038 ancillary metadata streams into a single stream for muxing into MPEG-TS with mpegtsmux. Combines ancillary data on the same line if needed, as is required for MPEG-TS muxing. Can accept individual ancillary metadata streams as inputs and/or the combined stream from st2038ancdemux.

    If the video framerate is known, it can be signalled to the ancillary data muxer via the output caps by adding a capsfilter behind it, with e.g. meta/x-st-2038,framerate=30/1.

    This allows the muxer to bundle all packets belonging to the same frame (with the same timestamp), but that is not required. In case there are multiple streams with the same DID/SDID that have an ST-2038 packet for the same frame, it will prioritise the one from more recently created request pads over those from earlier created request pads (which might contain a combined stream for example if that's fed first).

  • st2038anctocc: extracts closed captions (CEA-608 and/or CEA-708) from SMPTE ST-2038 ancillary metadata streams and outputs them on the respective sometimes source pad (src_cea608 or src_cea708). The data is output as a closed caption stream with caps closedcaption/x-cea-608,format=s334-1a or closedcaption/x-cea-708,format=cdp for further processing by other GStreamer closed caption processing elements.

  • cctost2038anc: takes closed captions (CEA-608 and/or CEA-708) as produced by other GStreamer closed caption processing elements and converts them into SMPTE ST-2038 ancillary data that can be fed to st2038ancmux and then to mpegtsmux for splicing/muxing into an MPEG-TS container. The line-number and horizontal-offset properties should be set to the desired line number and horizontal offset.
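As an illustration of the extraction path described above, the following sketch takes an ST-2038 ancillary stream from an MPEG-TS file and converts it into a CEA-608 closed caption stream. It is only a sketch: input.ts is a placeholder, and it assumes the transport stream actually carries such an ancillary stream and that tsdemux exposes it with meta/x-st-2038 caps as the elements above expect.

gst-launch-1.0 filesrc location=input.ts ! tsdemux name=d \
  d. ! meta/x-st-2038 ! queue ! st2038anctocc name=cc \
  cc.src_cea608 ! queue ! fakesink dump=true

The resulting closedcaption/x-cea-608,format=s334-1a stream can then be fed into the other GStreamer closed caption elements instead of a fakesink.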

Please give it a spin and let us know how it goes!



What is JPEG XS?

JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

It's wavelet based, with low computational overhead and a latency measured in scanlines, and it is designed to allow easy implementation in software, GPU or FPGAs.

Multi-generation robustness means repeated decoding and encoding will not introduce unpleasant coding artefacts or noticeably degrade image quality, which makes it suitable for video production workflows.

It is often deployed in lieu of existing raw video workflows, where it allows sending multiple streams over links designed to carry a single raw video transport.

JPEG XS encoding / decoding in GStreamer

GStreamer has now gained basic support for this codec.

Encoding and decoding is supported via the Open Source Intel Scalable Video Technology JPEG XS library, but third-party GStreamer plugins that provide GPU accelerated encoding and decoding exist as well.

MPEG-TS container mapping

Support was also added for carriage inside MPEG-TS which should enable a wide range of streaming applications including those based on the Video Services Forum (VSF)'s Technical Recommendation TR-07.

JPEG XS caps in GStreamer

It actually took us a few iterations to come up with GStreamer caps that we were somewhat happy with for starters.

Our starting point was what the SVT encoder/decoder output/consume, and our initial target was MPEG-TS container format support.

We checked various specifications to see how JPEG XS is mapped there and how it could work, in particular:

  • ISO/IEC 21122-3 (Part 3: Transport and container formats)
  • MPEG-TS JPEG XS mapping and VSF TR-07 - Transport of JPEG XS Video in MPEG-2 Transport Stream over IP
  • RFC 9134: RTP Payload Format for ISO/IEC 21122 (JPEG XS)
  • SMPTE ST 2124:2020 (Mapping JPEG XS Codestreams into the MXF Generic Container)
  • MP4 mapping

and we think the current mapping will work for all of those cases.

Basically each mapping wants some extra headers in addition to the codestream data, for the out-of-band signalling required to make sense of the image data. Originally we thought about putting some form of codec_data header into the caps, but it wouldn't really have made anything easier, and would just have duplicated 99% of the info that's in the video caps already anyway.

The current caps mapping is based on ISO/IEC 21122-3, Annex D, with additional metadata in the caps, which should hopefully work just fine for RTP, MP4, MXF and other mappings in future.

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



webrtcsink already supported instantiating a data channel for the sole purpose of carrying navigation events from the consumer to the producer. It can now also create a generic control data channel through which the consumer can send JSON requests in the form:

{
    "id": identifier used in the response message,
    "mid": optional media identifier the request applies to,
    "request": {
        "type": currently "navigationEvent" and "customUpstreamEvent" are supported,
        "type-specific-field": ...
    }
}

The producer will reply with messages of the form:

{
  "id": identifier of the request,
  "error": optional error message, successful if not set
}

The example frontend was also updated with a text area for sending any arbitrary request.

The use case for this work was to make it possible for a consumer to control the mix matrix used for the audio stream, with such a pipeline running on the producer side:

gst-launch-1.0 audiotestsrc ! audioconvert ! webrtcsink enable-control-data-channel=true

As audioconvert now supports setting a mix matrix through a custom upstream event, the consumer can simply input the following text in the request field of the frontend to reverse the channels of a stereo audio stream:

{
  "type": "customUpstreamEvent",
  "structureName": "GstRequestAudioMixMatrix",
  "structure": {
    "matrix": [[0.0, 1.0], [1.0, 0.0]]
  }
}


The default signaller for webrtcsink can now produce an answer when the consumer sends the offer first.

To test this with the example, you can simply follow the usual steps but also paste the following text in the text area before clicking on the producer name:

{
  "offerToReceiveAudio": 1,
  "offerToReceiveVideo": 1
}

I implemented this in order to test multiopus support with webrtcsink, as it seems to work better when munging the SDP offered by Chrome.



A couple of weeks ago I implemented support for static HDR10 metadata in the decklinkvideosink and decklinkvideosrc elements for Blackmagic video capture and playout devices. The culmination of this work is available from MR 7124 - decklink: add support for HDR output and input

This adds support for both PQ and HLG HDR alongside some improvements in colorimetry negotiation. Static HDR metadata in GStreamer is conveyed through caps.

The first part of this is the 'colorimetry' field in video/x-raw caps. decklinkvideosink and decklinkvideosrc now support the colorimetry values 'bt601', 'bt709', 'bt2020', 'bt2100-hlg', and 'bt2100-pq' for any resolution. Previously the colorimetry used was fixed based on the resolution of the video frames being sent or received. With some glue code, the colorimetry is now retrieved from the Decklink API, and the Decklink API can ask us for the colorimetry of the submitted video frame. Arbitrary colorimetry is not supported on all Decklink devices, and we fall back to the previous resolution-based fixed list when it is not.

Support for HDR metadata is a separate feature flag in the Decklink API and may or may not be present independently of Decklink's arbitrary colour space support. If the Decklink device does not support HDR metadata, then the colorimetry values 'bt2100-hlg' and 'bt2100-pq' are not supported.

In the case of HLG, all that is necessary is to provide information that the HLG gamma transfer function is being used. Nothing else is required.

In the case of PQ HDR, in addition to providing Decklink with the correct gamma transfer function, Decklink also needs some other metadata conveyed in the caps, in the form of the 'mastering-display-info' and 'content-light-level' fields. With some support from GstVideoMasteringDisplayInfo and GstVideoContentLightLevel, the relevant information is signalled to Decklink and can be retrieved from each individual video frame.
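As a rough sketch, HLG playout on a Decklink card that supports HDR metadata could then look like the following; the resolution and mode are just examples, and for PQ the mastering-display-info and content-light-level caps fields would be set as well.

gst-launch-1.0 videotestsrc ! videoconvert ! \
  video/x-raw,format=v210,width=1920,height=1080,framerate=25/1,colorimetry=bt2100-hlg ! \
  decklinkvideosink mode=1080p25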



Around the time of GStreamer 1.20, I fixed the handling of RTSP control URIs in GStreamer's RTSP source element by making use of GstUri for joining URIs and resolving relative URIs, instead of using a wrong, custom implementation of those basic URI operations (see RFC 2396).

This was in response to a bug report about a regression introduced in 1.18 while fixing that custom implementation some years before. Now that this is handled according to the standards, one would expect the topic to finally be solved.

Unfortunately that was not the case. As it turns out, various RTSP servers do not actually implement the URI operations for constructing the control URI but instead do simple string concatenation. This works fine for simple cases, but once path separators or query parameters are involved it is not sufficient. The fact that both VLC and ffmpeg also only do string concatenation on the client side unfortunately does not help the situation either: such servers work fine with VLC and ffmpeg but not with GStreamer, so it initially looks like a GStreamer bug.

To work around these cases automatically, a workaround with a couple of follow-up fixes 1 2 3 4 was implemented. This workaround is available since 1.20.4.

Unfortunately this was also not enough as various servers don't just implement the URI RFC wrong, but also don't implement the RTSP RFC correctly and don't return any kind of meaningful errors but, for example, simply close the connection.

To solve this once and for all, Mathieu now added a new property to rtspsrc that forces it to directly use string concatenation and not attempt proper URI operations first.

$ gst-launch-1.0 rtspsrc location=rtsp://1.2.3.4/test force-non-compliant-url=true ! ...

This property is available since 1.24.7 and should make it possible to use such misbehaving and non-compliant servers.

If GStreamer's rtspsrc fails on an RTSP stream that is handled just fine by VLC and ffmpeg, give this a try.



Last month, as part of the GTK 4.14 release, GTK gained support for directly importing DMABufs on Wayland. Among other things, this allows passing decoded video frames from hardware decoders to GTK, and under certain circumstances it allows GTK to directly forward the DMABuf to the Wayland compositor. And under even more special circumstances, this can then be passed directly to the GPU driver. Matthias wrote some blog posts about the details.

In short, this reduces CPU usage and power consumption considerably when using a suitable hardware decoder and running GTK on Wayland. A suitable hardware decoder in this case is one provided via VA by e.g. Intel or (newer) AMD GPUs, but unfortunately not NVIDIA, because they simply don't support DMABufs.

I've added support for this to the GStreamer GTK4 video sink, gtk4paintablesink that exists as part of the GStreamer Rust plugins. Previously it was only possible to pass RGB system memory (i.e. after downloading from the GPU in case of hardware decoders) or GL textures (with all kinds of complications) from GStreamer to GTK4.

In general the GTK4 sink now offers the most complete GStreamer / UI toolkit integration, even more than the QML5/6 sinks, and it is used widely by various GNOME applications.



Hello and welcome to our little corner of the internet!

This is where we will post little updates and goings-on about GStreamer, Rust, Meson, Orc, GNOME, librice, and other Free and Open Source Software projects we love to contribute to.

This covers only a small part of our day-to-day upstream activity, but we'll try to make time to post about interesting happenings amid the everyday hustle.

Please check in regularly and bear with us while we look into adding more convenient ways to get notified of updates.

In the meantime please follow us on Mastodon, Bluesky, or (yes we still call it) Twitter.