Centricular

Expertise, Straight from the Source



Devlog

Posts tagged with #gstreamer

As part of the GStreamer Hackfest in Nice, France I had some time to go through some outstanding GStreamer issues. One such issue that has been on my mind was this GStreamer OpenGL Wayland issue.

The root of the problem is that OpenGL is an old API that originally lacked some of the platform extensions it has today. As a result, most windowing system APIs allow creating an output surface (or a window) without ever showing it. This works just fine when you are creating an OpenGL context but not actually rendering anything to the screen, and it is the approach used by all of the other major OpenGL platforms (Windows, macOS, X11, etc.) supported by GStreamer.

When Wayland initially arrived, this was not the case. A Wayland surface could be the back buffer (an OpenGL term for the surface being rendered to) but could not be hidden. This is very different from how other windowing APIs worked at the time. As a result, the initial Wayland implementation within GStreamer OpenGL used a heuristic to decide when a Wayland surface would be created and used. It basically boiled down to: if there is no shared OpenGL context, then create a window.

This heuristic obviously breaks in multiple different ways, the two most obvious being:

  1. gltestsrc ! gldownload ! some-non-gl-sink - there should be no surface used here.
  2. gltestsrc ! glimagesink gltestsrc ! glimagesink - there should be two output surfaces used here.

The good news is that the issue is now fixed by adding some API that glimagesink can use to signal that it would like an output surface. This has been implemented in this merge request and will be part of GStreamer 1.28.



JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

A few months ago we added support for JPEG XS encoding and decoding in GStreamer, alongside MPEG-TS container support.

This initially covered progressive scan only though.

Unfortunately interlaced scan, which harks back to the days when TVs had cathode ray tube displays, is still quite common, especially in the broadcasting industry, so it was only a matter of time until support for that would be needed as well.

Long story short, GStreamer can now (with this pending Merge Request) also encode and decode interlaced video into/from JPEG XS.

When putting JPEG XS into MPEG-TS, the individual fields are actually coded separately, so there are two JPEG XS code streams per frame. Inside GStreamer pipelines interlaced raw video can be carried in multiple ways, but the most common one is an "interleaved" image, where the two fields are interleaved row by row. This is also what capture cards such as AJA or Blackmagic Decklink produce in GStreamer.

When encoding interlaced video in this representation, we need to go twice over each frame and feed every second row of pixels to the underlying SVT JPEG XS encoder which itself is not aware of the interlaced nature of the video content. We do this by specifying double the usual stride as rowstride. This works fine, but unearthed some minor issues with the size checks on the codec side, for which we filed a pull request.
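The doubled-stride trick can be illustrated without any codec at all. The following is a minimal sketch in plain Rust, not the code from the GStreamer element: `FieldView` and `split_fields` are hypothetical names, and a single 8-bit plane is assumed. It shows how both fields of an interleaved frame can be addressed in place by starting at row 0 or row 1 and doubling the row stride.

```rust
/// Read-only view of one field inside an interleaved interlaced frame.
/// All names here are illustrative and not taken from the GStreamer element.
struct FieldView<'a> {
    data: &'a [u8], // starts at the first row that belongs to this field
    stride: usize,  // bytes between two rows of the field (twice the frame stride)
    width: usize,   // visible bytes per row
    height: usize,  // number of rows in this field
}

impl<'a> FieldView<'a> {
    /// Returns the visible bytes of row `y` of this field.
    fn row(&self, y: usize) -> &'a [u8] {
        assert!(y < self.height);
        let start = y * self.stride;
        &self.data[start..start + self.width]
    }
}

/// Splits an interleaved frame into its two fields without copying any pixels.
/// The first view covers rows 0, 2, 4, ... and the second rows 1, 3, 5, ...
fn split_fields(
    frame: &[u8],
    stride: usize,
    width: usize,
    height: usize,
) -> (FieldView<'_>, FieldView<'_>) {
    assert!(width <= stride && frame.len() >= stride * height);
    let even = FieldView {
        data: frame,
        stride: stride * 2,
        width,
        height: (height + 1) / 2,
    };
    let odd = FieldView {
        // Skip one frame row so that this view starts at row 1.
        data: frame.get(stride..).unwrap_or(&[]),
        stride: stride * 2,
        width,
        height: height / 2,
    };
    (even, odd)
}

fn main() {
    // 4 rows of 3 visible bytes plus 1 padding byte; each row is filled with its row index.
    let (stride, width, height) = (4, 3, 4);
    let frame: Vec<u8> = (0..height).flat_map(|y| vec![y as u8; stride]).collect();
    let (even, odd) = split_fields(&frame, stride, width, height);
    for y in 0..even.height {
        println!("even-row field, row {}: {:?}", y, even.row(y));
    }
    for y in 0..odd.height {
        println!("odd-row field, row {}: {:?}", y, odd.row(y));
    }
}
```

Each view could then be handed to an encoder that only understands progressive images, which is the idea described above.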

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



Thanks to the newly added atenc element, you can now use Apple's well-known AAC encoder directly in GStreamer!

gst-launch-1.0 -e audiotestsrc ! audio/x-raw,channels=2,rate=48000 ! atenc ! mp4mux ! filesink location=output.m4a

It supports all the usual rate control modes (CBR/LTA/VBR/CVBR), as well as settings relevant for each of them (target bitrate for CBR, perceived quality for VBR).

For now you can encode AAC-LC with up to 7.1 channels. Support for more AAC profiles and different output formats will be added in the future.

If you need decoding too, atdec is there to help and has supported AAC alongside a few other formats for a long time now.



If you've ever seen a news or sports channel playing without sound in the background of a hotel lobby, bar, or airport, you've probably seen closed captions in action.

These TV-style captions are alphabet/character-based, with some very basic commands to control the positioning and layout of the text on the screen.

They are very low bitrate and were transmitted in the invisible part of TV images during the vertical blanking interval (VBI) back in those good old analogue days ("line 21 captions").

Nowadays they are usually carried as part of the MPEG-2 or H.264/H.265 video bitstream, unlike say text subtitles in a Matroska file which will be its own separate stream in the container.

In GStreamer closed captions can be carried in different ways: either implicitly as part of a video bitstream, or explicitly as video caption metas on the video buffers passing through the pipeline. Captions can also travel through a pipeline stand-alone in the form of one of multiple raw caption bitstream formats.

To make handling these different options easier for applications, there are elements that can extract captions from the video bitstream into metas and split off captions from metas into their own stand-alone stream, as well as elements that do the reverse and combine and reinject them again.

SMPTE 2038 Ancillary Data

SMPTE 2038 is a generic system to put VBI-style ancillary data into an MPEG-TS container. This could include all kinds of metadata such as scoreboard data or game clocks, and of course also closed captions, in this case in the form of a distinct stream completely separate from any video bitstream.

We've recently added support for SMPTE 2038 ancillary data in GStreamer. This comes in the form of a number of new elements in the GStreamer Rust closedcaption plugin and mappings for it in the MPEG-TS muxer and demuxer.

The new elements are:

  • st2038ancdemux: splits SMPTE ST-2038 ancillary metadata (as received from tsdemux) into separate streams per DID/SDID and line/horizontal_offset. Will add a sometimes pad with details for each ancillary stream. Also has an always source pad that just outputs all ancillary streams for easy forwarding or remuxing, in case none of the ancillary streams need to be modified or dropped.

  • st2038ancmux: muxes SMPTE ST-2038 ancillary metadata streams into a single stream for muxing into MPEG-TS with mpegtsmux. Combines ancillary data on the same line if needed, as is required for MPEG-TS muxing. Can accept individual ancillary metadata streams as inputs and/or the combined stream from st2038ancdemux.

    If the video framerate is known, it can be signalled to the ancillary data muxer via the output caps by adding a capsfilter behind it, with e.g. meta/x-st-2038,framerate=30/1.

    This allows the muxer to bundle all packets belonging to the same frame (with the same timestamp), but that is not required. In case there are multiple streams with the same DID/SDID that have an ST-2038 packet for the same frame, it will prioritise the one from more recently created request pads over those from earlier created request pads (which might contain a combined stream for example if that's fed first).

  • st2038anctocc: extracts closed captions (CEA-608 and/or CEA-708) from SMPTE ST-2038 ancillary metadata streams and outputs them on the respective sometimes source pad (src_cea608 or src_cea708). The data is output as a closed caption stream with caps closedcaption/x-cea-608,format=s334-1a or closedcaption/x-cea-708,format=cdp for further processing by other GStreamer closed caption processing elements.

  • cctost2038anc: takes closed captions (CEA-608 and/or CEA-708) as produced by other GStreamer closed caption processing elements and converts them into SMPTE ST-2038 ancillary data that can be fed to st2038ancmux and then to mpegtsmux for splicing/muxing into an MPEG-TS container. The line-number and horizontal-offset properties should be set to the desired line number and horizontal offset.

Please give it a spin and let us know how it goes!



Up until recently, when using hlscmafsink, if you wanted to move to a new playlist you had to stop the pipeline, modify the relevant properties and then go to PLAYING again.

This was problematic when working with live sources because some data was being lost between the state changes. Not anymore!

A new-playlist signal has been added, which lets you switch output to a new location on the fly, without having any gaps between the content in each playlist.

Simply change the relevant properties first and then emit the signal:

hlscmafsink.set_property("playlist-location", new_playlist_location);
hlscmafsink.set_property("init-location", new_init_location);
hlscmafsink.set_property("location", new_segment_location);
hlscmafsink.emit_by_name::<()>("new-playlist", &[]);

This can be useful if you're capturing a live source and want to switch to a different folder every couple of hours, for example.
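For the folder rotation case, the three property values have to be computed before they are set on the element. Here is a small sketch of how that could look. Everything in it is an assumption made for illustration: the helper names, the `partNNNN` folder layout and the file name patterns. Creating the directory is left to the application.

```rust
/// The three locations that are set on hlscmafsink before emitting "new-playlist".
/// Struct and field names are hypothetical and merely mirror the property names.
#[derive(Debug, PartialEq)]
struct PlaylistPaths {
    playlist_location: String,
    init_location: String,
    location: String,
}

/// Index of the rotation period that `elapsed_secs` falls into.
fn rotation_index(elapsed_secs: u64, period_secs: u64) -> u64 {
    assert!(period_secs > 0, "rotation period must not be zero");
    elapsed_secs / period_secs
}

/// Builds the output paths for rotation period `index` below `base_dir`.
/// The "%05d" placeholders are kept verbatim for the sink to fill in.
fn paths_for(base_dir: &str, index: u64) -> PlaylistPaths {
    let dir = format!("{}/part{:04}", base_dir, index);
    PlaylistPaths {
        playlist_location: format!("{}/playlist.m3u8", dir),
        init_location: format!("{}/init%05d.mp4", dir),
        location: format!("{}/segment%05d.m4s", dir),
    }
}

fn main() {
    // Rotate every two hours; pretend the capture has been running for a bit over six hours.
    let index = rotation_index(6 * 3600 + 10, 2 * 3600);
    let paths = paths_for("/srv/hls", index);
    println!("{:#?}", paths);
}
```

The resulting strings would then be passed to `set_property` as in the snippet above, followed by emitting the signal.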



What is JPEG XS?

JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

It's wavelet based, with low computational overhead and a latency measured in scanlines, and it is designed to allow easy implementation in software, GPU or FPGAs.

Multi-generation robustness means repeated decoding and encoding will not introduce unpleasant coding artefacts or noticeably degrade image quality, which makes it suitable for video production workflows.

It is often deployed in lieu of existing raw video workflows, where it allows sending multiple streams over links designed to carry a single raw video transport.

JPEG XS encoding / decoding in GStreamer

GStreamer has now gained basic support for this codec.

Encoding and decoding is supported via the Open Source Intel Scalable Video Technology JPEG XS library, but third-party GStreamer plugins that provide GPU accelerated encoding and decoding exist as well.

MPEG-TS container mapping

Support was also added for carriage inside MPEG-TS which should enable a wide range of streaming applications including those based on the Video Services Forum (VSF)'s Technical Recommendation TR-07.

JPEG XS caps in GStreamer

It actually took us a few iterations to come up with GStreamer caps that we were somewhat happy with for starters.

Our starting point was what the SVT encoder/decoder output/consume, and our initial target was MPEG-TS container format support.

We checked various specifications to see how JPEG XS is mapped there and how it could work, in particular:

  • ISO/IEC 21122-3 (Part 3: Transport and container formats)
  • MPEG-TS JPEG XS mapping and VSF TR-07 - Transport of JPEG XS Video in MPEG-2 Transport Stream over IP
  • RFC 9134: RTP Payload Format for ISO/IEC 21122 (JPEG XS)
  • SMPTE ST 2124:2020 (Mapping JPEG XS Codestreams into the MXF)
  • MP4 mapping

and we think the current mapping will work for all of those cases.

Basically each mapping wants some extra headers in addition to the codestream data, for the out-of-band signalling required to make sense of the image data. Originally we thought about putting some form of codec_data header into the caps, but it wouldn't really have made anything easier, and would just have duplicated 99% of the info that's in the video caps already anyway.

The current caps mapping is based on ISO/IEC 21122-3, Annex D, with additional metadata in the caps, which should hopefully work just fine for RTP, MP4, MXF and other mappings in future.

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



When using hlssink3 and hlscmafsink elements, it's now possible to track new fragments being added by listening for the hls-segment-added message:

Got message #67 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00000.m4s, running-time=(guint64)0, duration=(guint64)3000000000;
Got message #71 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00001.m4s, running-time=(guint64)3000000000, duration=(guint64)3000000000;
Got message #74 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00002.m4s, running-time=(guint64)6000000000, duration=(guint64)3000000000;

This is similar to how you would listen for splitmuxsink-fragment-closed when using the older hlssink2.
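What an application does with these messages is up to it. As one possible use, here is a sketch in plain Rust that takes the three fields from the log above and checks that the segments follow each other without gaps. The bus handling is deliberately left out, and `SegmentAdded` and `gapless_duration_ns` are hypothetical names, not part of any GStreamer API.

```rust
/// Fields carried by one hls-segment-added message, as seen in the log above.
/// The struct itself is a hypothetical application-side type.
#[derive(Debug, Clone, PartialEq)]
struct SegmentAdded {
    location: String,
    running_time_ns: u64,
    duration_ns: u64,
}

/// Returns the total duration in nanoseconds covered by `segments` if every
/// segment starts exactly where the previous one ended, and `None` if the
/// list is empty or contains a gap or an overlap.
fn gapless_duration_ns(segments: &[SegmentAdded]) -> Option<u64> {
    let first_start = segments.first()?.running_time_ns;
    let mut expected_start = first_start;
    for seg in segments {
        if seg.running_time_ns != expected_start {
            return None;
        }
        expected_start = seg.running_time_ns.checked_add(seg.duration_ns)?;
    }
    Some(expected_start - first_start)
}

fn main() {
    // Values copied from the three log lines above.
    let segments: Vec<SegmentAdded> = (0..3u64)
        .map(|i| SegmentAdded {
            location: format!("segment{:05}.m4s", i),
            running_time_ns: i * 3_000_000_000,
            duration_ns: 3_000_000_000,
        })
        .collect();
    match gapless_duration_ns(&segments) {
        Some(total) => println!("{} segments, {} ns without gaps", segments.len(), total),
        None => println!("gap or overlap detected"),
    }
}
```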



webrtcsink already supported instantiating a data channel for the sole purpose of carrying navigation events from the consumer to the producer. It can now also create a generic control data channel through which the consumer can send JSON requests in the form:

{
    "id": identifier used in the response message,
    "mid": optional media identifier the request applies to,
    "request": {
        "type": currently "navigationEvent" and "customUpstreamEvent" are supported,
        "type-specific-field": ...
    }
}

The producer will reply with messages of this form:

{
  "id": identifier of the request,
  "error": optional error message, successful if not set
}

The example frontend was also updated with a text area for sending any arbitrary request.

The use case for this work was to make it possible for a consumer to control the mix matrix used for the audio stream, with such a pipeline running on the producer side:

gst-launch-1.0 audiotestsrc ! audioconvert ! webrtcsink enable-control-data-channel=true

As audioconvert now supports setting a mix matrix through a custom upstream event, the consumer can simply input the following text in the request field of the frontend to reverse the channels of a stereo audio stream:

{
  "type": "customUpstreamEvent",
  "structureName": "GstRequestAudioMixMatrix",
  "structure": {
    "matrix": [[0.0, 1.0], [1.0, 0.0]]
  }
}
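To see why this particular matrix reverses the channels, it helps to apply it to a single stereo frame by hand. The sketch below is plain Rust and does not use audioconvert. It assumes that each row of the matrix describes one output channel and each column one input channel, which is our reading of the mix matrix convention. `mix_frame` is a hypothetical helper.

```rust
/// Applies a mix matrix to one audio frame (one sample per input channel).
/// Row `i` of `matrix` holds the weight of every input channel for output
/// channel `i`. This layout is an assumption made for the illustration.
fn mix_frame(matrix: &[Vec<f32>], input: &[f32]) -> Vec<f32> {
    matrix
        .iter()
        .map(|row| {
            assert_eq!(row.len(), input.len(), "one weight per input channel");
            row.iter().zip(input).map(|(w, s)| w * s).sum::<f32>()
        })
        .collect()
}

fn main() {
    // The matrix from the request above: output 0 takes input 1 and vice versa.
    let swap = vec![vec![0.0, 1.0], vec![1.0, 0.0]];
    // One stereo frame: left = 0.25, right = -0.5.
    let frame = [0.25_f32, -0.5];
    println!("{:?} -> {:?}", frame, mix_frame(&swap, &frame));
}
```

An identity matrix (`[[1.0, 0.0], [0.0, 1.0]]`) would leave the frame untouched, so sending that through the control channel is an easy way to undo the swap.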


GStreamer's VideoToolbox encoder recently gained support for encoding HEVC/H.265 videos containing an alpha channel.

A separate vtenc_h265a element has been added for this purpose. Assuming you're on macOS, you can use it like this:

gst-launch-1.0 -e videotestsrc ! alpha alpha=0.5 ! videoconvert ! vtenc_h265a ! mp4mux ! filesink location=alpha.mp4

Click here to see an example in action! It should work fine on macOS and iOS, in both Chrome and Safari. On other platforms it might not be displayed at all - compatibility is unfortunately still quite limited.

If your browser supports this format correctly, you will see a moving GStreamer logo on a constantly changing background - something like this. That background is entirely separate from the video and is generated using CSS.



The default signaller for webrtcsink can now produce an answer when the consumer sends the offer first.

To test this with the example, you can simply follow the usual steps but also paste the following text in the text area before clicking on the producer name:

{
  "offerToReceiveAudio": 1,
  "offerToReceiveVideo": 1
}

I implemented this in order to test multiopus support with webrtcsink, as it seems to work better when munging the SDP offered by Chrome.



A couple of weeks ago I implemented support for static HDR10 metadata in the decklinkvideosink and decklinkvideosrc elements for Blackmagic video capture and playout devices. The culmination of this work is available in MR 7124, "decklink: add support for HDR output and input".

This adds support for both PQ and HLG HDR alongside some improvements in colorimetry negotiation. Static HDR metadata in GStreamer is conveyed through caps.

The first part of this is the 'colorimetry' field in video/x-raw caps. decklinkvideosink and decklinkvideosrc now support the colorimetry values 'bt601', 'bt709', 'bt2020', 'bt2100-hlg', and 'bt2100-pq' for any resolution. Previously the colorimetry used was fixed based on the resolution of the video frames being sent or received. With some glue code, the colorimetry is now retrieved from the Decklink API and the Decklink API can ask us for the colorimetry of the submitted video frame. Arbitrary colorimetry is not supported on all Decklink devices, and we fall back to the previous fixed list based on frame resolution when it is not supported.

Support for HDR metadata is a separate feature flag in the Decklink API and may or may not be present independent of Decklink's arbitrary colour space support. If the Decklink device does not support HDR metadata, then the colorimetry values 'bt2100-hlg' and 'bt2100-pq' are not supported.

In the case of HLG, all that is necessary is to provide information that the HLG gamma transfer function is being used. Nothing else is required.

In the case of PQ HDR, in addition to providing Decklink with the correct gamma transfer function, Decklink also needs some other metadata conveyed in the caps in the form of the 'mastering-display-info' and 'content-light-level' fields. With some support from GstVideoMasteringDisplayInfo and GstVideoContentLightLevel, the relevant information is signalled to Decklink and can be retrieved from each individual video frame.



Hello and welcome to our little corner of the internet!

This is where we will post little updates and goings-on about GStreamer, Rust, Meson, Orc, GNOME, librice, and other Free and Open Source Software projects we love to contribute to.

This covers only a small part of our day-to-day upstream activity, but we'll try to make time to post about interesting happenings between the everyday hustle.

Please check in regularly and bear with us while we look into adding more convenient ways to get notified of updates.

In the meantime please follow us on Mastodon, Bluesky, or (yes we still call it) Twitter.