Centricular




Devlog


If you've ever seen a news or sports channel playing without sound in the background of a hotel lobby, bar, or airport, you've probably seen closed captions in action.

These TV-style captions are alphabet/character-based, with some very basic commands to control the positioning and layout of the text on the screen.

They are very low bitrate and were transmitted in the invisible part of TV images during the vertical blanking interval (VBI) back in those good old analogue days ("line 21 captions").

Nowadays they are usually carried as part of the MPEG-2 or H.264/H.265 video bitstream, unlike, say, text subtitles in a Matroska file, which are carried as their own separate stream in the container.

In GStreamer, closed captions can be carried in different ways: either implicitly as part of a video bitstream, or explicitly alongside a video bitstream in the form of video caption metas on the buffers passing through the pipeline. Captions can also travel through a pipeline standalone, as one of several raw caption bitstream formats.

To make handling these different options easier for applications, there are elements that can extract captions from the video bitstream into metas, split captions from metas off into their own standalone stream, and do the reverse: combine them and reinject them again.
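As a rough sketch of that plumbing, the cccombiner and ccextractor elements from the closedcaption plugin can attach a standalone caption stream onto video buffers as metas and split it back off again further downstream. The input file and the pad names here are illustrative assumptions:

```shell
# Attach a standalone CEA-608 caption stream (parsed from an SCC file) to
# video buffers as caption metas with cccombiner, then split the captions
# back off into their own stream with ccextractor.
# The file name and the "caption" pad names are illustrative assumptions.
gst-launch-1.0 \
  videotestsrc ! cccombiner name=cc ! ccextractor name=ex \
  filesrc location=captions.scc ! sccparse ! cc.caption \
  ex. ! queue ! autovideosink \
  ex.caption ! queue ! fakesink
```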

SMPTE 2038 Ancillary Data

SMPTE 2038 (pdf) is a generic system for putting VBI-style ancillary data into an MPEG-TS container. This can include all kinds of metadata, such as scoreboard data or game clocks, and of course also closed captions, in this case in the form of a distinct stream completely separate from any video bitstream.

We've recently added support for SMPTE 2038 ancillary data in GStreamer. This comes in the form of a number of new elements in the GStreamer Rust closedcaption plugin, plus mappings for it in the MPEG-TS muxer and demuxer.

The new elements are:

  • st2038ancdemux: splits SMPTE ST-2038 ancillary metadata (as received from tsdemux) into separate streams per DID/SDID and line/horizontal_offset. Will add a sometimes pad with details for each ancillary stream. Also has an always source pad that just outputs all ancillary streams for easy forwarding or remuxing, in case none of the ancillary streams need to be modified or dropped.

  • st2038ancmux: muxes SMPTE ST-2038 ancillary metadata streams into a single stream for muxing into MPEG-TS with mpegtsmux. Combines ancillary data on the same line if needed, as is required for MPEG-TS muxing. Can accept individual ancillary metadata streams as inputs and/or the combined stream from st2038ancdemux.

    If the video framerate is known, it can be signalled to the ancillary data muxer via the output caps by adding a capsfilter behind it, with e.g. meta/x-st-2038,framerate=30/1.

    This allows the muxer to bundle all packets belonging to the same frame (i.e. with the same timestamp), although that is not required. If multiple streams with the same DID/SDID have an ST-2038 packet for the same frame, the muxer will prioritise the one from the more recently created request pad over those from earlier created request pads (which might carry a combined stream, for example, if that is fed first).

  • st2038anctocc: extracts closed captions (CEA-608 and/or CEA-708) from SMPTE ST-2038 ancillary metadata streams and outputs them on the respective sometimes source pad (src_cea608 or src_cea708). The data is output as a closed caption stream with caps closedcaption/x-cea-608,format=s334-1a or closedcaption/x-cea-708,format=cdp for further processing by other GStreamer closed caption processing elements.

  • cctost2038anc: takes closed captions (CEA-608 and/or CEA-708) as produced by other GStreamer closed caption processing elements and converts them into SMPTE ST-2038 ancillary data that can be fed to st2038ancmux and then to mpegtsmux for splicing/muxing into an MPEG-TS container. The line-number and horizontal-offset properties should be set to the desired line number and horizontal offset.
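Putting a few of these together, a pipeline along the following lines could pull an ST-2038 ancillary stream out of an MPEG-TS recording and extract the CEA-708 captions from it. This is a sketch: the input file is hypothetical, and the caps filter on the demuxer branch is an assumption based on the meta/x-st-2038 caps mentioned above:

```shell
# Demux an MPEG-TS file, feed its SMPTE ST-2038 ancillary data stream into
# st2038anctocc, and dump the extracted CEA-708 (CDP) closed captions.
# recording.ts is a hypothetical input containing an ST-2038 ancillary stream.
gst-launch-1.0 filesrc location=recording.ts ! tsdemux name=d \
  d. ! queue ! meta/x-st-2038 ! st2038anctocc name=cc \
  cc.src_cea708 ! queue ! fakesink dump=true
```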

Please give it a spin and let us know how it goes!



Up until recently, when using hlscmafsink, if you wanted to move to a new playlist you had to stop the pipeline, modify the relevant properties, and then go back to PLAYING.

This was problematic when working with live sources because some data was being lost between the state changes. Not anymore!

A new-playlist signal has been added, which lets you switch output to a new location on the fly, without having any gaps between the content in each playlist.

Simply change the relevant properties first and then emit the signal:

hlscmafsink.set_property("playlist-location", new_playlist_location);
hlscmafsink.set_property("init-location", new_init_location);
hlscmafsink.set_property("location", new_segment_location);
hlscmafsink.emit_by_name::<()>("new-playlist", &[]);

This can be useful if you're capturing a live source and want to switch to a different folder every couple of hours, for example.



What is JPEG XS?

JPEG XS is a visually lossless, low-latency, intra-only video codec for video production workflows, standardised in ISO/IEC 21122.

It's wavelet based, with low computational overhead and a latency measured in scanlines, and it is designed to allow easy implementation in software, GPU or FPGAs.

Multi-generation robustness means repeated decoding and encoding will not introduce unpleasant coding artefacts or noticeably degrade image quality, which makes it suitable for video production workflows.

It is often deployed in lieu of existing raw video workflows, where it allows sending multiple streams over links designed to carry a single raw video transport.

JPEG XS encoding / decoding in GStreamer

GStreamer has now gained basic support for this codec.

Encoding and decoding is supported via the Open Source Intel Scalable Video Technology JPEG XS library, but third-party GStreamer plugins that provide GPU accelerated encoding and decoding exist as well.

MPEG-TS container mapping

Support was also added for carriage inside MPEG-TS which should enable a wide range of streaming applications including those based on the Video Services Forum (VSF)'s Technical Recommendation TR-07.
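To try this out, a pipeline along these lines should encode test video to JPEG XS and mux it into MPEG-TS. The element name and the raw video format here are assumptions based on the SVT JPEG XS integration; check gst-inspect-1.0 for the exact names and supported formats on your system:

```shell
# Encode a test pattern with the SVT-based JPEG XS encoder and mux the
# result into an MPEG-TS file. The element name (svtjpegxsenc) and the
# chosen raw format are assumptions; adjust to what gst-inspect-1.0 reports.
gst-launch-1.0 -e videotestsrc num-buffers=300 ! \
  video/x-raw,format=Y42B,width=1280,height=720,framerate=30/1 ! \
  svtjpegxsenc ! mpegtsmux ! filesink location=out.ts
```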

JPEG XS caps in GStreamer

It actually took us a few iterations to come up with GStreamer caps that we were reasonably happy with.

Our starting point was what the SVT encoder/decoder output/consume, and our initial target was MPEG-TS container format support.

We checked various specifications to see how JPEG XS is mapped there and how it could work, in particular:

  • ISO/IEC 21122-3 (Part 3: Transport and container formats)
  • MPEG-TS JPEG XS mapping and VSF TR-07 - Transport of JPEG XS Video in MPEG-2 Transport Stream over IP
  • RFC 9134: RTP Payload Format for ISO/IEC 21122 (JPEG XS)
  • SMPTE ST 2124:2020 (Mapping JPEG XS Codestreams into the MXF Generic Container)
  • MP4 mapping

and we think the current mapping will work for all of those cases.

Basically each mapping wants some extra headers in addition to the codestream data, for the out-of-band signalling required to make sense of the image data. Originally we thought about putting some form of codec_data header into the caps, but it wouldn't really have made anything easier, and would just have duplicated 99% of the info that's in the video caps already anyway.

The current caps mapping is based on ISO/IEC 21122-3, Annex D, with additional metadata in the caps, which should hopefully work just fine for RTP, MP4, MXF and other mappings in future.

Please give it a spin, and let us know if you have any questions or are interested in additional container mappings such as MP4 or MXF, or RTP payloaders / depayloaders.



When using hlssink3 and hlscmafsink elements, it's now possible to track new fragments being added by listening for the hls-segment-added message:

Got message #67 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00000.m4s, running-time=(guint64)0, duration=(guint64)3000000000;
Got message #71 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00001.m4s, running-time=(guint64)3000000000, duration=(guint64)3000000000;
Got message #74 from element "hlscmafsink0" (element): hls-segment-added, location=(string)segment00002.m4s, running-time=(guint64)6000000000, duration=(guint64)3000000000;

This is similar to how you would listen for splitmuxsink-fragment-closed when using the older hlssink2.
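The messages above are the kind of output gst-launch-1.0 prints with its -m flag, so a quick way to observe them is something like the following. The encoder choice and property values are illustrative:

```shell
# Run a live test source into hlscmafsink with -m so that bus messages,
# including hls-segment-added, are printed as each segment is finished.
# Encoder and property values are illustrative.
gst-launch-1.0 -m videotestsrc is-live=true ! x264enc ! h264parse ! \
  hlscmafsink target-duration=3 location=segment%05d.m4s
```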



webrtcsink already supported instantiating a data channel for the sole purpose of carrying navigation events from the consumer to the producer. It can now also create a generic control data channel through which the consumer can send JSON requests of the form:

{
    "id": identifier used in the response message,
    "mid": optional media identifier the request applies to,
    "request": {
        "type": currently "navigationEvent" and "customUpstreamEvent" are supported,
        "type-specific-field": ...
    }
}

The producer will reply with messages of the form:

{
  "id": identifier of the request,
  "error": optional error message, successful if not set
}

The example frontend was also updated with a text area for sending any arbitrary request.

The use case for this work was to make it possible for a consumer to control the mix matrix used for the audio stream, with such a pipeline running on the producer side:

gst-launch-1.0 audiotestsrc ! audioconvert ! webrtcsink enable-control-data-channel=true

As audioconvert now supports setting a mix matrix through a custom upstream event, the consumer can simply input the following text in the request field of the frontend to reverse the channels of a stereo audio stream:

{
  "type": "customUpstreamEvent",
  "structureName": "GstRequestAudioMixMatrix",
  "structure": {
    "matrix": [[0.0, 1.0], [1.0, 0.0]]
  }
}


GStreamer's VideoToolbox encoder recently gained support for encoding HEVC/H.265 videos containing an alpha channel.

A separate vtenc_h265a element has been added for this purpose. Assuming you're on macOS, you can use it like this:

gst-launch-1.0 -e videotestsrc ! alpha alpha=0.5 ! videoconvert ! vtenc_h265a ! mp4mux ! filesink location=alpha.mp4

Click here to see an example in action! It should work fine on macOS and iOS, in both Chrome and Safari. On other platforms it might not be displayed at all - compatibility is unfortunately still quite limited.

If your browser supports this format correctly, you will see a moving GStreamer logo on a constantly changing background - something like this. That background is entirely separate from the video and is generated using CSS.



The default signaller for webrtcsink can now produce an answer when the consumer sends the offer first.

To test this with the example, you can simply follow the usual steps but also paste the following text in the text area before clicking on the producer name:

{
  "offerToReceiveAudio": 1,
  "offerToReceiveVideo": 1
}

I implemented this in order to test multiopus support with webrtcsink, as it seems to work better when munging the SDP offered by Chrome.