Centricular

Expertise, Straight from the Source



Devlog

Read about our latest work!

New udpsrc2 element

Over the past few years, I have worked on a new GStreamer UDP source element. This is finally merged now and will be part of both the GStreamer 1.30.0 release and the gst-plugins-rs 0.16.0 release.

The old element uses GIO for networking, which is quite inefficient by design. The new implementation uses about 50% less CPU on my machine compared to the old element for a 3 Gbit/s stream.

As can be seen from the docs of the new element, it preserves the API of the old element. As such it should generally be possible to use it as a drop-in replacement.
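
Since the API is preserved, switching an existing pipeline over is usually just a matter of changing the element name. A minimal sketch (the address, port, and caps are placeholders for your stream):

gst-launch-1.0 udpsrc2 address=239.1.1.1 port=5004 \
    caps='application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96' \
    ! rtpjitterbuffer ! rtph264depay ! avdec_h264 ! autovideosink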

In addition to performance improvements, the new element also includes various other improvements:

  • Support for faster packet receiving via Generic Receive Offload (GRO) on Linux, and via recvmmsg() on platforms where it is available, both of which significantly improve receive performance.

  • Complete support for multicast source filtering, including negative filters, and support for platforms that do not have APIs for the IGMPv3 SSM mechanism.

  • Always obtaining kernel-side packet receive times if available, which was opt-in in the old element due to GIO performance issues with socket control messages.

  • New preserve-packetization property that allows outputting multiple packets in the same buffer, which improves performance for formats like MPEG-TS where the UDP packetization is not necessary (see the example after this list).
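
A hedged sketch of the latter for an MPEG-TS stream (the multicast address and port are placeholders; this assumes packet boundaries are preserved by default and that disabling the property is what enables combining):

gst-launch-1.0 udpsrc2 address=239.1.1.1 port=5004 preserve-packetization=false \
    caps='video/mpegts, systemstream=true' ! queue ! tsdemux ! decodebin ! autovideosink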

Give it a try with your pipelines and workloads and share your feedback or any issues you encounter.

In the future, io_uring support on Linux could be added for even better receive performance.

SMPTE ST2110 capture

While udpsrc2 is an improvement in general, its primary motivation is better SMPTE ST2110 support in GStreamer. The old element could not handle the packet rates typically used for such streams very well.

ST2110 defines a UDP/RTP-based set of standards for transmitting raw or very high bitrate audio, video, and ancillary data over Ethernet. It is intended as a replacement for SDI.

Related to this, we recently also merged some other improvements, including new RTP depayloaders for the payload formats used by ST2110; these are used in the example pipeline below.

For all of the new depayloaders, new and improved implementations of the corresponding payloaders are also available.

Together, these improvements enable reliable ST2110 stream capture in GStreamer.

An example pipeline putting it all together would look as follows:

$ gst-launch-1.0 \
    `# Video capture pipeline part` \
    udpsrc2 address=239.255.64.20 port=16388 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=96, clock-rate=90000, encoding-name=RAW, sampling=YCbCr-4:2:2, depth=10, width=1920, height=1080, exactframerate=60, colorimetry=BT709, pm=2110GPM, ssn=ST2110-20:2017, tp=2110TPN, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-327995485-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpvrawdepay2 ! combiner. \
    \
    `# Ancillary data capture pipeline part` \
    udpsrc2 address=239.255.64.20 port=16386 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=98, clock-rate=90000, encoding-name=SMPTE291, vpid_code=138, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-2672978631-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpsmpte291depay ! combiner.st2038 \
    \
    `# Combination of video and ancillary data streams and output` \
    st2038combiner name=combiner start-time-selection=first ! videoconvert ! queue max-size-bytes=0 max-size-time=0 max-size-buffers=3 ! autovideosink \
    \
    `# Audio capture and output pipeline part` \
    udpsrc2 address=239.255.64.20 port=16384 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=audio, payload=97, clock-rate=48000, encoding-name=L24, encoding-params=64, a-sendonly="", a-ptime=0.125, a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-603238248-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpL24depay2 ! audioconvert ! autoaudiosink

This pipeline receives a 1080p60 4:2:2 YUV 10-bit video stream, ST291 ancillary data, and a 24-bit 48kHz 64-channel PCM audio stream. The video and ancillary data are combined into a single stream, and then both the combined video/ancillary stream and the audio are output.

rtprecv is used here for translating packet capture timestamps and RTP header timestamps to consistent GStreamer timestamps.

Ancillary data

The pipeline above captures all three streams and merges the ancillary data stream with the video. The ancillary data itself is not processed further.

One way to process the ancillary data further is to extract ST12 timecodes from it and overlay them over the video.

For this, insert the following elements before the video sink:

 ... ! timecodestamper source=ancillary-meta ancillary-meta-locations='8:2000,570:2000' \
     ! videoconvert ! timeoverlay time-mode=time-code \
     ! autovideosink

Here, timecodes from ancillary data at positions (8,2000) and (570,2000) are extracted and converted to GstVideoTimeCodeMeta on the video buffers.

Support for extracting ST12 timecodes from ancillary meta was only added to timecodestamper recently.

The positions depend on the video signal standard in use and can be found in the ST12 specifications.



We've been hard at work on numerous small and large improvements to GStreamer for people who want to target Apple platforms: macOS, iOS, and tvOS.

iOS ARM64 Simulator Support via an XCFramework

With the GStreamer 1.28.0 release, the project now releases an XCFramework for iOS. As expected, this XCFramework supports iOS arm64, iOS Simulator x86_64, and iOS Simulator arm64. The legacy iOS framework that lipo-ed iOS arm64 and iOS Simulator x86_64 is now deprecated, and will be removed in a future release.

You can download the XCFramework from the official download page.

Thanks to Amy for helping me with this!

tvOS Support

As of version 1.28.1, GStreamer officially supports tvOS, and binaries for it are shipped as part of the iOS XCFramework. This means that the GStreamer 1.28.1 iOS XCFramework contains: ios-arm64, ios-arm64_x86_64-simulator, tvos-arm64, tvos-arm64_x86_64-simulator.

Most of the relevant Apple-specific plugins are supported:

  • osxaudio: Audio source/sink, using CoreAudio
  • atdec: Audio decoder, using AudioToolbox
  • atenc: Audio encoder, using AudioToolbox
  • vtdec: Video decoder, using VideoToolbox
  • vtenc: Video encoder, using VideoToolbox
  • glimagesink: Deprecated EAGL video sink
  • vulkansink: Metal-based video sink, using MoltenVK
  • vulkancolorconvert: Metal-accelerated video conversion, using MoltenVK
  • vulkanoverlaycompositor: Metal-accelerated video overlay compositor, using MoltenVK
  • ... more Metal/Vulkan elements

Two elements that use AVCaptureDevice had to be disabled because they need more work to support tvOS:

  • avfvideosrc: Video capture source, using AVFoundation
  • avfdeviceprovider: Video capture device provider, using AVFoundation

Thanks to Remote Studio for sponsoring this work!

Improved support for using Rust plugins on Apple platforms

Linking more than one Rust plugin into your app had been broken on macOS and iOS for some time. The fix for that requires prelinking, which Amy has written about previously, but it couldn't be enabled on macOS due to some LLVM/LLD issues. We had to wait for the fixes to percolate down to a Rust toolchain release. That finally happened in Rust 1.93, but by that time a new problem had cropped up: Xcode 26.

Due to some toolchain changes in Xcode 26, linking Rust plugins was failing on macOS and also on iOS with the legacy framework. After weighing all the options, the best solution was to add -no_compact_unwind to the linker flags on macOS, and direct people to use the XCFramework when using Rust plugins on iOS.

This is now added automatically if you use pkg-config (via CMake or Meson, for example), but if you're using a plain Xcode project, you need to add -no_compact_unwind manually to the linker flags in Xcode.

This fix will be available in the upcoming 1.28.3 release.

Many more macOS, iOS, tvOS improvements

Contributors have been hard at work on small and large improvements to the Apple-specific elements in GStreamer, ranging from AV1 and VP9 decoding support in vtdec to better debug info, bugfixes, memory leak fixes, crash fixes, and much more. The patches are too many to list or even link!



GStreamer has shipped binaries for all the major platforms for many years now: Windows, Android, macOS, iOS. Linux packages are, of course, handled by all the various distros.

However, if you wanted to use the Python bindings on macOS or Windows, you had to jump through hoops. Until now. GStreamer 1.28.0 ships Python wheels supporting Python 3.9, 3.10, 3.11, 3.12, 3.13, and 3.14 on macOS (GIL) and Windows (GIL and free-threading). All you need to do is run:

python3 -m pip install gstreamer-bundle==1.28.0

And that's it! You will have a complete GStreamer install with all the plugins you expect on macOS and Windows, and all the utilities, including gst-launch-1.0, gst-inspect-1.0, gst-device-monitor-1.0, ges-launch-1.0, and so on.
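
One quick way to verify that the bindings and plugins were picked up (a minimal check using standard PyGObject API):

python3 -c "import gi; gi.require_version('Gst', '1.0'); from gi.repository import Gst; Gst.init(None); print(Gst.version_string())"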

The gstreamer-bundle package is a complete distribution, so it will pull in all the plugins, libraries, command-line tools, and so on. If you want to depend on a more minimal GStreamer installation, or want to avoid pulling in GPL or known-patent-encumbered ("restricted") plugins, you can use the gstreamer-meta package instead. It puts plugins behind "extras" such as gpl, cli, restricted, and gtk4, as shown below.
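
For example, a more minimal install with just the command-line tools and GTK4 support (using the extra names listed above) could look like:

python3 -m pip install "gstreamer-meta[cli,gtk4]"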

Many thanks to Pollen Robotics for sponsoring this work. The Reachy Mini companion robot by Pollen Robotics/Hugging Face uses GStreamer via the Python bindings and is the first production user of these wheels!

We're very excited to see more people make use of these wheels.

Read on for technical details on how all this was accomplished.

Step 1: Ship Python bindings via introspection on macOS and Windows

After many years, Python bindings support was re-introduced in GStreamer 1.26 and shipped with the installers on macOS and Windows. This required significant work.

Thanks to Amy for doing the bulk of the work here, and to everyone else who contributed towards this over the years: Andoni, Nacho, Thibault, Tjitte, and more that I'm sure I've missed.

Step 2: Build wheels for all supported Python versions

When shipping Python bindings for C libraries, it is necessary to also ship the accompanying libraries and plugins, lest ABI mismatches and incompatibilities arise. That's why the wheels we ship constitute a complete GStreamer distribution, including all plugin dependencies such as GTK4. This means you also have Python bindings for GTK4 available on macOS and Windows.

This wasn't easy to accomplish, especially because PyGObject doesn't use the limited Python C API. That means we can't just build for Python 3.9 and call it a day. We need separate wheels for each Python version × target.

The count goes something like this:

  • We split the GStreamer libraries, plugins, and dependencies across 11 wheels
  • We support 8 Python builds: 3.9, 3.10, 3.11, 3.12, 3.13, 3.13t, 3.14, and 3.14t
  • And 3 platforms: macOS universal, Windows MSVC x86_64, and Windows MSVC x86

That's 11 × 8 × 3 = 264 wheels. That is absolutely untenable!

So we have to do some chicanery to trim that down:

  1. Put everything that links to or loads Python in one wheel called gstreamer_python, so that everything else is agnostic to the Python version being used
  2. Override py_limited_api to be cp39 for all agnostic wheels and mark them as not containing ext modules
  3. Rebuild the recipes responsible for generating libraries or plugins that go into gstreamer_python with each Python version we need to support
  4. On macOS, override plat_name to be macosx_10_13_universal2 for all agnostic wheels even if the Python version we're using doesn't support macOS 10.13, so that they can be reused across all Python versions (see the example wheel names after this list)
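
The effect is easiest to see in the resulting wheel file names. Roughly (a sketch: only gstreamer_python is a real package name from the list above, the other name is illustrative, and the exact tags may differ):

gstreamer_python-1.28.0-cp313-cp313-win_amd64.whl              # rebuilt for each Python version
gstreamer_python-1.28.0-cp313t-cp313t-win_amd64.whl
gstreamer_libs-1.28.0-cp39-abi3-win_amd64.whl                  # one wheel shared by all versions
gstreamer_libs-1.28.0-cp39-abi3-macosx_10_13_universal2.whl    # plat_name overridden for reuse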

That brings us down to 92 wheels. Still quite a lot, but now it's a manageable number!

The long-term solution is to port PyGObject over to the Limited Python C API. That is quite a big undertaking, but it should allow us to skip most of this for Python >= 3.12.

Thanks to Amy once again for doing most of the work to make this possible, and to Pollen Robotics for sponsoring us to do it.

Step 3: Linux support

You may have noticed that there was no mention of wheels targeting Linux. That's a much harder problem to solve than shipping on macOS or Windows, so we had to punt it to a later release, likely one of the 1.28.x stable releases.

We're planning to target manylinux_2_28 and support Python 3.9+, but there are still unknowns that could throw a spanner in our plans. For instance:

  • GStreamer often utilizes subtle characteristics of the Linux graphics stack for good performance, which may break when targeting such an old baseline.
  • The difference between the library versions shipped with the wheels and those on the system may cause subtle or catastrophic breakage in apps that also load system libraries.

We're hoping that we can overcome all this and ship something that allows users on any Linux distro to get a functional GStreamer just by doing pip install gstreamer-bundle.

In the meantime, please continue to use the distro-provided GStreamer packages and Python bindings, and if they're missing plugins or are too old, please contact your distro maintainer(s).



At the '25 GStreamer conference I gave a talk titled Costly Speech: an introduction.

This was in reference to the fact that all the speech-related elements used in the pipeline I presented were wrappers around for-pay cloud services or for-pay on-site servers.

At the end of the talk, I mentioned that plans for future development included new, "free" backends. The first piece of the puzzle was a Whisper-based transcriber.

I have the pleasure to announce that it is now implemented and published. Thank you to Ray Tiley from Tightrope Media Systems for sponsoring this work!

Design / Implementation

The main design goal was for the new transcriber to behave identically to the existing transcribers, in particular:

  • It needed to output timestamped words one at a time
  • It needed to handle live streams with a configurable latency

In order to fulfill that second requirement, the implementation has to feed the model with chunks of a configurable duration.

This approach works well for constraining the latency, but doesn't give the best results accuracy-wise, as words close to the chunk boundaries would often go missing, be poorly transcribed, or be duplicated.

To address this, the implementation uses two mechanisms:

  • It always feeds the previous chunk when running inference for a given chunk
  • It extracts tokens from a sliding window at a configurable distance from the "live edge"

Here's an example with a 4-second chunk duration and a 1-second live edge offset:

0     1     2     3     4     5     6     7     8
| 4-second chunk        | 4-second chunk        |
                  | 4-second token window |

This approach greatly mitigates the boundary issues, as the tokens are always extracted from a "stable" region of the model's output.

With the above settings, the element reports a 5-second latency: a word at the start of the token window is only extracted once the following 4-second chunk is complete, 5 seconds later. A configurable processing latency is added on top of that. The processing latency depends on the hardware: on my machine, using CUDA with an NVIDIA RTX 5080 GPU, processing runs at around 10x real time, which means a 1-second processing latency is sufficient.

The obvious drawback of this approach is a doubling of the resource usage, as each chunk is fed through the inference model twice. This could be further refined to only feed part of the previous chunk, improving performance without sacrificing accuracy.

As the interface of the element follows that of other transcribers, it can be used as an alternative transcriber within transcriberbin.

Future prospects

The biggest missing piece to bring the transcriber to feature parity with other transcribers, such as the speechmatics-based one, is speaker diarization (identifying who is speaking).

Whisper itself does not support diarization. The tinydiarize project aimed to fine-tune models to address this, but it has unfortunately been put on hold for now, and in any case it only supported detecting speaker changes, not identifying individual speakers.

It is not clear at the moment what the best open source option to integrate for this task would be. Models such as NVIDIA's streaming Sortformer are promising, but are limited to four speakers, for example.

We are very interested in suggestions on this front. Don't hesitate to hit us up if you have any or are interested in sponsoring further improvements to our growing stack of speech-related elements!



Icecast is a Free and Open Source multimedia streaming server, primarily used for audio and radio streaming over HTTP(S).

In GStreamer you can send an audio stream to such a server with the shout2send sink element based on libshout2.

This works perfectly fine, but has one limitation: it does not support the AAC audio codec, which for some use cases and target systems is the preferred audio codec. This is because libshout2 does not support it and will not support it, at least not officially upstream.

Some streaming servers such as the Rocket Streaming Audio Server (RSAS) do support this though, and as such it would be nice to be able to send streams to them in AAC format as well.

Enter icecastsink, which is a new sink element written in Rust to send audio to an Icecast server.

It supports sending AAC audio in addition to Ogg/Vorbis, Ogg/Opus, FLAC, and MP3, and it also supports automatic re-connection in case the server kicks the client off, which can happen if the client doesn't send data for a while.
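
As a sketch of what an AAC streaming pipeline could look like (the icecastsink property names here are illustrative assumptions, not the confirmed API; consult gst-inspect-1.0 icecastsink for the actual properties):

gst-launch-1.0 audiotestsrc is-live=true ! audioconvert ! fdkaacenc \
    ! icecastsink server-address=127.0.0.1 port=8000 mount=/stream.aac password=hackme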

Give it a spin and let us know how it goes!



One of the many items on my "nice-to-have" TODO list has been shipping a GStreamer installer that natively targets Windows ARM64. Cerbero has had support for cross-compiling to Windows ARM64 since GStreamer 1.16, in the form of targeting UWP. However, when UWP support was laid to rest with GStreamer 1.22, we didn't start shipping native Windows ARM64 installers in its place, because at the time it looked like Microsoft's ARM64 experiment had failed as well.

Lately, however, there's been a significant resurgence of ARM64 laptops that run Windows, and they seem to actually have compelling features for some types of users. So I spent a day or two and reinstated support for Windows ARM64 built with MSVC in Cerbero.

My purpose was just to find the shortest path to getting that to a usable state, so a bunch of plugins are missing. In particular, all Rust plugins had to be disabled due to an issue building the ring crate. I am optimistic that someone will come along and help fix these issues 😉

You can find the installer at the usual location: https://gstreamer.freedesktop.org/download/#windows

Note that these binaries are cross-compiled from x86_64, so the installer itself is x86, and the contents are missing gobject-introspection and Python bindings. We are also unable to generate Python wheels for Windows ARM64 because of this. If someone would like to help with any of this, please get in touch on the Windows channel in GStreamer's Matrix community.



Currently, most code using the GStreamer Analytics library is written in C or Python. To check how well the API works from Rust, and to have an excuse to play with the Rust burn deep-learning framework, I've implemented an object detection inference element based on the YOLOX model, and a corresponding tensor decoder that allows usage with other elements based on the GstAnalytics API. I started this work at the last GStreamer hackfest, but it has now finally been merged and will be part of the GStreamer 1.28.0 release.

burn is a deep-learning framework in Rust that is approximately on the same level of abstraction as PyTorch. It features lots of computation backends (CPU-based, Vulkan, CUDA, ROCm, Metal, libtorch, ...), has loaders (or better: code generation) for e.g. ONNX or PyTorch models, and compiles and optimizes the model for a specific backend. It also comes with a repository containing various example models and links to other community models.

The first element is burn-yoloxinference. It takes raw RGB video frames and passes them through burn; as of the time of this writing, either through a CPU-based or a Vulkan-based computation backend. The output then is the very same video frames with the raw object detection results attached as a GstTensorMeta. This is essentially an 85×8400 float matrix: 8400 candidate object detections, each consisting of a bounding box (4 floats), confidence values for the classes (80 floats for the models pre-trained on the COCO classes), and one confidence value for the overall box. The element itself is mostly boilerplate, caps negotiation code, and glue code between GStreamer and burn.

The second element is yoloxtensordec. This takes the output of the first element and decodes the GstTensorMeta into a GstAnalyticsRelationMeta, which describes the detected objects with their bounding boxes in an abstract way. As part of this it also implements a non-maximum suppression (NMS) filter using intersection over union (IoU) of bounding boxes to reduce the 8400 candidate boxes to a much lower number of likely object detections. The GstAnalyticsRelationMeta can then be used e.g. by the generic objectdetectionoverlay element to render rectangles on top of the video, or by the ioutracker element to track objects over a sequence of frames. Again, this element is mostly boilerplate and caps negotiation code, plus around 100 SLOC of algorithm. In comparison, the C YOLOv9 tensor decoder element is about 3x as much code, mostly thanks to the overhead of C memory book-keeping and the lack of useful data structures and language abstractions.

The reason why the tensor decoder is a separate element is mostly to have one such element per model and to have it implemented independently of the actual implementation and runtime of the model. The same tensor decoder should, for example, also work fine on the output of the onnxinference element with a YOLOX model. From GStreamer 1.28 onwards it will also be possible to autoplug suitable tensor decoders via the tensordecodebin element.
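
For example, assuming a YOLOX ONNX model with the same input and output layout (the model file name and the onnxinference properties shown are illustrative), the same decoder could be reused like this:

... ! videoconvertscale ! "video/x-raw,width=640,height=640" \
    ! onnxinference model-file=yolox_l.onnx \
    ! yoloxtensordec label-file=COCO_classes.txt ! ...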

That the tensor decoders are independent of the actual implementation of the model also has the advantage that they can be implemented in a different language, preferably in a safer and less verbose language than C.

To use both elements together, with objectdetectionoverlay rendering rectangles around the detected objects, the following pipeline can be used:

gst-launch-1.0 souphttpsrc location=https://raw.githubusercontent.com/tracel-ai/models/f4444a90955c1c6fda90597aac95039a393beb5a/squeezenet-burn/samples/cat.jpg \
    ! jpegdec ! videoconvertscale ! "video/x-raw,width=640,height=640" \
    ! burn-yoloxinference model-type=large backend-type=vulkan ! yoloxtensordec label-file=COCO_classes.txt \
    ! videoconvertscale ! objectdetectionoverlay \
    ! videoconvertscale ! imagefreeze ! autovideosink -v

The output should look similar to this image.

I also did a lightning talk about this at the GStreamer conference this year.



When using HTTP Live Streaming (HLS), a common approach is to use MPEG-TS segments or fragmented MP4 fragments. This is done so that the overall stream is available as a sequence of small HTTP-based file downloads, each being one short chunk of an overall bounded or unbounded media stream.

The playlist file (.m3u8) contains a list of these small segments or fragments. This is the standard and most common approach for HLS. For the HLS CMAF case, a multi-segment playlist looks like the following:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init00000.mp4"
#EXTINF:5,
segment00000.m4s
#EXTINF:5,
segment00001.m4s
#EXTINF:5,
segment00002.m4s

An alternative approach is to use a single media file with the EXT-X-BYTERANGE tag. This method is primarily used for on-demand (VOD) streaming, where the complete media file already exists, and it can reduce the number of files that need to be managed on the server. Using a single file with byte ranges requires the server and client to support HTTP byte range requests and 206 Partial Content responses.
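
To illustrate, an EXT-X-BYTERANGE entry of the form length@offset corresponds to an HTTP Range request for bytes offset through offset+length-1. For the first segment of the example playlist further below (100292@768), a client would effectively request (the URL is a placeholder):

curl -s -H "Range: bytes=768-101059" https://example.com/main.mp4 -o segment0.m4s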

The single media file use case wasn't supported so far by either hlssink3 or hlscmafsink. A new single-media-file property has been added, which lets users specify the use of a single media file.

hlscmafsink.set_property("single-media-file", "main.mp4");
hlssink3.set_property("single-media-file", "main.ts");
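
Put together, a minimal VOD-style pipeline could look like the following sketch (the playlist-location and target-duration property names are assumptions based on the existing hlssink3/hlscmafsink API; check gst-inspect-1.0 for the actual properties):

gst-launch-1.0 videotestsrc num-buffers=900 ! video/x-raw,framerate=30/1 \
    ! x264enc key-int-max=150 ! h264parse \
    ! hlscmafsink playlist-location=main.m3u8 single-media-file=main.mp4 target-duration=5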

For the HLS CMAF case, this generates a playlist like the following:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="main.mp4",BYTERANGE="768@0"
#EXT-X-BYTERANGE:100292@768
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:98990@101060
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:99329@200050
#EXTINF:5,
main.mp4

This can be useful when storage or file-management requirements make serving HLS from a single media file preferable.