Centricular

Expertise, Straight from the Source



Devlog

Read about our latest work!

New udpsrc2 element

Over the past few years, I have worked on a new GStreamer UDP source element. This is finally merged now and will be part of both the GStreamer 1.30.0 release and the gst-plugins-rs 0.16.0 release.

The old element uses GIO for networking, which is quite inefficient by design. The new implementation uses about 50% less CPU on my machine compared to the old element for a 3 Gbit/s stream.

As can be seen from the docs of the new element, it preserves the API of the old element. As such it should generally be possible to use it as a drop-in replacement.
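
Since the API is preserved, switching an existing pipeline over is usually just a matter of changing the element name. A minimal sketch (the address, port, and caps are placeholders for your stream):

gst-launch-1.0 udpsrc2 address=239.1.1.1 port=5004 \
    caps='application/x-rtp, media=video, clock-rate=90000, encoding-name=H264, payload=96' \
    ! rtpjitterbuffer ! rtph264depay ! avdec_h264 ! autovideosink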

In addition to performance improvements, the new element also includes various other improvements:

  • Support for faster packet receiving via Generic Receive Offload (GRO) on Linux, and via recvmmsg() on platforms where it is available, both of which significantly improve receive performance.

  • Complete support for multicast source filtering, including negative filters, and support for platforms that do not have APIs for the IGMPv3 SSM mechanism.

  • Always obtaining kernel-side packet receive times if available, which was opt-in in the old element due to GIO performance issues with socket control messages.

  • New preserve-packetization property that allows outputting multiple packets in the same buffer, which improves performance for formats like MPEG-TS where the UDP packetization is not necessary (see the example after this list).
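
A hedged sketch of the latter for an MPEG-TS stream (the multicast address and port are placeholders; this assumes packet boundaries are preserved by default and that disabling the property is what enables combining):

gst-launch-1.0 udpsrc2 address=239.1.1.1 port=5004 preserve-packetization=false \
    caps='video/mpegts, systemstream=true' ! queue ! tsdemux ! decodebin ! autovideosink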

Give it a try with your pipelines and workloads and share your feedback or any issues you encounter.

In the future, io_uring support on Linux could be added for even better receive performance.

SMPTE ST2110 capture

While udpsrc2 is an improvement in general, its primary motivation is better SMPTE ST2110 support in GStreamer. The old element could not handle the packet rates typically used for such streams very well.

ST2110 defines a UDP/RTP-based set of standards for transmitting raw or very high bitrate audio, video, and ancillary data over Ethernet. It is intended as a replacement for SDI.

Related to this, we recently also merged some other improvements, including new RTP depayloaders for the payload formats used by ST2110; these are used in the example pipeline below.

For all of the new depayloaders, new and improved implementations of the corresponding payloaders are also available.

Together, these improvements enable reliable ST2110 stream capture in GStreamer.

An example pipeline putting it all together would look as follows:

$ gst-launch-1.0 \
    `# Video capture pipeline part` \
    udpsrc2 address=239.255.64.20 port=16388 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=96, clock-rate=90000, encoding-name=RAW, sampling=YCbCr-4:2:2, depth=10, width=1920, height=1080, exactframerate=60, colorimetry=BT709, pm=2110GPM, ssn=ST2110-20:2017, tp=2110TPN, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-327995485-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpvrawdepay2 ! combiner. \
    \
    `# Ancillary data capture pipeline part` \
    udpsrc2 address=239.255.64.20 port=16386 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=98, clock-rate=90000, encoding-name=SMPTE291, vpid_code=138, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-2672978631-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpsmpte291depay ! combiner.st2038 \
    \
    `# Combination of video and ancillary data streams and output` \
    st2038combiner name=combiner start-time-selection=first ! videoconvert ! queue max-size-bytes=0 max-size-time=0 max-size-buffers=3 ! autovideosink \
    \
    `# Audio capture and output pipeline part` \
    udpsrc2 address=239.255.64.20 port=16384 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=audio, payload=97, clock-rate=48000, encoding-name=L24, encoding-params=64, a-sendonly="", a-ptime=0.125, a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-603238248-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpL24depay2 ! audioconvert ! autoaudiosink

This pipeline receives a 1080p60 4:2:2 YUV 10-bit video stream, ST291 ancillary data, and a 24-bit 48kHz 64-channel PCM audio stream. The video and ancillary data are combined into a single stream, and then both the combined video/ancillary stream and the audio are output.

rtprecv is used here for translating packet capture timestamps and RTP header timestamps to consistent GStreamer timestamps.

Ancillary data

The pipeline above captures all three streams and merges the ancillary data stream with the video. The ancillary data itself is not processed further.

One way to process the ancillary data further is to extract ST12 timecodes from it and overlay them over the video.

For this, insert the following elements before the video sink:

 ... ! timecodestamper source=ancillary-meta ancillary-meta-locations='8:2000,570:2000' \
     ! videoconvert ! timeoverlay time-mode=time-code \
     ! autovideosink

Here, timecodes from ancillary data at positions (8,2000) and (570,2000) are extracted and converted to GstVideoTimeCodeMeta on the video buffers.

Support for extracting ST12 timecodes from ancillary meta was only added to timecodestamper recently.

The positions depend on the video signal standard in use and can be found in the ST12 specifications.



We've been hard at work on numerous small and large improvements to GStreamer for people who want to target Apple platforms: macOS, iOS, and tvOS.

iOS ARM64 Simulator Support via an XCFramework

With the GStreamer 1.28.0 release, the project now releases an XCFramework for iOS. As expected, this XCFramework supports iOS arm64, iOS Simulator x86_64, and iOS Simulator arm64. The legacy iOS framework that lipo-ed iOS arm64 and iOS Simulator x86_64 is now deprecated, and will be removed in a future release.

You can download the XCFramework from the official download page.

Thanks to Amy for helping me with this!

tvOS Support

As of version 1.28.1, GStreamer officially supports tvOS, and binaries for it are shipped as part of the iOS XCFramework. This means that the GStreamer 1.28.1 iOS XCFramework contains: ios-arm64, ios-arm64_x86_64-simulator, tvos-arm64, tvos-arm64_x86_64-simulator.

Most of the relevant Apple-specific plugins are supported:

  • osxaudio: Audio source/sink, using CoreAudio
  • atdec: Audio decoder, using AudioToolbox
  • atenc: Audio encoder, using AudioToolbox
  • vtdec: Video decoder, using VideoToolbox
  • vtenc: Video encoder, using VideoToolbox
  • glimagesink: Deprecated EAGL video sink
  • vulkansink: Metal-based video sink, using MoltenVK
  • vulkancolorconvert: Metal-accelerated video conversion, using MoltenVK
  • vulkanoverlaycompositor: Metal-accelerated video overlay compositor, using MoltenVK
  • ... more Metal/Vulkan elements

Two elements that use AVCaptureDevice had to be disabled because they need more work to support tvOS:

  • avfvideosrc: Video capture source, using AVFoundation
  • avfdeviceprovider: Video capture device provider, using AVFoundation

Thanks to Remote Studio for sponsoring this work!

Improved support for using Rust plugins on Apple platforms

Linking more than one Rust plugin into your app had been broken on macOS and iOS for some time. The fix for that requires prelinking, which Amy has written about previously, but it couldn't be enabled on macOS due to some LLVM/LLD issues. We had to wait for the fixes to percolate down to a Rust toolchain release. That finally happened in Rust 1.93, but by that time a new problem had cropped up: Xcode 26.

Due to some toolchain changes in Xcode 26, linking Rust plugins was failing on macOS and also on iOS with the legacy framework. After weighing all the options, the best solution was to add -no_compact_unwind to the linker flags on macOS, and direct people to use the XCFramework when using Rust plugins on iOS.

This is now added automatically if you use pkg-config (via CMake or Meson, for example), but if you're using a plain Xcode project, you need to add -no_compact_unwind manually to the linker flags in Xcode.

This fix will be available in the upcoming 1.28.3 release.

Many more macOS, iOS, tvOS improvements

Contributors have been hard at work on small and large improvements to the Apple-specific elements in GStreamer, ranging from AV1 and VP9 decoding support in vtdec to better debug info, bugfixes, memory leak fixes, crash fixes, and much more. The patches are too many to list or even link!



GStreamer has shipped binaries for all the major platforms for many years now: Windows, Android, macOS, iOS. Linux packages are, of course, handled by all the various distros.

However, if you wanted to use the Python bindings on macOS or Windows, you had to jump through hoops. Until now. GStreamer 1.28.0 ships Python wheels supporting Python 3.9, 3.10, 3.11, 3.12, 3.13, and 3.14 on macOS (GIL) and Windows (GIL and free-threading). All you need to do is run:

python3 -m pip install gstreamer-bundle==1.28.0

And that's it! You will have a complete GStreamer install with all the plugins you expect on macOS and Windows, and all the utilities, including gst-launch-1.0, gst-inspect-1.0, gst-device-monitor-1.0, ges-launch-1.0, and so on.
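
One quick way to verify that the bindings and plugins were picked up (a minimal check using standard PyGObject API):

python3 -c "import gi; gi.require_version('Gst', '1.0'); from gi.repository import Gst; Gst.init(None); print(Gst.version_string())"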

The gstreamer-bundle package is a complete distribution, so it will pull in all the plugins, libraries, command-line tools, and so on. If you want to depend on a more minimal GStreamer installation, or want to avoid pulling in GPL or known-patent-encumbered ("restricted") plugins, you can use the gstreamer-meta package instead. It puts plugins behind "extras" such as gpl, cli, restricted, and gtk4, as shown below.
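
For example, a more minimal install with just the command-line tools and GTK4 support (using the extra names listed above) could look like:

python3 -m pip install "gstreamer-meta[cli,gtk4]"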

Many thanks to Pollen Robotics for sponsoring this work. The Reachy Mini companion robot by Pollen Robotics/Hugging Face uses GStreamer via the Python bindings and is the first production user of these wheels!

We're very excited to see more people make use of these wheels.

Read on for technical details on how all this was accomplished.

Step 1: Ship Python bindings via introspection on macOS and Windows

After many years, Python bindings support was re-introduced in GStreamer 1.26 and shipped with the installers on macOS and Windows. This required significant work.

Thanks to Amy for doing the bulk of the work here, and to everyone else who contributed towards this over the years: Andoni, Nacho, Thibault, Tjitte, and more that I'm sure I've missed.

Step 2: Build wheels for all supported Python versions

When shipping Python bindings for C libraries, it is necessary to also ship the accompanying libraries and plugins, lest ABI mismatches and incompatibilities arise. That's why the wheels we ship constitute a complete GStreamer distribution, including all plugin dependencies such as GTK4. This means you also have Python bindings for GTK4 available on macOS and Windows.

This wasn't easy to accomplish, especially because PyGObject doesn't use the limited Python C API. That means we can't just build for Python 3.9 and call it a day. We need separate wheels for each Python version × target.

The count goes something like this:

  • We split the GStreamer libraries, plugins, and dependencies across 11 wheels
  • We support 8 Python builds: 3.9, 3.10, 3.11, 3.12, 3.13, 3.13t, 3.14, and 3.14t
  • And 3 platforms: macOS universal, Windows MSVC x86_64, and Windows MSVC x86

That's 11 × 8 × 3 = 264 wheels. That is absolutely untenable!

So we have to do some chicanery to trim that down:

  1. Put everything that links to or loads Python in one wheel called gstreamer_python, so that everything else is agnostic to the Python version being used
  2. Override py_limited_api to be cp39 for all agnostic wheels and mark them as not containing ext modules
  3. Rebuild the recipes responsible for generating libraries or plugins that go into gstreamer_python with each Python version we need to support
  4. On macOS, override plat_name to be macosx_10_13_universal2 for all agnostic wheels even if the Python version we're using doesn't support macOS 10.13, so that they can be reused across all Python versions (see the example wheel names after this list)
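
The effect is easiest to see in the resulting wheel file names. Roughly (a sketch: only gstreamer_python is a real package name from the list above, the other name is illustrative, and the exact tags may differ):

gstreamer_python-1.28.0-cp313-cp313-win_amd64.whl              # rebuilt for each Python version
gstreamer_python-1.28.0-cp313t-cp313t-win_amd64.whl
gstreamer_libs-1.28.0-cp39-abi3-win_amd64.whl                  # one wheel shared by all versions
gstreamer_libs-1.28.0-cp39-abi3-macosx_10_13_universal2.whl    # plat_name overridden for reuse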

That brings us down to 92 wheels. Still quite a lot, but now it's a manageable number!

The long-term solution is to port PyGObject over to the Limited Python C API. That is quite a big undertaking, but it should allow us to skip most of this for Python >= 3.12.

Thanks to Amy once again for doing most of the work to make this possible, and to Pollen Robotics for sponsoring us to do it.

Step 3: Linux support

You may have noticed that there was no mention of wheels targeting Linux. That's a much harder problem to solve than shipping on macOS or Windows, so we had to punt it to a later release, likely one of the 1.28.x stable releases.

We're planning to target manylinux_2_28 and support Python 3.9+, but there are still unknowns that could throw a spanner in our plans. For instance:

  • GStreamer often utilizes subtle characteristics of the Linux graphics stack for good performance, which may break when targeting such an old baseline.
  • The difference between the library versions shipped with the wheels and those on the system may cause subtle or catastrophic breakage in apps that also load system libraries.

We're hoping that we can overcome all this and ship something that allows users on any Linux distro to get a functional GStreamer just by doing pip install gstreamer-bundle.

In the meantime, please continue to use the distro-provided GStreamer packages and Python bindings, and if they're missing plugins or are too old, please contact your distro maintainer(s).



At the '25 GStreamer conference I gave a talk titled Costly Speech: an introduction.

This was in reference to the fact that all the speech-related elements used in the pipeline I presented were wrappers around for-pay cloud services or for-pay on-site servers.

At the end of the talk, I mentioned that plans for future development included new, "free" backends. The first piece of the puzzle was a Whisper-based transcriber.

I have the pleasure to announce that it is now implemented and published. Thank you to Ray Tiley from Tightrope Media Systems for sponsoring this work!

Design / Implementation

The main design goal was for the new transcriber to behave identically to the existing transcribers, in particular:

  • It needed to output timestamped words one at a time
  • It needed to handle live streams with a configurable latency

In order to fulfill that second requirement, the implementation has to feed the model with chunks of a configurable duration.

This approach works well for constraining the latency, but doesn't give the best results accuracy-wise, as words close to the chunk boundaries would often go missing, be poorly transcribed, or be duplicated.

To address this, the implementation uses two mechanisms:

  • It always feeds the previous chunk when running inference for a given chunk
  • It extracts tokens from a sliding window at a configurable distance from the "live edge"

Here's an example with a 4-second chunk duration and a 1-second live edge offset:

0     1     2     3     4     5     6     7     8
| 4-second chunk        | 4-second chunk        |
                  | 4-second token window |

This approach greatly mitigates the boundary issues, as the tokens are always extracted from a "stable" region of the model's output.

With the above settings, the element reports a 5-second latency: a word at the start of the token window is only extracted once the following 4-second chunk is complete, 5 seconds later. A configurable processing latency is added on top of that. The processing latency depends on the hardware: on my machine, using CUDA with an NVIDIA RTX 5080 GPU, processing runs at around 10x real time, which means a 1-second processing latency is sufficient.

The obvious drawback of this approach is a doubling of the resource usage, as each chunk is fed through the inference model twice. This could be further refined to only feed part of the previous chunk, improving performance without sacrificing accuracy.

As the interface of the element follows that of other transcribers, it can be used as an alternative transcriber within transcriberbin.

Future prospects

The biggest missing piece to bring the transcriber to feature parity with other transcribers, such as the speechmatics-based one, is speaker diarization (identifying who is speaking).

Whisper itself does not support diarization. The tinydiarize project aimed to fine-tune models to address this, but it has unfortunately been put on hold for now, and in any case it only supported detecting speaker changes, not identifying individual speakers.

It is not clear at the moment what the best open source option to integrate for this task would be. Models such as NVIDIA's streaming Sortformer are promising, but are limited to four speakers, for example.

We are very interested in suggestions on this front. Don't hesitate to hit us up if you have any or are interested in sponsoring further improvements to our growing stack of speech-related elements!



Icecast is a Free and Open Source multimedia streaming server, primarily used for audio and radio streaming over HTTP(S).

In GStreamer you can send an audio stream to such a server with the shout2send sink element based on libshout2.

This works perfectly fine, but has one limitation: it does not support the AAC audio codec, which for some use cases and target systems is the preferred audio codec. This is because libshout2 does not support it and will not support it, at least not officially upstream.

Some streaming servers such as the Rocket Streaming Audio Server (RSAS) do support this though, and as such it would be nice to be able to send streams to them in AAC format as well.

Enter icecastsink, which is a new sink element written in Rust to send audio to an Icecast server.

It supports sending AAC audio in addition to Ogg/Vorbis, Ogg/Opus, FLAC, and MP3, and it also supports automatic re-connection in case the server kicks the client off, which can happen if the client doesn't send data for a while.
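
As a sketch of what an AAC streaming pipeline could look like (the icecastsink property names here are illustrative assumptions, not the confirmed API; consult gst-inspect-1.0 icecastsink for the actual properties):

gst-launch-1.0 audiotestsrc is-live=true ! audioconvert ! fdkaacenc \
    ! icecastsink server-address=127.0.0.1 port=8000 mount=/stream.aac password=hackme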

Give it a spin and let us know how it goes!



One of the many items on my "nice-to-have" TODO list has been shipping a GStreamer installer that natively targets Windows ARM64. Cerbero has had support for cross-compiling to Windows ARM64 since GStreamer 1.16, in the form of targeting UWP. However, when UWP support was laid to rest with GStreamer 1.22, we didn't start shipping native Windows ARM64 installers in its place, because at the time it looked like Microsoft's ARM64 experiment had failed as well.

Lately, however, there's been a significant resurgence of ARM64 laptops that run Windows, and they seem to actually have compelling features for some types of users. So I spent a day or two and reinstated support for Windows ARM64 built with MSVC in Cerbero.

My purpose was just to find the shortest path to getting that to a usable state, so a bunch of plugins are missing. In particular, all Rust plugins had to be disabled due to an issue building the ring crate. I am optimistic that someone will come along and help fix these issues 😉

You can find the installer at the usual location: https://gstreamer.freedesktop.org/download/#windows

Note that these binaries are cross-compiled from x86_64, so the installer itself is x86, and the contents are missing gobject-introspection and Python bindings. We are also unable to generate Python wheels for Windows ARM64 because of this. If someone would like to help with any of this, please get in touch on the Windows channel in GStreamer's Matrix community.



Currently, most code using the GStreamer Analytics library is written in C or Python. To check how well the API works from Rust, and to have an excuse to play with the Rust burn deep-learning framework, I've implemented an object detection inference element based on the YOLOX model, and a corresponding tensor decoder that allows usage with other elements based on the GstAnalytics API. I started this work at the last GStreamer hackfest, but it has now finally been merged and will be part of the GStreamer 1.28.0 release.

burn is a deep-learning framework in Rust that is approximately on the same level of abstraction as PyTorch. It features lots of computation backends (CPU-based, Vulkan, CUDA, ROCm, Metal, libtorch, ...), has loaders (or better: code generation) for e.g. ONNX or PyTorch models, and compiles and optimizes the model for a specific backend. It also comes with a repository containing various example models and links to other community models.

The first element is burn-yoloxinference. It takes raw RGB video frames and passes them through burn; as of the time of this writing, either through a CPU-based or a Vulkan-based computation backend. The output then is the very same video frames with the raw object detection results attached as a GstTensorMeta. This is essentially an 85×8400 float matrix: 8400 candidate object detections, each consisting of a bounding box (4 floats), confidence values for the classes (80 floats for the models pre-trained on the COCO classes), and one confidence value for the overall box. The element itself is mostly boilerplate, caps negotiation code, and glue code between GStreamer and burn.

The second element is yoloxtensordec. This takes the output of the first element and decodes the GstTensorMeta into a GstAnalyticsRelationMeta, which describes the detected objects with their bounding boxes in an abstract way. As part of this it also implements a non-maximum suppression (NMS) filter using intersection over union (IoU) of bounding boxes to reduce the 8400 candidate boxes to a much lower number of likely object detections. The GstAnalyticsRelationMeta can then be used e.g. by the generic objectdetectionoverlay element to render rectangles on top of the video, or by the ioutracker element to track objects over a sequence of frames. Again, this element is mostly boilerplate and caps negotiation code, plus around 100 SLOC of algorithm. In comparison, the C YOLOv9 tensor decoder element is about 3x as much code, mostly thanks to the overhead of C memory book-keeping and the lack of useful data structures and language abstractions.

The reason why the tensor decoder is a separate element is mostly to have one such element per model and to have it implemented independently of the actual implementation and runtime of the model. The same tensor decoder should, for example, also work fine on the output of the onnxinference element with a YOLOX model. From GStreamer 1.28 onwards it will also be possible to autoplug suitable tensor decoders via the tensordecodebin element.
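
For example, assuming a YOLOX ONNX model with the same input and output layout (the model file name and the onnxinference properties shown are illustrative), the same decoder could be reused like this:

... ! videoconvertscale ! "video/x-raw,width=640,height=640" \
    ! onnxinference model-file=yolox_l.onnx \
    ! yoloxtensordec label-file=COCO_classes.txt ! ...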

That the tensor decoders are independent of the actual implementation of the model also has the advantage that they can be implemented in a different language, preferably in a safer and less verbose language than C.

To use both elements together, with objectdetectionoverlay rendering rectangles around the detected objects, the following pipeline can be used:

gst-launch-1.0 souphttpsrc location=https://raw.githubusercontent.com/tracel-ai/models/f4444a90955c1c6fda90597aac95039a393beb5a/squeezenet-burn/samples/cat.jpg \
    ! jpegdec ! videoconvertscale ! "video/x-raw,width=640,height=640" \
    ! burn-yoloxinference model-type=large backend-type=vulkan ! yoloxtensordec label-file=COCO_classes.txt \
    ! videoconvertscale ! objectdetectionoverlay \
    ! videoconvertscale ! imagefreeze ! autovideosink -v

The output should look similar to this image.

I also did a lightning talk about this at the GStreamer conference this year.



When using HTTP Live Streaming (HLS), a common approach is to use MPEG-TS segments or fragmented MP4 fragments. This is done so that the overall stream is available as a sequence of small HTTP-based file downloads, each being one short chunk of an overall bounded or unbounded media stream.

The playlist file (.m3u8) contains a list of these small segments or fragments. This is the standard and most common approach for HLS. For the HLS CMAF case, a multi-segment playlist looks like the following:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init00000.mp4"
#EXTINF:5,
segment00000.m4s
#EXTINF:5,
segment00001.m4s
#EXTINF:5,
segment00002.m4s

An alternative approach is to use a single media file with the EXT-X-BYTERANGE tag. This method is primarily used for on-demand (VOD) streaming, where the complete media file already exists, and it can reduce the number of files that need to be managed on the server. Using a single file with byte ranges requires the server and client to support HTTP byte range requests and 206 Partial Content responses.
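
To illustrate, an EXT-X-BYTERANGE entry of the form length@offset corresponds to an HTTP Range request for bytes offset through offset+length-1. For the first segment of the example playlist further below (100292@768), a client would effectively request (the URL is a placeholder):

curl -s -H "Range: bytes=768-101059" https://example.com/main.mp4 -o segment0.m4s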

The single media file use case wasn't supported so far by either hlssink3 or hlscmafsink. A new single-media-file property has been added, which lets users specify the use of a single media file.

hlscmafsink.set_property("single-media-file", "main.mp4");
hlssink3.set_property("single-media-file", "main.ts");
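
Put together, a minimal VOD-style pipeline could look like the following sketch (the playlist-location and target-duration property names are assumptions based on the existing hlssink3/hlscmafsink API; check gst-inspect-1.0 for the actual properties):

gst-launch-1.0 videotestsrc num-buffers=900 ! video/x-raw,framerate=30/1 \
    ! x264enc key-int-max=150 ! h264parse \
    ! hlscmafsink playlist-location=main.m3u8 single-media-file=main.mp4 target-duration=5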

For the HLS CMAF case, this generates a playlist like the following:

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-TARGETDURATION:5
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="main.mp4",BYTERANGE="768@0"
#EXT-X-BYTERANGE:100292@768
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:98990@101060
#EXTINF:5,
main.mp4
#EXT-X-BYTERANGE:99329@200050
#EXTINF:5,
main.mp4

This can be useful when storage or file-management requirements make serving HLS from a single media file preferable.