Centricular

Expertise, Straight from the Source



Devlog

Posts tagged with #rtp

TL;DR

There is a new webrtcbin2 Rust plugin containing separate webrtcsend and webrtcrecv elements for handling a WebRTC session. The highlights of webrtcbin2 are that it uses fewer threads per session by using rtpsend and rtprecv (also written in Rust), implementing DTLS handling internally, using librice (ICE in Rust), and handling signalling all within an async runtime. All of these components also share threads with other instances of webrtcsend and webrtcrecv, which reduces resource usage even further and significantly improves scalability.

The landscape

When I originally wrote the webrtcbin GStreamer element almost 10 years ago, I did not completely envision the number of users that would come to use this code in some way, shape or form, from HTTP-based standards such as WHIP and WHEP to the myriad of projects that use WebRTC in some way for ingest or egress. WebRTC is still one of the best ways to transport live video into a web browser for display. WebRTC's loose compatibility with the SIP ecosystem is also a driving force behind WebRTC's continued use.

Now, webrtcbin has definitely proved itself in situations that require a small number of sessions. Using webrtcbin for a mixing server (MCU) or even SFU with hundreds or even thousands of streams in a single application is still a tall ask. The biggest reason for this is the number of threads that are created for every WebRTC session.

Threads

  1. RTCP thread - rtpbin (used by webrtcbin) creates a thread per session essentially for handling timeouts required by RTCP.
  2. Jitter buffer thread - rtpjitterbuffer creates a thread per incoming stream in order to handle timeouts and deal with late or missing RTP packets.
  3. dtlsenc - A thread whose sole purpose is for being able to handle DTLS timeouts.
  4. webrtcbin and signalling - A dedicated thread for handling signalling related changes such as SDP generation, applying remote SDPs, handling ICE candidates, etc.
  5. webrtcbin and ICE - ICE uses libnice on a dedicated ICE network thread per WebRTC session.
  6. Media streaming threads - One streaming thread for sending and receiving media data.

When an application requires many WebRTC sessions, the memory requirements and context switching overhead of having 5 extra threads per WebRTC session can limit the number of sessions that can be concurrently executed.

Pipeline loops

Another concern I had is that for the server mixing/forwarding use case, pipeline loops were almost a necessity due to the basic requirement that participants in a WebRTC call want to be able to see and listen to each other. The obvious answer to this problem is to split the pipeline and use some wormhole elements such as appsrc/appsink, intersink/intersrc, proxysrc/proxysink, etc.

What if? - webrtcbin2

With the benefit of hindsight, we can definitely improve on this situation and reduce the number of threads that is required by each additional WebRTC session. Let us go through the list from above.

Pipeline loops

In order to solve the problem of loops in the pipeline, I took a leaf out of the design we made for rtpbin2 and created separate webrtcsend and webrtcrecv elements that interact with a shared WebRTC session object by having the same id. This allows data to flow essentially in one direction without requiring any kind of loop in the pipeline graph.
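The idea of two elements finding the same session through a common id can be sketched in a few lines of Python. This is only an illustration of the pattern, not the plugin's code: the class names, the `session_for` helper and the `session_id` parameter are all hypothetical.

```python
import threading
import weakref


class Session:
    """Stand-in for the state shared by the send and receive halves."""

    def __init__(self, session_id):
        self.session_id = session_id


# Sessions are looked up by id and dropped once no element uses them anymore.
_sessions = weakref.WeakValueDictionary()
_sessions_lock = threading.Lock()


def session_for(session_id):
    """Return the existing session for this id, or create it on first use."""
    with _sessions_lock:
        session = _sessions.get(session_id)
        if session is None:
            session = Session(session_id)
            _sessions[session_id] = session
        return session


class SendElement:  # hypothetical stand-in for the sending half
    def __init__(self, session_id):
        self.session = session_for(session_id)


class RecvElement:  # hypothetical stand-in for the receiving half
    def __init__(self, session_id):
        self.session = session_for(session_id)


send = SendElement("call-1")
recv = RecvElement("call-1")
other = RecvElement("call-2")
print(send.session is recv.session, send.session is other.session)
```

Because the two halves only meet through the shared object, neither needs to be linked to the other in the pipeline graph.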

For some background on why rtpbin2 was created, you can have a look at a previous post I wrote.

Threads

  1. RTCP thread - Amortised over multiple instances inside rtpbin2 using a tokio scheduler.
  2. Jitter buffer per stream - rtprecv (part of rtpbin2) uses the same tokio scheduler for RTCP handling as it does for handling timeouts and packets through the jitterbuffer introducing no extra threads.
  3. dtlsenc is no longer - DTLS is performed (using OpenSSL) directly just before/after ICE processing.
  4. webrtcsend/webrtcrecv and signalling - Signalling occurs on a tokio runtime shared across all instances of webrtcsend/webrtcrecv.
  5. webrtcsend/webrtcrecv and ICE - Uses librice on the same tokio runtime as webrtcsend/webrtcrecv.
  6. Media streaming thread - Same as webrtcbin. Can be amortised by using the threadshare elements.
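The effect of moving per-session timers onto a shared scheduler can be illustrated with a small sketch. It uses Python's asyncio as a stand-in for the tokio runtime and is not how the plugin is implemented; the `session_timer` and `run_sessions` names, the interval and the tick count are made up for the example.

```python
import asyncio
import threading


async def session_timer(interval, ticks, seen_threads):
    """One session's periodic timeout handling (think RTCP or DTLS timers)."""
    for _ in range(ticks):
        await asyncio.sleep(interval)
        # Record which OS thread actually ran this timeout.
        seen_threads.add(threading.get_ident())


async def run_sessions(count):
    seen_threads = set()
    # Every session gets its own timer task, but all tasks share one event loop.
    await asyncio.gather(
        *(session_timer(0.001, 3, seen_threads) for _ in range(count))
    )
    return seen_threads


threads_used = asyncio.run(run_sessions(100))
print(len(threads_used))
```

All 100 timers fire on the single thread that runs the event loop, which is the kind of amortisation the list above describes.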

If we count the number of threads saved, we can see that for every WebRTC session, at least 5 threads are no longer needed in the new design. At 100 sessions, that is roughly a 500 thread saving in both memory and contention.
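As a rough back-of-the-envelope sketch of that count, the following Python snippet compares the two designs. The per-session numbers come from the lists above. The assumption of one incoming stream per session and the size of the shared runtime (`runtime_threads=4`) are illustrative values, not numbers taken from webrtcbin2.

```python
def webrtcbin_threads(sessions, incoming_streams=1):
    """Threads in the old design: everything is created per session."""
    per_session = (
        1                   # RTCP thread (rtpbin)
        + incoming_streams  # one rtpjitterbuffer thread per incoming stream
        + 1                 # DTLS timeout thread (dtlsenc)
        + 1                 # signalling thread
        + 1                 # ICE thread (libnice)
        + 1                 # media streaming thread
    )
    return sessions * per_session


def webrtcbin2_threads(sessions, runtime_threads=4):
    """Threads in the new design: one streaming thread per session plus a
    shared runtime whose size does not grow with the session count.
    runtime_threads is an assumed value for illustration."""
    return sessions + runtime_threads


old = webrtcbin_threads(100)
new = webrtcbin2_threads(100)
print(old, new, old - new)
```

Under these assumptions, 100 sessions need 600 threads in the old design and 104 in the new one.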

Features of webrtcsend/webrtcrecv

While webrtcsend and webrtcrecv are functional and can successfully communicate with a web browser such as Chrome or Firefox, there are still some missing pieces. Some of the supported features include:

  • Audio and/or Video streaming. Data channels are not currently supported.
  • BUNDLE is supported and required for multiple media.
  • rtcp-mux is required.

A non-exhaustive list of features that are not yet supported includes:

  • Retransmissions and Forward Error Correction (rtpbin2 does not support this yet).
  • Data channels
  • Renegotiation
  • Statistics
  • TURN servers (librice supports it but not yet implemented in webrtcbin2)

All of these missing features are solvable with further implementation effort.

Example

A send and receive example using webrtcbin2 is available in the upstream repository and can be used with this example web page. Just make sure that data channels are not enabled as they are currently not supported.

Closing

This work will be part of the upcoming GStreamer 1.29.2 development snapshot or can be built from the main branch of gst-plugins-rs.

Writing a mature WebRTC implementation is an endeavour that requires a fair bit of implementation effort to complete. If you would like to help make a secure, mature WebRTC implementation for GStreamer please get in touch.



New udpsrc2 element

Over the past few years, I have worked on a new GStreamer UDP source element. This is finally merged now and will be part of both the GStreamer 1.30.0 release and the gst-plugins-rs 0.16.0 release.

The old element uses GIO for networking, which is quite inefficient by design. The new implementation uses about 50% less CPU on my machine compared to the old element for a 3 Gbit/s stream.

As can be seen from the docs of the new element, it preserves the API of the old element. As such it should generally be possible to use it as a drop-in replacement.

In addition to performance improvements, the new element also includes various other improvements:

  • Support for faster packet receiving via Generic Receive Offload (GRO) on Linux, and for using recvmmsg() on platforms where it is available to significantly improve receive performance.

  • Complete support for multicast source filtering, including negative filters, and support for platforms that do not have APIs for the IGMPv3 SSM mechanism.

  • Always obtaining kernel-side packet receive times if available, which was opt-in in the old element due to GIO performance issues with socket control messages.

  • New preserve-packetization property that allows outputting multiple packets in the same buffer, which improves performance for formats like MPEG-TS where the UDP packetization is not necessary.
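To illustrate why the UDP packet boundaries do not matter for MPEG-TS, here is a small Python sketch. It assumes the common layout of seven 188-byte TS packets per UDP datagram and builds dummy packets; the helper names are hypothetical and nothing here touches udpsrc2 itself.

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47
TS_PACKETS_PER_DATAGRAM = 7  # common choice: 7 * 188 = 1316 bytes


def make_ts_packet(pid, counter):
    """Build a dummy TS packet: 4-byte header followed by zero padding."""
    header = bytes([
        TS_SYNC_BYTE,
        (pid >> 8) & 0x1F,
        pid & 0xFF,
        0x10 | (counter & 0x0F),  # payload only, 4-bit continuity counter
    ])
    return header + bytes(TS_PACKET_SIZE - len(header))


def make_datagram(pid, first_counter):
    """One UDP payload carrying several whole TS packets."""
    return b"".join(
        make_ts_packet(pid, first_counter + i)
        for i in range(TS_PACKETS_PER_DATAGRAM)
    )


def split_ts_packets(buffer):
    """Recover the TS packets from a buffer, ignoring datagram boundaries."""
    return [
        buffer[i:i + TS_PACKET_SIZE]
        for i in range(0, len(buffer), TS_PACKET_SIZE)
    ]


datagrams = [make_datagram(0x100, n * TS_PACKETS_PER_DATAGRAM) for n in range(4)]
merged = b"".join(datagrams)  # several datagrams in one buffer
packets = split_ts_packets(merged)
print(len(merged), len(packets))
```

Since every TS packet has a fixed size and starts with a sync byte, a downstream parser can find the packets again no matter how many datagrams were placed in one buffer.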

Give it a try with your pipelines and workloads and share your feedback or any issues you encounter.

In the future, io_uring support on Linux could be added for even better receive performance.

SMPTE ST2110 capture

While udpsrc2 is an improvement in general, its primary motivation is better SMPTE ST2110 support in GStreamer. The old element could not handle the packet rates typically used for such streams very well.

ST2110 defines a UDP/RTP-based set of standards for transmitting raw or very-high bitrate audio / video / ancillary data over Ethernet. It is intended as a replacement for SDI.

Related to this, we recently also merged some other improvements:

For all the new depayloaders there are also new, improved implementations of the corresponding payloaders available.

Together, these improvements enable reliable ST2110 stream capture in GStreamer.

An example pipeline putting it all together would look as follows:

$ gst-launch-1.0 \
    \ # Video capture pipeline part
    udpsrc2 address=239.255.64.20 port=16388 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=96, clock-rate=90000, encoding-name=RAW, sampling=YCbCr-4:2:2, depth=10, width=1920, height=1080, exactframerate=60, colorimetry=BT709, pm=2110GPM, ssn=ST2110-20:2017, tp=2110TPN, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-327995485-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpvrawdepay2 ! combiner. \
    \
    \ # Ancillary data capture pipeline part
    udpsrc2 address=239.255.64.20 port=16386 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=video, payload=98, clock-rate=90000, encoding-name=SMPTE291, vpid_code=138, a-sendonly="", a-ts-refclk="ptp=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct=0", ssrc-2672978631-cname=E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpsmpte291depay ! combiner.st2038 \
    \
    \ # Combination of video and ancillary data streams and output
    st2038combiner name=combiner start-time-selection=first ! videoconvert ! queue max-size-bytes=0 max-size-time=0 max-size-buffers=3 ! autovideosink \
    \
    \ # Audio capture and output pipeline part
    udpsrc2 address=239.255.64.20 port=16384 multicast-iface=enp15s0 buffer-size=20000000 caps='application/x-rtp, media=audio, payload=(int)97, clock-rate=48000, encoding-name=(string)L24, encoding-params=64, a-sendonly="", a-ptime=0.125, a-ts-refclk="ptp\=IEEE1588-2008:7C-2E-0D-FF-FE-1C-81-14:127", a-mediaclk="direct\=0", ssrc-603238248-cname=(string)E055FF0F3D6E4B349F7B786D8B6C837B' ! \
      rtprecv latency=0 ! queue max-size-bytes=0 max-size-buffers=0 max-size-time=500000000 ! rtpL24depay2 ! audioconvert ! autoaudiosink

This pipeline receives a 1080p60 4:2:2 YUV 10-bit video stream, ST291 ancillary data, and a 24-bit 48 kHz 64-channel PCM audio stream. The video and ancillary data are combined into a single stream, and then both the combined video-ancillary stream and the audio are output.
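To get a feeling for the data and packet rates involved, the stream parameters from the caps above can be plugged into a quick calculation. The sketch only counts active picture and audio sample data and ignores RTP/UDP/IP header overhead. The 1200-byte video payload size is an assumed value for illustration.

```python
# Video: values from the caps in the pipeline above.
WIDTH, HEIGHT, FPS = 1920, 1080, 60
# 4:2:2 at 10 bits: two pixels share one Cb and one Cr sample,
# so (2 Y + Cb + Cr) * 10 bits / 2 pixels = 20 bits per pixel.
BITS_PER_PIXEL = 20

video_bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

# Assumed RTP payload size, for illustration only.
ASSUMED_VIDEO_PAYLOAD_BYTES = 1200
video_packets_per_second = video_bits_per_second // 8 // ASSUMED_VIDEO_PAYLOAD_BYTES

# Audio: 64 channels of 24-bit PCM at 48 kHz, a-ptime=0.125 (milliseconds).
CHANNELS, SAMPLE_RATE, BYTES_PER_SAMPLE = 64, 48000, 3
PTIME_US = 125

audio_bits_per_second = CHANNELS * SAMPLE_RATE * BYTES_PER_SAMPLE * 8
samples_per_packet = SAMPLE_RATE * PTIME_US // 1_000_000
audio_payload_bytes = samples_per_packet * CHANNELS * BYTES_PER_SAMPLE
audio_packets_per_second = 1_000_000 // PTIME_US

print(video_bits_per_second, video_packets_per_second)
print(audio_bits_per_second, audio_payload_bytes, audio_packets_per_second)
```

This gives about 2.49 Gbit/s of video in roughly 259,000 packets per second under the assumed payload size, plus about 74 Mbit/s of audio in 8000 packets per second of 1152 bytes each.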

rtprecv is used here for translating packet capture timestamps and RTP header timestamps to consistent GStreamer timestamps.
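One small part of handling RTP header timestamps is dealing with their 32-bit wraparound, which at a 90 kHz clock rate happens after a bit over 13 hours. The following Python sketch shows that step in isolation. It is not taken from rtprecv, which additionally takes the packet capture times into account, and the function names are hypothetical.

```python
RTP_TS_MODULO = 1 << 32


def extend_rtp_timestamp(previous_extended, timestamp32):
    """Map a 32-bit RTP timestamp to the extended value closest to the previous one."""
    candidate = (previous_extended & ~(RTP_TS_MODULO - 1)) | timestamp32
    if candidate - previous_extended > RTP_TS_MODULO // 2:
        candidate -= RTP_TS_MODULO  # late packet from before the last wraparound
    elif previous_extended - candidate > RTP_TS_MODULO // 2:
        candidate += RTP_TS_MODULO  # the 32-bit timestamp wrapped around
    return candidate


def rtp_ticks_to_ns(ticks, clock_rate):
    """Convert RTP clock ticks to nanoseconds."""
    return ticks * 1_000_000_000 // clock_rate


wrapped = extend_rtp_timestamp(0xFFFFFF00, 0x00000010)
late = extend_rtp_timestamp(wrapped, 0xFFFFFFF0)
print(hex(wrapped), hex(late), rtp_ticks_to_ns(wrapped - 0xFFFFFF00, 90000))
```

The extended values keep increasing across the wraparound, so they can be converted to nanoseconds and compared directly.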

Ancillary data

The pipeline above captures all three streams and merges the ancillary data stream with the video. The ancillary data itself is not processed further.

One way to process the ancillary data further is to extract ST12 timecodes from it and overlay them over the video.

For this, insert the following elements before the video sink:

 ... ! timecodestamper source=ancillary-meta ancillary-meta-locations='8:2000,570:2000' \
     ! videoconvert ! timeoverlay time-mode=time-code \
     ! autovideosink

Here timecodes from ancillary data at positions (8,2000) and (570,2000) would be extracted and converted to GstVideoTimeCodeMeta on the video buffers.

We recently added support for extracting ST12 timecodes from ancillary meta as well.

The positions depend on the video signal standard in use and can be found in the ST12 specifications.



While working on other ancillary data related features in GStreamer (more on that some other day), I noticed that we didn't have support for sending or receiving ancillary data via RTP in GStreamer despite it being a quite simple RTP mapping defined in RFC 8331 and it being used as part of ST 2110.

The new rtpsmpte291pay RTP payloader and rtpsmpte291depay RTP depayloader can be found in this MR for gst-plugins-rs, which should be merged in the next few days.

The new elements pass the SMPTE ST 291-1 ancillary data as ST 2038 streams through the pipeline. ST 2038 streams can be directly extracted from or stored in MXF or MPEG-TS containers, can be extracted or inserted into SDI streams with the AJA or Blackmagic Decklink sources/sinks, or can be handled generically by the ST 2038 elements from the rsclosedcaption plugin.

For example the following pipeline can be used to convert an SRT subtitle file to CEA-708 closed captions, which are then converted to an ST 2038 stream and sent over RTP:

$ gst-launch-1.0 filesrc location=file.srt ! subparse ! \
    tttocea708 ! closedcaption/x-cea-708,framerate=30/1 ! ccconverter ! \
    cctost2038anc ! rtpsmpte291pay ! \
    udpsink host=123.123.123.123 port=45678

Now you might be wondering how ST 291-1 and ST 2038 are related to each other and what ST 2038 has to do with RTP.

ST 291-1 is the basic standard that defines the packet format for ancillary packets as e.g. transmitted over SDI. ST 2038 on the other hand defines a mechanism for packaging ST 291-1 into MPEG-TS, and in addition to the plain ST 291-1 packets provides some additional information like the line number on which the ST 291-1 packet is to be stored. RFC 8331 defines a similar mapping for RTP. Apart from one field, it provides exactly the same information, so conversion between the two formats is relatively simple.
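To make the relationship a bit more concrete, here is a small Python sketch of an ST 291-1 packet as a sequence of 10-bit words, wrapped together with the line number that the container formats add. It assumes 8-bit user data and leaves out the other positional fields. The class and function names are hypothetical, and this is not the data model used by the GStreamer elements.

```python
from dataclasses import dataclass
from typing import List


def to_10bit(value8):
    """Extend an 8-bit value to a 10-bit word: b8 is even parity, b9 is NOT b8."""
    low = value8 & 0xFF
    b8 = bin(low).count("1") & 1
    b9 = b8 ^ 1
    return (b9 << 9) | (b8 << 8) | low


def checksum_word(words):
    """Sum of the nine low bits of all preceding words, b9 is NOT b8."""
    total = sum(word & 0x1FF for word in words) & 0x1FF
    b9 = ((total >> 8) & 1) ^ 1
    return (b9 << 9) | total


@dataclass
class AncPacket:
    """The plain ancillary packet: identifiers plus user data."""
    did: int
    sdid: int
    user_data: List[int]

    def words(self):
        body = [to_10bit(self.did), to_10bit(self.sdid), to_10bit(len(self.user_data))]
        body += [to_10bit(byte) for byte in self.user_data]
        return body + [checksum_word(body)]


@dataclass
class LocatedAncPacket:
    """What the container mappings add on top: where the packet belongs."""
    line_number: int
    packet: AncPacket


located = LocatedAncPacket(line_number=9, packet=AncPacket(0x61, 0x01, []))
print([hex(word) for word in located.packet.words()])
```

The inner packet stays the same in both mappings, which is why converting between them is mostly a matter of copying the surrounding fields.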

Using ST 2038 as the generic ancillary data stream format in GStreamer seemed like the pragmatic choice here. GStreamer already had support for handling ST 2038 streams in various elements, a set of helper elements to handle ST 2038 streams, and e.g. GStreamer's MXF ANC support (ST 436) also uses ST 2038 as its stream format.