Centricular

Expertise, Straight from the Source

Devlog

Posts tagged with #FLV

For one of our recent projects, we worked on adding multitrack audio capabilities to the GStreamer FLV plugin following the Enhanced RTMP (v2) specification. All changes are now merged upstream (see MR 9682).

Enhanced RTMP

As the name suggests, this is an enhancement to the RTMP (and FLV) specifications. The latest version was released earlier this year and aims to meet current and future online media broadcasting requirements, which include:

  • Contemporary audio/video codecs (HEVC, AV1, Opus, FLAC, etc.)
  • Multitrack capabilities (for concurrent management and processing)
  • Connection stability and resilience
  • and more

FLV and RTMP in GStreamer

The existing FLV and RTMP2 plugins followed the previous versions of the RTMP/FLV specifications, so they could handle at most one video and one audio track at a time. Most of the work went into adding the ability to handle multiple tracks.

Multitrack Audio

We considered a couple of options for adding multitrack audio and enhanced FLV capabilities:

  • Write completely new element(s), preferably in Rust, or
  • Extend the current FLV muxer and demuxer elements

Writing a fresh set of elements from scratch, perhaps even in Rust, would potentially have made it easier to accommodate newer versions of the specification. But the second option, extending the existing FLV muxer/demuxer elements, turned out to be simpler.

Problems to Solve

So, at a high level, we had two problems to solve:

  1. Handle multiple tracks

    As mentioned above, the FLV and RTMP plugins were equipped to handle only one audio and one video track. So we needed to add support for handling multiple audio and video tracks.

  2. Maintain backwards compatibility

    There should be no breakage in any existing applications that stream using the legacy FLV format. So, the muxer needs a mechanism to decide whether a given audio input needs to be written into the FLV container in the enhanced format or the legacy format.

A two-step solution

We arrived at a two-step solution for the implementation of multiple track handling:

  1. Use the audio pad template only for the legacy format and define a new audio_%u pad template for the enhanced format. That makes it clear whether a stream needs to be written as a legacy FLV track or an enhanced FLV track. The index of the audio_%u pads is also used as the track ID when writing enhanced FLV.

  2. Derive a new element from the existing FLV muxer called eflvmux, which defines the new audio_%u pad templates. The old flvmux will continue to support only the legacy codec/format. That way, the existing applications that use flvmux for legacy FLV streaming will not face any conflicts while requesting the pads.
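
As a rough illustration of the naming rule in the two steps above, here is a minimal Python sketch of how a requested pad name could map to a track format and track ID. The names `classify_audio_pad` and `TrackTarget` are hypothetical and exist only for this example; they are not part of GStreamer, and the actual muxer does not necessarily work this way internally.

```python
import re
from typing import NamedTuple, Optional


class TrackTarget(NamedTuple):
    """Where a requested audio pad ends up (hypothetical type for this sketch)."""
    enhanced: bool           # True for an enhanced FLV track, False for legacy
    track_id: Optional[int]  # Only enhanced tracks carry a track ID


# Pads requested from the audio_%u template are named audio_0, audio_1, ...
_ENHANCED_PAD = re.compile(r"audio_(\d+)")


def classify_audio_pad(pad_name: str) -> TrackTarget:
    """Map a pad name to legacy/enhanced plus track ID, per the rule above."""
    if pad_name == "audio":
        # The plain "audio" pad is reserved for the legacy format.
        return TrackTarget(enhanced=False, track_id=None)
    match = _ENHANCED_PAD.fullmatch(pad_name)
    if match:
        # The pad index doubles as the enhanced FLV track ID.
        return TrackTarget(enhanced=True, track_id=int(match.group(1)))
    raise ValueError(f"not an audio pad name: {pad_name!r}")
```

For example, `classify_audio_pad("audio_2")` yields an enhanced track with ID 2, while `classify_audio_pad("audio")` yields a legacy track with no ID.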

Minor Caveat

Note that applications that use eflvmux need to specify the correct pad template name (audio or audio_%u) when requesting sink pads to ensure that the input audio data is written to the correct FLV track (legacy or enhanced).

Some formats such as MP3 and AAC are supported in both legacy and enhanced tracks, so we can't just auto-detect the right thing to do.
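
To make the pad-name choice concrete, here is a small Python sketch that assembles a gst-launch style pipeline description in which each audio branch names the eflvmux sink pad it links to. The helper `build_eflvmux_description` is hypothetical, and the encoder chain (audiotestsrc, audioconvert, avenc_aac, aacparse) is an assumption chosen for illustration; whether this exact chain negotiates with eflvmux is not confirmed here, so adjust it to your setup.

```python
from typing import List, Sequence

# Assumed test-source and AAC encoder chain; swap in whatever your setup uses.
AUDIO_BRANCH = "audiotestsrc num-buffers=200 ! audioconvert ! avenc_aac ! aacparse"


def build_eflvmux_description(
    enhanced_ids: Sequence[int],
    legacy: bool = True,
    location: str = "out.flv",
) -> str:
    """Build a pipeline description that names the muxer pad for each branch."""
    parts: List[str] = [f"eflvmux name=mux ! filesink location={location}"]
    if legacy:
        # "mux.audio" requests the legacy audio pad.
        parts.append(f"{AUDIO_BRANCH} ! mux.audio")
    for track_id in enhanced_ids:
        if track_id < 0:
            raise ValueError("track IDs must be non-negative")
        # "mux.audio_N" requests an enhanced pad; N becomes the track ID.
        parts.append(f"{AUDIO_BRANCH} ! mux.audio_{track_id}")
    return " ".join(parts)
```

The resulting string is intended to be passed to `gst-launch-1.0` or `Gst.parse_launch()`. Because AAC is valid in both legacy and enhanced tracks, the only thing distinguishing the two kinds of branch here is the pad name.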

Interoperability issues

While testing multitrack audio streaming to Twitch.tv, we noticed something interesting: none of the combinations we tried worked, whether we streamed multiple enhanced FLV tracks or a mix of a single legacy track and one or more enhanced FLV tracks.

On the other hand, OBS was able to stream multitrack audio just fine to the same endpoint. Dissecting the RTMP packets sent out by OBS revealed that Twitch can accept at most two tracks, one legacy and one enhanced, and the enhanced FLV track's ID needs to be a non-zero value. To our knowledge, this is not documented anywhere.

It came down to track ID semantics, which are easy to miss without referring to the OBS Studio code. We recently noticed that this is also the case with FFmpeg.

We requested clarification on the track ID semantics from the enhanced RTMP specification maintainers and received confirmation that 0 remains a valid value for the track ID. As mentioned in the specification, it can be used to represent the highest priority track or the default track.

However, when streaming to servers like Twitch, you may need to take care to request only pads with an index greater than 0 from eflvmux, because such servers may not accept tracks with ID 0.
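
The constraints we observed can be written down as a small pre-flight check. This is only a sketch of the behaviour described above (at most one legacy track, at most one enhanced track, and a non-zero enhanced track ID); the function name `twitch_style_problems` is hypothetical, and the rules are based on observation rather than on anything documented by Twitch.

```python
import re
from typing import Iterable, List


def twitch_style_problems(pad_names: Iterable[str]) -> List[str]:
    """Return reasons why a set of eflvmux audio pads may be rejected."""
    legacy_count = 0
    enhanced_ids: List[int] = []
    problems: List[str] = []

    for name in pad_names:
        if name == "audio":
            legacy_count += 1
            continue
        match = re.fullmatch(r"audio_(\d+)", name)
        if match is None:
            problems.append(f"{name!r} is not an audio pad name")
            continue
        enhanced_ids.append(int(match.group(1)))

    # Observed limit: one legacy track plus one enhanced track.
    if legacy_count > 1:
        problems.append("more than one legacy audio track")
    if len(enhanced_ids) > 1:
        problems.append("more than one enhanced audio track")
    # Track ID 0 is valid per the specification but was not accepted in our tests.
    if 0 in enhanced_ids:
        problems.append("enhanced track ID 0 may be rejected; use an index greater than 0")
    return problems
```

An empty result, as for `["audio", "audio_1"]`, means the combination matches what worked in our tests; it does not guarantee that a given server will accept the stream.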

Sample Pipelines to test

Here are some sample pipelines I used for testing the muxer and demuxer during the implementation.

Scope for other features

The FLV muxer and demuxer have undergone significant structural changes in order to support multiple audio tracks. This should make it easy to update the existing multitrack video capability merge request as well as add support for advanced codecs listed in the specification, some of which (like H265 and AV1) are already in progress.

There is also a work-in-progress merge request to add the eRTMP related support to the rtmp2 plugin.

P.S.: You can also refer to my talk on this topic at the GStreamer Conference that took place in London last month. The recording will soon be published on Ubicast.