For a recent project, it was necessary to collect video frames (and, in the future, audio) from multiple streams over a specific time interval and pass them through an inference framework, which extracts additional metadata from the media and attaches it to the frames.
While GStreamer has gained quite a bit of infrastructure for machine learning use cases in its analytics library over the past few years, there was nothing for this specific use case yet.
As part of solving this, I proposed a design for a generic interface that allows combining and batching multiple streams into a single one. It uses empty buffers with a GstMeta that contains the buffers of the original streams, and caps that include the caps of the original streams so that format negotiation in the pipeline works as usual.
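To make the shape of this design more concrete, here is a minimal Python sketch of the data model. It is not the actual GStreamer API: `Buffer`, `StreamBatch`, `BatchMeta`, `make_batch_buffer`, `batch_caps` and the caps name `batch/x-sketch` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Buffer:
    """Stand-in for a GstBuffer: a timestamp, a payload and attached metas."""
    pts_ns: int
    payload: bytes = b""
    metas: list = field(default_factory=list)


@dataclass
class StreamBatch:
    """Buffers collected from one original stream, plus that stream's caps."""
    caps: str
    buffers: List[Buffer]


@dataclass
class BatchMeta:
    """Hypothetical model of the meta on the batch buffer: one entry per input stream."""
    streams: List[StreamBatch]


def make_batch_buffer(pts_ns, streams):
    """Build an empty buffer whose only content is the batch meta."""
    batch = Buffer(pts_ns=pts_ns)
    batch.metas.append(BatchMeta(streams=streams))
    return batch


def batch_caps(streams):
    """Describe the batch by listing the caps of each original stream, in order.

    The caps name "batch/x-sketch" is invented for this sketch.
    """
    inner = ", ".join(f'"{s.caps}"' for s in streams)
    return f"batch/x-sketch, streams=<{inner}>"
```

Because the original buffers and caps travel along unchanged inside the batch, an element that understands the batch can still look at each stream's own format, while everything else only sees a single stream.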
While this covers my specific use case of combining multiple streams, it should be generic enough to also handle other cases that came up during the discussions.
In addition, I wrote two new elements, analyticscombiner and analyticssplitter, that make use of this new API for combining and batching multiple streams in a generic, media-agnostic way over specific time intervals, and later splitting the result back into the original streams. The combiner can be configured to collect all media in the time interval, or only the first or last.
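The three collection modes can be sketched roughly as follows. This is plain Python rather than the element's real implementation: `Strategy` and `collect_interval` are hypothetical names, buffers are modelled as `(pts, payload)` tuples, and the input is assumed to be sorted by timestamp.

```python
from enum import Enum


class Strategy(Enum):
    ALL = "all"
    FIRST = "first"
    LAST = "last"


def collect_interval(buffers, start, duration, strategy):
    """Select the buffers of one stream that fall into [start, start + duration).

    `buffers` is a list of (pts, payload) tuples sorted by pts.
    """
    end = start + duration
    in_window = [b for b in buffers if start <= b[0] < end]
    if strategy is Strategy.ALL or not in_window:
        return in_window
    if strategy is Strategy.FIRST:
        return in_window[:1]
    # Strategy.LAST: keep only the newest buffer of the interval
    return in_window[-1:]


# Usage: four frames, one 100-unit interval starting at 0
frames = [(0, "a"), (40, "b"), (80, "c"), (120, "d")]
kept = collect_interval(frames, 0, 100, Strategy.LAST)
```

Keeping only the first or last buffer is useful when the inference step needs just one representative frame per interval, while collecting everything preserves the full stream.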
Conceptually, the combiner element is similar to NVIDIA's DeepStream nvstreammux element, and in the future it should be possible to write a translation layer between the GStreamer analytics library and DeepStream.
The basic idea for the usage of these elements is to have a pipeline like this:

-- stream 1 --\                                                                  / -- stream 1 with metadata --
               -- analyticscombiner -- inference elements -- analyticssplitter --
-- stream 2 --/                                                                  \ -- stream 2 with metadata --
   ........                                                                           ......................
-- stream N -/                                                                   \- stream N with metadata --
The inference elements would only add metadata to each of the buffers, which can then be used further downstream in the pipeline for operations like overlays or blurring specific areas of the frames.
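As a rough illustration of that flow, the sketch below models a batch as a list of per-stream frame lists. The inference step only appends metadata, the split step hands each stream its frames back, and a downstream consumer reads the metadata. All names here (`run_inference`, `split`, `regions_to_blur`, the `detect` callback and the dictionary layout) are assumptions made for this example, not part of GStreamer.

```python
def run_inference(batch, detect):
    """Append the results of detect(data) to each frame's metadata.

    The frame data itself is left untouched.
    """
    for frames in batch:
        for frame in frames:
            frame["meta"].extend(detect(frame["data"]))
    return batch


def split(batch):
    """Map each stream index back to its own frames, metadata included."""
    return {index: frames for index, frames in enumerate(batch)}


def regions_to_blur(frame, label):
    """Downstream use of the metadata: boxes of all detections with `label`."""
    return [m["box"] for m in frame["meta"] if m["label"] == label]


def fake_detect(data):
    """Toy detector standing in for a real model."""
    if b"face" in data:
        return [{"label": "face", "box": (0, 0, 8, 8)}]
    return []


batch = [
    [{"data": b"face-frame", "meta": []}],   # stream 0
    [{"data": b"empty-frame", "meta": []}],  # stream 1
]
streams = split(run_inference(batch, fake_detect))
```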
In the future there are likely going to be more batching elements for specific stream types, operating on multiple or a single stream, or making use of completely different batching strategies.
Special thanks also to Olivier and Daniel, who provided very useful feedback during the review of the two merge requests.