Centricular

Expertise, Straight from the Source



« Back

Devlog

Posts tagged with #audio

Icecast is a Free and Open Source multimedia streaming server, primarily used for audio and radio streaming over HTTP(S).

In GStreamer you can send an audio stream to such a server with the shout2send sink element based on libshout2.

This works perfectly fine, but has one limitation: it does not support the AAC audio codec, which for some use cases and target systems is the preferred audio codec. This is because libshout2 does not support it and will not support it, at least not officially upstream.

Some streaming servers such as the Rocket Streaming Audio Server (RSAS) do support this though, and as such it would be nice to be able to send streams to them in AAC format as well.

Enter icecastsink, which is a new sink element written in Rust to send audio to an Icecast server.

It supports sending AAC audio in addition to Ogg/Vorbis, Ogg/Opus, FLAC and MP3, and also has support for automatic re-connect in case the server kicks off the client, which might happen if the client doesn't send data for a while.

Give it a spin and let us know how it goes!



Audio source separation describes the process of splitting an already mixed audio stream into its individual, logical sources. For example, splitting a song into separate streams for its individual instruments and vocals. This can be used for example for karaoke, music practice, or isolating the speaker from background noise for easier understanding by humans or improving results of speech-to-text processing.

Starting with GStreamer 1.28.0 an element for this purpose will be included. It is based on the Python/pytorch implementation of demucs and comes with various pre-trained models with different performance and accuracy characteristics, as well as which different sets of sources they can separate. CPU-based processing is generally multiple times real-time on modern CPUs (around 8x on mine) but GPU-based processing via pytorch is also possible.

The element itself is part of the GStreamer Rust plugins and can either run demucs locally in-process using an embedded Python interpreter via pyo3, or via a small Python service over WebSockets that can run either locally or remotely (e.g. for thin clients). The used model, and chunk size and overlap between chunks can be configured. Chunk size and overlap provide control over the introduced latency (lower values give lower latency) and quality (higher values give better quality).

The separate sources are provided on individual source pads of the element and it effectively behaves like a demuxer. A pipeline for karaoke would for example look as follows:

gst-launch-1.0 uridecodebin uri=file:///path/to/music/file ! audioconvert ! tee name=t ! \
  queue max-size-time=0 max-size-bytes=0 max-size-buffers=2 ! demucs name=demucs model-name=htdemucs \
  demucs.src_vocals ! queue ! audioamplify amplification=-1 ! mixer.sink_0 \
  t. ! queue max-size-time=9000000000 max-size-bytes=0 max-size-buffers=0 ! mixer.sink_1 \
  audiomixer name=mixer ! audioconvert ! autoaudiosink

This takes an URI to a music file, passes that through the demucs element for extracting the vocals, then takes the original input via a tee and subtracts the vocals from it by first inverting all samples of the vocals stream with the audioamplify element and then mixing it with the original input with an audiomixer.

I also did a lightning talk about this at the GStreamer conference this year.



Over the past few years, we've been slowly working on improving the platform-specific plugins for Windows, macOS, iOS, and Android, and making them work as well as the equivalent plugins on Linux. In this episode, we will look at audio device switching in the source and sink elements on macOS and Windows.

On Linux, if you're using the PulseAudio elements (both with the PulseAudio daemon and PipeWire), you get perfect device switching: quick, seamless, easy, and reliable. Simply set the device property whenever you want and you're off to the races. If the device gets unplugged, the pipeline will continue, and you will get notified of the unplug via the GST_MESSAGE_DEVICE_REMOVED bus message from GstDeviceMonitor so you can change the device.

As of a few weeks ago, the Windows Audio plugin wasapi2 implements the same behaviour. All you have to do is set the device property to whatever device you want (fetched using the GstDeviceMonitor API), at any time.

A merge request is open for adding the same feature to the macOS audio plugin, and is expected to be merged soon.

For graceful error handling, such as accidental device unplug or other unexpected errors, there's a new continue-on-error property. Setting that will cause the source to output silence after unplug, whereas the sink will simply discard the buffers. An element warning will be emitted to notify the app (alongside the GST_MESSAGE_DEVICE_REMOVED bus message if there was a hardware unplug), and the app can switch the device by setting the device property.

Thanks to Seungha and Piotr for working on this!