Centricular

Expertise, Straight from the Source



Devlog

At the GStreamer project, we produce SDKs for lots of platforms: Linux, Android, macOS, iOS, and Windows. However, as we port more and more plugins to Rust 🦀, we are finding ourselves backed into a corner.

Rust static libraries are simply too big.

To give you an example, the AWS folks changed their SDK back in March to switch their cryptographic toolkit over to their aws-lc-rs crate [1]. However, that causes a 2-10x increase in code size (bug reports here and here), which gets duplicated in every plugin that makes use of their ecosystem!

What are Rust staticlibs made of?

To summarise, each Rust plugin packs a copy of its dependencies, plus a copy of the Rust standard library. This is not a problem for shared libraries and executables by their very nature, but for static libraries it causes several issues.

First approach: Single-Object Prelinking

I won't bore you with the details as I've written another blog post on the subject; the gist is that you can unpack the library, and then ask the linker to perform "partial linking" or "relocatable linking" (Linux term) or "Single-Object Prelinking" (the Apple term, which I'll use throughout the post) over the object files. Setting which symbols you want to be visible for downstream consumers lets dead-code elimination take place at the plugin level, ensuring your libraries are now back to a reasonable size.
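As a rough illustration of the idea on Linux, the sketch below only builds the argument lists for a relocatable link followed by a symbol localisation pass. It assumes GNU ld's -r option and GNU objcopy's --keep-global-symbol option; the post does not say which commands are actually used, and the function names, file names and the gst_plugin_foo_register symbol are placeholders.

```rust
/// Arguments for `ld` that merge `objects` into a single relocatable
/// object file (`-r` asks for a relocatable, i.e. partial, link).
fn partial_link_args(output: &str, objects: &[&str]) -> Vec<String> {
    let mut args = vec!["-r".to_string(), "-o".to_string(), output.to_string()];
    args.extend(objects.iter().map(|o| o.to_string()));
    args
}

/// Arguments for GNU `objcopy` that keep only the `exported` symbols
/// global and turn every other symbol in `object` into a local one.
/// The object file is modified in place.
fn localize_args(object: &str, exported: &[&str]) -> Vec<String> {
    let mut args: Vec<String> = exported
        .iter()
        .map(|sym| format!("--keep-global-symbol={sym}"))
        .collect();
    args.push(object.to_string());
    args
}
```

The resulting vectors could be handed to std::process::Command::new("ld") and std::process::Command::new("objcopy") respectively; the merged object would then be archived again into the static library.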

Why is it not enough?

Single-Object Prelinking has two drawbacks:

  • Unoptimized code: the linker won't be able to deduplicate functions between melded objects, as they've been hidden by the prelinking process.
  • Windows: there are no officially supported tools (read: Visual Studio, LLVM, GCC) to perform this at the compiler level. It is possible to do this with binutils, but the PE-COFF format doesn't allow changing the visibility of unexported functions.

Melt all the object files with the power of dragons' fire!

As said earlier, no tools on Windows support prelinking officially yet, but there's another thing we can do: library deduplication.

Thanks to Rust's comprehensive crate ecosystem, I wrote a new CLI tool which I called dragonfire. Given a complete Rust workspace or list of static libraries, dragonfire:

  1. reads all the static libraries in one pass
  2. deduplicates the object files inside them based on their size and naming (Rust has its own, unique naming convention for object files -- pretty useful!)
  3. copies the duplicate objects into a new static library (usually called gstrsworkspace as its primary use is for the GStreamer ecosystem)
  4. removes the duplicates from the rest of the libraries
  5. updates the symbol table in each of the libraries with the bundled LLVM tools

Thanks to the ar crate, the unpacking and writing only happen at step 3, ensuring no wasteful I/O slowdowns take place. The llvm-tools-preview component in turn takes care of locating and calling llvm-ar to update the workspace's symbol tables.
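The planning part of steps 1 to 4 can be sketched with the standard library alone. This is a minimal sketch and not dragonfire's actual code: the Member and Plan types and the plan_dedup function are hypothetical, archive contents are assumed to have been listed already (with names stripped of their library prefix), and no archive I/O is performed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One archive member: its object name (library prefix already dropped)
/// and its size in bytes.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Member {
    name: String,
    size: u64,
}

/// Outcome of the planning pass.
#[derive(Debug, Default)]
struct Plan {
    /// Members that move into the shared library (one copy each).
    shared: BTreeSet<Member>,
    /// Members that each input library keeps, keyed by library name.
    kept: BTreeMap<String, Vec<Member>>,
}

fn plan_dedup(libs: &BTreeMap<String, Vec<Member>>) -> Plan {
    // Pass 1: count in how many libraries each (name, size) pair appears.
    let mut seen: BTreeMap<&Member, usize> = BTreeMap::new();
    for members in libs.values() {
        // The set guards against one library listing a member twice.
        for member in members.iter().collect::<BTreeSet<_>>() {
            *seen.entry(member).or_insert(0) += 1;
        }
    }

    // Pass 2: anything seen in more than one library is a duplicate and
    // goes to the shared library; everything else stays where it is.
    let mut plan = Plan::default();
    for (lib, members) in libs {
        let mut kept = Vec::new();
        for member in members {
            if seen[member] > 1 {
                plan.shared.insert(member.clone());
            } else {
                kept.push(member.clone());
            }
        }
        plan.kept.insert(lib.clone(), kept);
    }
    plan
}
```

Keying on the name and size together mirrors the criterion in step 2; comparing contents would be stricter, but would require reading every object file.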

The object files' naming convention deserves a special mention. For a Rust staticlib named libfoo, the object files will be named as follows:

  • crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
  • On Windows only: foo.crate_name-hash1.crate_name.hash2-cgu.nnn.rcgu.o
  • On non-Windows platforms: same as above, but replacing foo with libfoo-hash

In all cases, crate_name means a dependency present somewhere in the workspace tree, and nnn is a number that will be bigger than zero whenever -C codegen-units is set higher than 1.

For dragonfire's purposes, dropping the library prefix is enough to deduplicate object files. On Windows, however, we can also find import library stubs, which LLVM can generate on its own when a library is linked with the raw-dylib kind (#[link(kind = "raw-dylib")]) [2]. Import stubs can have any extension, e.g. .dll, .exe and .sys (the latter two coming from private Win32 APIs). These stubs cannot be deduplicated as they are generated individually per imported function, so dragonfire must leave them where they are.
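Dropping the library prefix can be sketched as follows. This is an illustration under assumptions, not dragonfire's implementation: the dedup_key function is hypothetical, and it relies on the convention above yielding exactly six dot-separated components (crate_name-hash1, crate_name, hash2-cgu, nnn, rcgu, o) with no dots inside crate names or hashes. Anything that does not end in .rcgu.o, such as an import stub, is reported as not deduplicable.

```rust
/// Returns the part of an archive member name that is common to every
/// copy of the same codegen unit, or `None` if the member does not look
/// like a Rust codegen unit (for example an import stub).
fn dedup_key(member: &str) -> Option<&str> {
    if !member.ends_with(".rcgu.o") {
        return None;
    }
    // Positions of the dots, walking from the end of the name.
    let mut dots = member.rmatch_indices('.').map(|(pos, _)| pos);
    // Six components need at least five dots; bail out otherwise.
    dots.nth(4)?;
    match dots.next() {
        // A sixth dot means a library prefix precedes the key: drop it.
        Some(sixth) => Some(&member[sixth + 1..]),
        // Exactly six components: the name already is the key.
        None => Some(member),
    }
}
```

With made-up hashes, both foo.serde-1a2b.serde.3c4d-cgu.0.rcgu.o and libfoo-9f8e.serde-1a2b.serde.3c4d-cgu.0.rcgu.o map to the same key, while kernel32.dll yields None and is left alone.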

Drawbacks of object file deduplication

Again, this approach has several disadvantages. On Apple platforms, deduplicating libraries triggers a strange linker error, which I had not seen before:

ld: multiple errors: compact unwind must have at least 1 fixup in '<framework>/GStreamer[arm64][1021](libgstrsworkspace_a-3f2b47962471807d-lse_ldset4_acq.o)'; r_symbolnum=-19 out of range in '<framework>/GStreamer[arm64][1022](libgstrsworkspace_a-compiler_builtins-350c23344d78cfbc.compiler_builtins.5e126dca1f5284a9-cgu.162.rcgu.o)'

This also led me to find that Rust libraries were packing bitcode, which is forbidden by Apple. (This was thankfully already fixed before shipping time, but we've not yet updated our Rust minimum version to take advantage of it.)

Another drawback is that Rust's implementation of LTO causes dead-code elimination at the crate level, as opposed to the workspace level. This makes object file deduplication impossible, as each copy is different.

For the Windows platform, there is an extra drawback which specifically affects object files produced by LLVM: the COMDAT sections are set to IMAGE_COMDAT_SELECT_NODUPLICATES. This means that the linker will outright reject functions with multiple definitions, rather than realise they're all duplicates and discard all but one of the copies. MSVC in particular performs symbol resolution before dead-code elimination, so linking fails because of unresolved symbols before dead-code elimination kicks in; to use deduplicated libraries, one must set the linker flags /OPT:REF /FORCE:UNRESOLVED so that the dead code can be successfully eliminated.
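For a consumer that happens to be built with Cargo, one way to pass those two flags is from a build script. The sketch below is an assumption about how a downstream project might do it, not something the post prescribes; link_directives is a hypothetical helper, while cargo:rustc-link-arg and the CARGO_CFG_TARGET_ENV variable are standard Cargo build-script facilities.

```rust
/// Returns the build-script directives that ask Cargo to pass the
/// linker flags needed for deduplicated libraries. Only MSVC targets
/// get them; every other target environment gets an empty list.
fn link_directives(target_env: &str) -> Vec<String> {
    if target_env != "msvc" {
        return Vec::new();
    }
    ["/OPT:REF", "/FORCE:UNRESOLVED"]
        .iter()
        .map(|flag| format!("cargo:rustc-link-arg={flag}"))
        .collect()
}
```

In a build.rs, the main function would read CARGO_CFG_TARGET_ENV with std::env::var and print each returned line with println!. Projects using another build system would add the same two flags to their linker command line directly.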

Results

With library deduplication, we can make static libraries up to 44x smaller when building under MSVC [3] (see the tables below for the full comparison):

  • gstaws.lib: from 173M to 71M (~2.4x)
  • gstrswebrtc.lib: from 193M to 66M (~2.9x)
  • gstwebrtchttp.lib: from 66M to 1,5M (~44x)

Table: before and after melding under MSVC

file                        no prelinking  melded
gstaws.lib                  173M           71M
gstcdg.lib                  36M            572K
gstclaxon.lib               32M            568K
gstdav1d.lib                34M            936K
gstelevenlabs.lib           59M            1008K
gstfallbackswitch.lib       37M            2,3M
gstffv1.lib                 34M            744K
gstfmp4.lib                 39M            3,2M
gstgif.lib                  34M            1,1M
gstgopbuffer.lib            30M            456K
gsthlsmultivariantsink.lib  46M            1,6M
gsthlssink3.lib             41M            1,2M
gsthsv.lib                  34M            796K
gstjson.lib                 31M            704K
gstlewton.lib               33M            1,2M
gstlivesync.lib             33M            728K
gstmp4.lib                  38M            2,2M
gstmpegtslive.lib           31M            704K
gstndi.lib                  38M            2,8M
gstoriginalbuffer.lib       34M            376K
gstquinn.lib                75M            23M
gstraptorq.lib              33M            2,4M
gstrav1e.lib                46M            11M
gstregex.lib                38M            404K
gstreqwest.lib              58M            1,4M
gstrsanalytics.lib          35M            1000K
gstrsaudiofx.lib            54M            22M
gstrsclosedcaption.lib      52M            8,4M
gstrsinter.lib              35M            604K
gstrsonvif.lib              46M            2,0M
gstrspng.lib                35M            1,2M
gstrsrtp.lib                59M            11M
gstrsrtsp.lib               57M            4,4M
gstrstracers.lib            40M            2,4M
gstrsvideofx.lib            48M            11M
gstrswebrtc.lib             193M           66M
gstrsworkspace.lib          N/A            137M
gststreamgrouper.lib        30M            376K
gsttextahead.lib            30M            332K
gsttextwrap.lib             32M            2,1M
gstthreadshare.lib          52M            12M
gsttogglerecord.lib         35M            808K
gsturiplaylistbin.lib       31M            648K
gstvvdec.lib                34M            564K
gstwebrtchttp.lib           66M            1,5M
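
The shrink factors quoted above are simply the "no prelinking" size divided by the "melded" size. A small sketch of that arithmetic follows; parse_size and shrink_factor are hypothetical helpers, and the sketch assumes the sizes are printed with 1024-based K/M/G suffixes and a decimal comma, as in the table.

```rust
/// Parses a human-readable size such as "572K" or "1,5M" into bytes.
/// Returns `None` for anything else (for example "N/A").
fn parse_size(text: &str) -> Option<f64> {
    let text = text.trim();
    let unit = text.chars().last()?;
    // Assumption: suffixes are powers of 1024.
    let factor = match unit {
        'K' => 1024.0,
        'M' => 1024.0 * 1024.0,
        'G' => 1024.0 * 1024.0 * 1024.0,
        _ => return None,
    };
    let number = &text[..text.len() - unit.len_utf8()];
    // The tables use a decimal comma, which `f64::from_str` rejects.
    let value: f64 = number.replace(',', ".").parse().ok()?;
    Some(value * factor)
}

/// How many times smaller `after` is compared to `before`.
fn shrink_factor(before: &str, after: &str) -> Option<f64> {
    Some(parse_size(before)? / parse_size(after)?)
}
```

For instance, shrink_factor("66M", "1,5M") returns Some(44.0), the figure quoted for gstwebrtchttp.lib, while the gstrsworkspace.lib row yields None because it has no "before" size.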

The results from the melding above can be compared with the file sizes obtained using LTO on Windows [4] (remember that LTO doesn't actually fix linking against plugins):

  • gstaws.lib: from 67M (LTO) to 71M (melded) (+6.0%)
  • gstrswebrtc.lib: from 105M to 66M (-37.1%)
  • gstwebrtchttp.lib: from 28M to 1,5M (-94.6%)

Table: before and after LTO under MSVC (no melding involved; codegen-units=1 in all cases)

file                            no prelinking  lto=thin  opt-level=s + lto=thin  debug=1 + opt-level=s  debug=1 + lto=thin + opt-level=s
old/gstaws.lib                  199M           199M      171M                    78M                    67M
old/gstcdg.lib                  11M            11M       11M                     7,5M                   7,5M
old/gstclaxon.lib               11M            11M       11M                     7,7M                   7,7M
old/gstdav1d.lib                12M            12M       12M                     7,9M                   7,8M
old/gstelevenlabs.lib           52M            52M       49M                     24M                    22M
old/gstfallbackswitch.lib       18M            18M       17M                     11M                    11M
old/gstffv1.lib                 11M            11M       11M                     7,6M                   7,6M
old/gstfmp4.lib                 20M            20M       19M                     12M                    11M
old/gstgif.lib                  12M            12M       12M                     7,9M                   7,9M
old/gstgopbuffer.lib            9,7M           9,7M      9,7M                    7,5M                   7,4M
old/gsthlsmultivariantsink.lib  16M            16M       16M                     9,6M                   9,4M
old/gsthlssink3.lib             14M            14M       14M                     8,9M                   8,8M
old/gsthsv.lib                  11M            11M       11M                     7,8M                   7,7M
old/gstjson.lib                 12M            12M       12M                     8,4M                   8,2M
old/gstlewton.lib               12M            12M       12M                     8,1M                   8,1M
old/gstlivesync.lib             12M            12M       12M                     8,3M                   8,2M
old/gstmp4.lib                  17M            17M       17M                     9,9M                   9,7M
old/gstmpegtslive.lib           12M            12M       12M                     8,0M                   7,9M
old/gstndi.lib                  21M            21M       20M                     12M                    11M
old/gstoriginalbuffer.lib       9,6M           9,6M      9,7M                    7,4M                   7,3M
old/gstquinn.lib                94M            94M       86M                     39M                    35M
old/gstraptorq.lib              18M            18M       17M                     9,8M                   9,4M
old/gstrav1e.lib                39M            39M       37M                     19M                    18M
old/gstregex.lib                26M            26M       25M                     14M                    14M
old/gstreqwest.lib              53M            53M       49M                     24M                    22M
old/gstrsanalytics.lib          15M            15M       14M                     9,2M                   8,9M
old/gstrsaudiofx.lib            57M            57M       56M                     23M                    22M
old/gstrsclosedcaption.lib      40M            40M       36M                     20M                    18M
old/gstrsinter.lib              14M            14M       13M                     8,5M                   8,4M
old/gstrsonvif.lib              21M            21M       20M                     11M                    11M
old/gstrspng.lib                13M            13M       13M                     8,2M                   8,2M
old/gstrsrtp.lib                47M            47M       44M                     22M                    20M
old/gstrsrtsp.lib               35M            35M       33M                     16M                    15M
old/gstrstracers.lib            28M            28M       27M                     16M                    15M
old/gstrsvideofx.lib            16M            16M       35M                     9,2M                   15M
old/gstrswebrtc.lib             329M           329M      284M                    124M                   105M
old/gststreamgrouper.lib        9,6M           9,6M      9,7M                    7,2M                   7,2M
old/gsttextahead.lib            9,6M           9,6M      9,5M                    7,4M                   7,3M
old/gsttextwrap.lib             13M            13M       13M                     8,4M                   8,4M
old/gstthreadshare.lib          49M            49M       45M                     23M                    20M
old/gsttogglerecord.lib         13M            13M       13M                     8,5M                   8,4M
old/gsturiplaylistbin.lib       11M            11M       11M                     7,9M                   7,9M
old/gstvvdec.lib                11M            11M       11M                     7,5M                   7,5M
old/gstwebrtchttp.lib           69M            69M       63M                     30M                    28M

Conclusion

This article presents several longstanding pain points in Rust, namely staticlib binary sizes, symbol leaking, and incompatibilities between Rust and MSVC. It also introduces dragonfire, a tool that addresses or works around these issues where possible, and outlines the issues that remain.

As explained earlier, dragonfire-treated libraries are live on all platforms except Apple's if you use the development packages from mainline; the feature is hopefully on track for the 1.28 release of GStreamer. There's already a merge request pending to enable it for Apple platforms; we're only waiting to update the Rust minimum version.

If you want to have a look, dragonfire's source code is available at Freedesktop's GitLab instance. Please note that at the moment I have no plans to submit this to crates.io.

Feel free to contact me with any feedback, and thanks for reading!


  1. See its default-https-client feature at lib.rs; you will find it throughout the AWS SDK ecosystem. ↩︎

  2. https://doc.rust-lang.org/reference/items/external-blocks.html#dylib-versus-raw-dylib ↩︎

  3. In all cases the -C flags are debug=1 + codegen-units=1 + opt-level=s; see this comment for the complete results across all platforms. ↩︎

  4. Source: https://gitlab.freedesktop.org/gstreamer/cerbero/-/merge_requests/1895 ↩︎