Message-ID: <CAK1hOcNXZc+9dpZ9W+bYbXyReOaQDP48PBu82rFS86n4+hb3NA@mail.gmail.com>
Date: Fri, 23 Oct 2015 09:35:39 +0200
From: Denys Vlasenko <vda.linux@...glemail.com>
To: Rob Landley <rob@...dley.net>, Rich Felker <dalias@...c.org>, musl <musl@...ts.openwall.com>
Subject: Results of Aboriginal/musl CFLAGS experiment

Hi Rob, Rich,

I decided to take a look at how well building busybox against musl
would fare compared to building it against the custom-configured
uclibc I had been using for quite some time.

Instead of reinventing the wheel, I decided to use Rob's excellent
Aboriginal Linux build scripts. Here's what I did.

I took Aboriginal's tip.tar.bz2, which was aboriginal-0b3b780ea942.
I built "./build.sh x86_64" without any tweaking.

Then I started adding the gcc options I had been using in my old custom
uclibc build to sources/sections/musl.build, without changing anything else:

--- a.0/sources/sections/musl.build     2015-10-11 10:10:26.000000000 +0200
+++ a.1/sources/sections/musl.build     2015-10-23 02:37:45.803972995 +0200
@@ -1,7 +1,10 @@
 # Build and install musl

+(
+export CFLAGS="-Wl,--sort-section,alignment -Wl,--sort-common"
+
 CC= CROSS_COMPILE=${ARCH}- ./configure --prefix=/ &&
 DESTDIR="$STAGE_DIR" make -j $CPUS CROSS_COMPILE=${ARCH}- all install &&
 echo '#define __MUSL__' >> "$STAGE_DIR"/include/features.h &&
 ln -s libc.so "$STAGE_DIR/lib/ld-musl.so.0"
-
+)

I made four steps:
step 1 - CFLAGS+="-Wl,--sort-section,alignment -Wl,--sort-common"
step 2 - CFLAGS+="-ffunction-sections -fdata-sections"
step 3 - CFLAGS+="-falign-jumps=1 -falign-labels=1"
step 4 - CFLAGS+="-falign-functions=1 -falign-loops=1"

and collected size information from several executables after each step:
ls -l */build/native-compiler-x86_64/usr/lib/libc.a
size */build/native-compiler-x86_64/usr/lib/libc.so
size */build/root-filesystem-x86_64/usr/bin/toybox
size */build/root-filesystem-x86_64/usr/bin/busybox
size */build/native-compiler-x86_64/usr/bin/as
size */build/native-compiler-x86_64/usr/bin/ld
size */build/native-compiler-x86_64/usr/bin/bash
size */build/native-compiler-x86_64/usr/x86_64-unknown-linux/bin/collect2
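
Since each step adds to the flags of the previous one, the four flag sets
can be written out as a small shell sketch. The helper name cflags_for_step
is hypothetical and not part of Aboriginal; in the experiment itself the
resulting string simply went into the "export CFLAGS=" line of
sources/sections/musl.build before each rebuild.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative CFLAGS for a given step (1-4).
cflags_for_step() {
    step=$1
    flags=""
    if [ "$step" -ge 1 ]; then
        flags="-Wl,--sort-section,alignment -Wl,--sort-common"
    fi
    if [ "$step" -ge 2 ]; then
        flags="$flags -ffunction-sections -fdata-sections"
    fi
    if [ "$step" -ge 3 ]; then
        flags="$flags -falign-jumps=1 -falign-labels=1"
    fi
    if [ "$step" -ge 4 ]; then
        flags="$flags -falign-functions=1 -falign-loops=1"
    fi
    printf '%s\n' "$flags"
}

# Show what would be exported for each step; the rebuild and the
# ls/size measurements listed above are done by hand after each one.
for s in 1 2 3 4; do
    echo "step $s: CFLAGS=\"$(cflags_for_step "$s")\""
done
```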

Here is what I discovered.


Step 1, which added "-Wl,--sort-section,alignment -Wl,--sort-common"
affects only the size of libc.so:

   text    data     bss     dec filename
 572242    1920   11640  585802 a.0/native-compiler/lib/libc.so
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so

It reduces the chance that, when sections are merged during linking,
a small section with no alignment restrictions (such as one
resulting from "static char flag_var") gets lodged between two
bigger ones (say, from "static int global_cnt") which want
e.g. 32-bit alignment.

Without section sorting, byte-sized "flag_var" gets 3 bytes of padding.

With section sorting by alignment, one-byte flag variables have
a higher chance of being grouped together and not requiring padding.
(It could be made even better. The linker is too dumb.)
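
The padding effect can be illustrated with a toy layout calculation. This
is only a model of the arithmetic, not of how ld actually places sections:
the function name layout_size and the list of items (three 4-byte ints
with 4-byte alignment, two 1-byte chars) are made up for illustration.

```shell
#!/bin/sh
# layout_size "size:align" ...
# Place the items in the given order, rounding the current offset up to
# each item's alignment, and print the offset after the last item.
layout_size() {
    off=0
    for item in "$@"; do
        size=${item%%:*}
        align=${item##*:}
        # round the offset up to the next multiple of the alignment
        off=$(( (off + align - 1) / align * align ))
        off=$(( off + size ))
    done
    echo "$off"
}

# Unsorted: each char is lodged between two ints and costs 3 bytes of padding.
layout_size 4:4 1:1 4:4 1:1 4:4    # prints 20

# Sorted by alignment, largest first: the chars end up adjacent, no padding.
layout_size 4:4 4:4 4:4 1:1 1:1    # prints 14
```

In this model, the same 14 bytes of data occupy 20 bytes when unsorted and
14 bytes when sorted by alignment.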


Step 2: adding "-ffunction-sections -fdata-sections"

The previous optimization doesn't work too well because data objects
don't live in separate sections: they are all grouped in one .data
and one .bss section per *.o file.

"-ffunction-sections -fdata-sections" fixes this by putting every function
and data object into its own section. Then section sorting eliminates
many more padding gaps:

   text    data     bss     dec filename
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so

There is more to it. Object files in the static libc.a also have their
functions and objects each in its own section. This means that programs
linked with -Wl,--gc-sections (toybox and busybox do this)
are able to drop unused code and data not on a per-.o-file basis
but on a per-function and per-object basis, resulting in a total size
decrease of roughly 0.5% (toybox) to 0.9% (busybox):

   text    data     bss     dec filename
 338047    6608   22384  367039 a.1/root-filesystem/usr/bin/toybox
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 324711     862    7648  333221 a.1/root-filesystem/bin/busybox
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
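
The relative savings can be worked out from the "dec" column of the two
tables above. A minimal sketch in integer shell arithmetic follows; the
function name pct_saved is hypothetical, and the result is truncated
rather than rounded.

```shell
#!/bin/sh
# pct_saved BEFORE AFTER
# Print the size reduction as a percentage with two decimals (truncated).
pct_saved() {
    before=$1
    after=$2
    # work in hundredths of a percent to stay within integer arithmetic
    hundredths=$(( (before - after) * 10000 / before ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

pct_saved 367039 365055    # toybox,  a.1 -> a.2: prints 0.54
pct_saved 333221 330259    # busybox, a.1 -> a.2: prints 0.88
```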

Most programs, alas, don't use -Wl,--gc-sections, but they still get
a tiny bit smaller:

   text    data     bss     dec filename
1029977    8752   60192 1098921 a.1/native-compiler/bin/as
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9328   25120 1156961 a.1/native-compiler/bin/ld
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
   text    data     bss     dec filename
 425757   50652   16448  492857 a.1/native-compiler/bin/bash
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     880    9472  150976 a.1/native-compiler/x86_64-unknown-linux/bin/collect2
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2


I would say there is no reason not to always do steps 1 and 2.
They don't pessimize execution speed. They simply get rid of some
data padding, and drop dead, unreachable code.


Step 3: add "-falign-jumps=1 -falign-labels=1"
Step 4: add "-falign-functions=1 -falign-loops=1"

Not particularly interesting - they do reduce the size of every program
I measured, but some (many?) people would prefer to leave it to gcc to
decide when and how to align code, for speed reasons. Anyway, here are
the stats:

 -rw-r--r-- 1 root root 2514966 a.2/native-compiler/lib/libc.a
 -rw-r--r-- 1 root root 2514726 a.3/native-compiler/lib/libc.a
 -rw-r--r-- 1 root root 2514646 a.4/native-compiler/lib/libc.a
   text    data     bss     dec filename
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so
 570148    1900   11480  583528 a.3/native-compiler/lib/libc.so
 569637    1900   11480  583017 a.4/native-compiler/lib/libc.so
   text    data     bss     dec filename
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
 335999    6560   22352  364911 a.3/root-filesystem/usr/bin/toybox
 335743    6560   22352  364655 a.4/root-filesystem/usr/bin/toybox
   text    data     bss     dec filename
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
 321801     826    7520  330147 a.3/root-filesystem/bin/busybox
 321541     826    7520  329887 a.4/root-filesystem/bin/busybox
   text    data     bss     dec filename
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
1029817    8720   60192 1098729 a.3/native-compiler/bin/as
1029609    8720   60192 1098521 a.4/native-compiler/bin/as
   text    data     bss     dec filename
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
1122369    9296   25120 1156785 a.3/native-compiler/bin/ld
1122161    9296   25120 1156577 a.4/native-compiler/bin/ld
   text    data     bss     dec filename
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
 425629   50604   16416  492649 a.3/native-compiler/bin/bash
 425437   50604   16416  492457 a.4/native-compiler/bin/bash
   text    data     bss     dec filename
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2
 140560     848    9440  150848 a.3/native-compiler/x86_64-unknown-linux/bin/collect2
 140336     848    9440  150624 a.4/native-compiler/x86_64-unknown-linux/bin/collect2
