Message-ID: <CAK1hOcNXZc+9dpZ9W+bYbXyReOaQDP48PBu82rFS86n4+hb3NA@mail.gmail.com>
Date: Fri, 23 Oct 2015 09:35:39 +0200
From: Denys Vlasenko <vda.linux@...glemail.com>
To: Rob Landley <rob@...dley.net>, Rich Felker <dalias@...c.org>, musl <musl@...ts.openwall.com>
Subject: Results of Aboriginal/musl CFLAGS experiment

Hi Rob, Rich,

I decided to take a look at how well building busybox against musl
would fare compared to building it against a custom-configured uclibc
I have been using for quite some time.

Instead of reinventing the wheel, I decided to use Rob's excellent
Aboriginal Linux build scripts.

Here's what I did.

I took Aboriginal's tip.tar.bz2, which was aboriginal-0b3b780ea942.

I built "./build.sh x86_64" without any tweaking.

Then I started adding the gcc options I was using in my old custom
uclibc build to sources/sections/musl.build, without changing anything
else:

--- a.0/sources/sections/musl.build 2015-10-11 10:10:26.000000000 +0200
+++ a.1/sources/sections/musl.build 2015-10-23 02:37:45.803972995 +0200
@@ -1,7 +1,10 @@
 # Build and install musl
 
+(
+export CFLAGS="-Wl,--sort-section,alignment -Wl,--sort-common"
+
 CC= CROSS_COMPILE=${ARCH}- ./configure --prefix=/ &&
 DESTDIR="$STAGE_DIR" make -j $CPUS CROSS_COMPILE=${ARCH}- all install &&
 echo '#define __MUSL__' >> "$STAGE_DIR"/include/features.h &&
 ln -s libc.so "$STAGE_DIR/lib/ld-musl.so.0"
-
+)

I made four steps:

step 1 - CFLAGS+="-Wl,--sort-section,alignment -Wl,--sort-common"
step 2 - CFLAGS+="-ffunction-sections -fdata-sections"
step 3 - CFLAGS+="-falign-jumps=1 -falign-labels=1"
step 4 - CFLAGS+="-falign-functions=1 -falign-loops=1"

and collected size information from several executables after each
step:

ls -l */build/native-compiler-x86_64/usr/lib/libc.a
size */build/native-compiler-x86_64/usr/lib/libc.so
size */build/root-filesystem-x86_64/usr/bin/toybox
size */build/root-filesystem-x86_64/usr/bin/busybox
size */build/native-compiler-x86_64/usr/bin/as
size */build/native-compiler-x86_64/usr/bin/ld
size */build/native-compiler-x86_64/usr/bin/bash
size */build/native-compiler-x86_64/usr/x86_64-unknown-linux/bin/collect2

Here is what I discovered.

Step 1, which added "-Wl,--sort-section,alignment -Wl,--sort-common",
affects only the size of libc.so:

   text    data     bss     dec filename
 572242    1920   11640  585802 a.0/native-compiler/lib/libc.so
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so

What it does: it reduces the chance that, when sections are merged
during linking, a small section with no alignment restrictions (such
as one resulting from "static char flag_var") gets lodged between two
bigger ones (say, "static int global_cnt") which want e.g. 32-bit
alignment. Without section sorting, the byte-sized "flag_var" then
gets 3 bytes of padding. With sections sorted by alignment, one-byte
flag variables have a better chance of being grouped together and not
requiring padding. (It can be made even better. The linker is too
dumb.)

Step 2: adding "-ffunction-sections -fdata-sections"

The previous optimization doesn't work too well because data objects
don't live in separate sections: they are all grouped into one .data
and one .bss section per *.o file.

"-ffunction-sections -fdata-sections" fixes this by putting every
function and data object into its own section. Section sorting then
eliminates many more padding gaps:

   text    data     bss     dec filename
 572068    1916   11576  585560 a.1/native-compiler/lib/libc.so
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so

There is more to it. Object files in the static libc.a also have each
function and object in its own section. This means that programs
linked with -Wl,--gc-sections (toybox and busybox do this) are able to
drop unused code and data not on a per-.o-file basis, but on a
per-function and per-object basis, resulting in a size decrease of
about 0.5% (toybox) to 0.9% (busybox)!

   text    data     bss     dec filename
 338047    6608   22384  367039 a.1/root-filesystem/usr/bin/toybox
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox

   text    data     bss     dec filename
 324711     862    7648  333221 a.1/root-filesystem/bin/busybox
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox

Most programs, alas, don't use -Wl,--gc-sections, but they still get
a tiny bit smaller:

   text    data     bss     dec filename
1029977    8752   60192 1098921 a.1/native-compiler/bin/as
1029945    8720   60192 1098857 a.2/native-compiler/bin/as

   text    data     bss     dec filename
1122513    9328   25120 1156961 a.1/native-compiler/bin/ld
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld

   text    data     bss     dec filename
 425757   50652   16448  492857 a.1/native-compiler/bin/bash
 425725   50604   16416  492745 a.2/native-compiler/bin/bash

   text    data     bss     dec filename
 140624     880    9472  150976 a.1/native-compiler/x86_64-unknown-linux/bin/collect2
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2

I would say there is no reason not to always do steps 1 and 2. They
don't pessimize execution speed. They simply get rid of some data
padding and drop dead, unreachable code.

Step 3: add "-falign-jumps=1 -falign-labels=1"
Step 4: add "-falign-functions=1 -falign-loops=1"

Not particularly interesting - they do reduce the size of every
program I measured, but some (many?) people would prefer to leave it
to gcc to decide when and how to align code, for speed reasons.
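
Since each step adds to the previous one, the flags in effect after
steps 2 and 4 can be spelled out. This is only a sketch, assuming a
POSIX shell; the variable names (STEP1 to STEP4, CFLAGS_STEPS_1_2,
CFLAGS_ALL) are illustrative and do not appear in musl.build:

```shell
# Flags added at each step, copied from the list of steps in this message.
# STEP1..STEP4 are illustrative names, not variables used by Aboriginal.
STEP1="-Wl,--sort-section,alignment -Wl,--sort-common"
STEP2="-ffunction-sections -fdata-sections"
STEP3="-falign-jumps=1 -falign-labels=1"
STEP4="-falign-functions=1 -falign-loops=1"

# Steps 1 and 2 only: section sorting plus one section per function/object.
CFLAGS_STEPS_1_2="$STEP1 $STEP2"

# All four steps: additionally removes alignment padding in code.
CFLAGS_ALL="$CFLAGS_STEPS_1_2 $STEP3 $STEP4"

echo "$CFLAGS_ALL"
```

Either value could be exported as CFLAGS in place of the step 1 string
shown in the diff.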
Anyway, here are stats:

-rw-r--r-- 1 root root 2514966 a.2/native-compiler/lib/libc.a
-rw-r--r-- 1 root root 2514726 a.3/native-compiler/lib/libc.a
-rw-r--r-- 1 root root 2514646 a.4/native-compiler/lib/libc.a

   text    data     bss     dec filename
 570356    1900   11480  583736 a.2/native-compiler/lib/libc.so
 570148    1900   11480  583528 a.3/native-compiler/lib/libc.so
 569637    1900   11480  583017 a.4/native-compiler/lib/libc.so

   text    data     bss     dec filename
 336143    6560   22352  365055 a.2/root-filesystem/usr/bin/toybox
 335999    6560   22352  364911 a.3/root-filesystem/usr/bin/toybox
 335743    6560   22352  364655 a.4/root-filesystem/usr/bin/toybox

   text    data     bss     dec filename
 321913     826    7520  330259 a.2/root-filesystem/bin/busybox
 321801     826    7520  330147 a.3/root-filesystem/bin/busybox
 321541     826    7520  329887 a.4/root-filesystem/bin/busybox

   text    data     bss     dec filename
1029945    8720   60192 1098857 a.2/native-compiler/bin/as
1029817    8720   60192 1098729 a.3/native-compiler/bin/as
1029609    8720   60192 1098521 a.4/native-compiler/bin/as

   text    data     bss     dec filename
1122513    9296   25120 1156929 a.2/native-compiler/bin/ld
1122369    9296   25120 1156785 a.3/native-compiler/bin/ld
1122161    9296   25120 1156577 a.4/native-compiler/bin/ld

   text    data     bss     dec filename
 425725   50604   16416  492745 a.2/native-compiler/bin/bash
 425629   50604   16416  492649 a.3/native-compiler/bin/bash
 425437   50604   16416  492457 a.4/native-compiler/bin/bash

   text    data     bss     dec filename
 140624     848    9440  150912 a.2/native-compiler/x86_64-unknown-linux/bin/collect2
 140560     848    9440  150848 a.3/native-compiler/x86_64-unknown-linux/bin/collect2
 140336     848    9440  150624 a.4/native-compiler/x86_64-unknown-linux/bin/collect2
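
To put the cumulative effect in perspective, the "dec" totals can be
turned into percentages. A minimal sketch, assuming a POSIX shell with
awk; pct_saved is a hypothetical helper, not something that was used
for the measurements, and the numbers are copied from the tables in
this message (a.0 baseline for libc.so, a.1 baseline for the others,
since step 1 changed only libc.so):

```shell
# pct_saved OLD NEW: print the size reduction from OLD to NEW in percent.
# Hypothetical helper, not part of any tool mentioned in this message.
pct_saved() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (old - new) * 100 / old }'
}

pct_saved 585802 583017   # libc.so, a.0 -> a.4, prints 0.48
pct_saved 367039 364655   # toybox,  a.1 -> a.4, prints 0.65
pct_saved 333221 329887   # busybox, a.1 -> a.4, prints 1.00
```

So with all four steps applied, the two programs that link with
-Wl,--gc-sections end up roughly 0.65% to 1% smaller in total.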