Message-ID: <20121118210353.GA4844@brightrain.aerifal.cx>
Date: Sun, 18 Nov 2012 16:03:53 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: debloating data, bss

Hi all,

I've been doing a bit of checking for unneeded bloat, and here's what
I've found in data & bss:

   text    data     bss     dec     hex filename
   1045       0     288    1333     535 mntent.o (ex lib/libc.a)

This is simply a line buffer. It seems to me we could instead use
getline or fgetln, but for the latter some consideration is needed to
determine whether its semantics suffice.
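As a rough sketch of the getline option (not the actual mntent code), the
static line buffer could give way to a heap buffer that getline grows on
demand. The helper name read_entry_line is hypothetical, and the caller is
assumed to free the buffer when done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read the next line into *buf, letting getline
 * allocate or grow it. Strips one trailing newline. Returns the line
 * length, or -1 on EOF or error. No static buffer is involved. */
ssize_t read_entry_line(FILE *f, char **buf, size_t *cap)
{
    assert(f != NULL && buf != NULL && cap != NULL);
    ssize_t n = getline(buf, cap, f);
    if (n < 0)
        return -1;
    if (n > 0 && (*buf)[n - 1] == '\n')
        (*buf)[--n] = '\0';
    return n;
}
```

The cost is that a read can now fail on allocation, which a fixed static
buffer cannot.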

    665       0      32     697     2b9 mq_notify.o (ex lib/libc.a)

This is purely gcc being stupid and putting static const char[32] into
bss rather than text (wasting 32 bytes of writable memory to save 32
bytes on disk). Since the buffer is junk, we could just use "char[32]"
(uninitialized); that would shrink the code but would result in
warnings. Or, we could use a pointer to any static object or code of
at least 32 bytes in size.

     86       0     544     630     276 gethostbyaddr.o (ex lib/libc.a)
     78       0     544     622     26e gethostbyname2.o (ex lib/libc.a)

Some ugly gigantic buffers for results. Since getaddrinfo requires
dynamic allocation anyway, it would be reasonable to dynamically
allocate these too; it would not be introducing a failure case that
did not already previously exist.
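A minimal sketch of that idea, assuming a lazily allocated buffer behind a
single static pointer. The names result_buf and get_result_buf are
hypothetical, and the size is only illustrative (taken from the bss figure
above, not from the real structure layout).

```c
#include <stdlib.h>

/* Illustrative size only; the real result layout may differ. */
#define RESULT_BUF_SIZE 544

/* One pointer in bss instead of a 544-byte array. */
static char *result_buf;

/* Hypothetical accessor: allocate on first use, reuse afterwards.
 * Returns NULL if allocation fails, so the caller can report an error.
 * Not thread-safe, like the legacy interfaces it would serve. */
char *get_result_buf(void)
{
    if (!result_buf)
        result_buf = malloc(RESULT_BUF_SIZE);
    return result_buf;
}
```

Programs that never call these functions would then pay only for the
pointer.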

      6       0     512     518     206 res_state.o (ex lib/libc.a)

This is pure junk; it's just there to satisfy broken programs that try
to peek/poke at the resolver state. I wonder if we could make it
smaller without breaking anything.

    908     160      12    1080     438 random.o (ex lib/libc.a)

Unfortunately I think random really does have that much state...

     43       0     128     171      ab sigisemptyset.o (ex lib/libc.a)

This is another case of gcc stupidly putting uninitialized static
const in bss instead of text. I have a better workaround for that, but
this code needs to be fixed anyway because it's comparing the whole
1024-bit bit-array even though we treat all but the first 64/128
bits as padding now.
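One possible shape for the fix, sketched with a stand-in type rather than
the real sigset_t: test only the meaningful leading words in a loop, which
also removes the need for a static all-zero object to compare against. The
struct, the function name, and the choice of 128 meaningful bits are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIG_WORDS_TOTAL (1024 / (8 * sizeof(unsigned long)))
/* Assumption: only the first 128 bits carry signal state. */
#define SIG_WORDS_USED  (128 / (8 * sizeof(unsigned long)))

/* Stand-in for sigset_t: a 1024-bit array, mostly padding. */
struct sketch_sigset {
    unsigned long bits[SIG_WORDS_TOTAL];
};

/* Returns 1 if no meaningful bit is set, 0 otherwise.
 * Padding words past SIG_WORDS_USED are never examined. */
int sketch_sigisemptyset(const struct sketch_sigset *set)
{
    assert(set != NULL);
    for (size_t i = 0; i < SIG_WORDS_USED; i++)
        if (set->bits[i])
            return 0;
    return 1;
}
```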

    209       0    8192    8401    20d1 pthread_key_create.o (ex lib/libc.a)

I'm considering replacing pthread_key_create with a new implementation
that makes a fake DSO with TLS instead of having the pthread
thread-specific data being part of the main thread block.

Aside from this, the main issue that's making libc.so's dirty-page
cost so high is that the bss isn't sorted; most of bss is unused in
most programs, but because the commonly-used stuff isn't grouped
together, several pages end up dirty. With this in mind, I'm
considering one of the following 3 approaches to get the commonly-used
data all together in one page:

1. Explicitly initialize everything that's always-used, so it ends up
in .data rather than .bss, and thus on the first page.

2. Reorder object files in the linking so that the bloated junk is all
at the end.

3. Find a way to get the linker to sort it for us, possibly with
alignment and alignment-based sorting.
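A variation on approach 1, sketched under the assumption of GCC or Clang on
an ELF target: instead of relying on a nonzero initializer, the section
attribute (a compiler extension) can place a zero-valued object in .data
directly. The variable and function names are hypothetical.

```c
/* Assumption: GCC/Clang on ELF; __attribute__((section)) is an extension. */

/* Zero-initialized and rarely touched, so it is left in .bss. */
static int rarely_used_table[1024];

/* Commonly used object placed in .data even though its value is zero. */
__attribute__((section(".data"))) int hot_counter = 0;

/* Touches only the hot object. */
int bump_hot_counter(void)
{
    return ++hot_counter;
}

/* Touches the cold table; returns the previous value of the slot. */
int touch_rare(int i)
{
    return rarely_used_table[i & 1023]++;
}
```

Running size or objdump -t on the resulting object is one way to check
where each symbol actually landed.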

With the above changes, I think we should be able to cut 2-3 pages of
commit charge off of libc.so and drop the minimum dirty pages for
dynamic linking from 20k (5 pages) to 12k (3 pages, only one of which
is in libc.so; the others are the main app's data and stack).

Major work on debloating will probably not begin until after the next
release.

Rich
