musl - max_align_t mess on i386

Message-ID: <20191214151932.GW1666@brightrain.aerifal.cx>
Date: Sat, 14 Dec 2019 10:19:32 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: max_align_t mess on i386

In researching how much memory could be saved, and how practical it
would be, for the new malloc to align only to 8-byte boundaries
instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
much all 32-bit archs), I discovered that GCC quietly changed its
idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
accommodate the new _Float128 access via SSE. Presumably (I haven't
checked) the change is reflected with changes in the psABI document to
make it "official".
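
To see which value a given toolchain and set of headers actually
use, a minimal sketch along these lines can be built for i386. The
function name is hypothetical and only for illustration; the result
depends entirely on whose stddef.h gets picked up. Going by the
description above, one would expect 16 from GCC >=7's own header and
8 from musl's current one, but that is not verified here.

```c
#include <stddef.h>

/* Hypothetical helper: report the alignment that the headers in use
 * assign to max_align_t. Nothing here is musl- or GCC-specific; the
 * value comes from whichever stddef.h the build picks up. */
static size_t max_align_from_headers(void)
{
    return _Alignof(max_align_t);
}

/* Hypothetical helper: true if x is a nonzero power of two, which any
 * valid alignment must be. */
static int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

Printing max_align_from_headers() from a small main() built with
-m32 against each set of headers shows the difference directly.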

This is a somewhat ABI-breaking change (for example, it would break the
ABI struct layout for any 3rd party library struct using max_align_t to
align part of a public type), but GCC folks seem to have done the
research at the time to indicate there wasn't anything affected in
practice in known published code.
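
The kind of layout break meant here can be sketched as follows. The
types align8_t and align16_t are hypothetical stand-ins for the old
and new max_align_t (they are not the real definitions from any
header), and struct pub_old/pub_new stand for the same public library
struct as seen by code compiled against each definition.

```c
#include <stddef.h>

/* Hypothetical stand-ins: same size, different alignment. */
typedef struct { _Alignas(8)  char bytes[16]; } align8_t;
typedef struct { _Alignas(16) char bytes[16]; } align16_t;

/* The "same" public struct, as laid out under each definition.
 * The member after 'id' is padded up to its alignment. */
struct pub_old {
    int id;
    align8_t payload;   /* placed at offset 8 */
};

struct pub_new {
    int id;
    align16_t payload;  /* placed at offset 16 */
};
```

With a 4-byte int, the payload member moves from offset 8 to offset
16 and the struct grows from 24 to 32 bytes, so a library and a
caller built against different definitions would disagree on where
the member lives.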

The big question now is: should we change musl's i386 max_align_t to
follow? One of the advantages of not using compiler-provided headers
is that we don't get this kind of silent ABI change happening out from
under us, or have ABI depend on whether you used GCC <=6 vs GCC >=7 to
compile (which is a rather awful property). But it also means we have
to make conscious decisions about following.

I was thinking of trying to make this decision in the next release
cycle (1.2.1) along with merging new malloc, so that we don't
potentially have a single release that drops i386 to 8-byte alignment
followed by one increasing it right back, and making further
combinatoric compat problems. But I realized just now that with time64
already being a hit to ABI-compat between pairs of libc consumers,
changing max_align_t at the same time, if we're going to do it, would
probably be better. FWIW I think this change is *far* less impactful
than time64 in terms of compat.

The disadvantage of changing max_align_t is that we shut out the
possibility of using 8-byte allocation granularity (on i386), which
looks like it could save something like 10-15% memory usage in typical
programs with small allocated objects (see also: Causes of Bloat,
Limits of Health paper[1]), but even up to 33% where the choice is
between 24 and 32 byte allocated slots for a 13-20 byte structure or
string (note: average tends to be half of max if requested sizes are
uniform, but at such small sizes they may tend to be non-uniform).
However, whatever we do with i386, the option of using 8-byte
granularity remains open for all the other 32-bit archs, most of which
tend to be used with machines far more memory-constrained than i386.
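
The 24 vs 32 byte case can be worked through with a small sketch.
The helper names are hypothetical, and the 4-byte per-slot overhead
is an assumption chosen only because it reproduces the 13-20 byte
range above; it is not taken from the new malloc's actual design.

```c
#include <stddef.h>

/* Round n up to the next multiple of gran (gran must be a power of two). */
static size_t round_up(size_t n, size_t gran)
{
    return (n + gran - 1) & ~(gran - 1);
}

/* Hypothetical model: slot size needed for a request, given an assumed
 * per-slot overhead and an allocation granularity. */
static size_t slot_size(size_t request, size_t overhead, size_t gran)
{
    return round_up(request + overhead, gran);
}
```

Under that assumption, slot_size(13, 4, 8) and slot_size(20, 4, 8)
both give 24, while the same requests with granularity 16 give 32.
Going from 24 to 32 is 8/24, or 33%, more memory per object. Requests
of 12 or 21 bytes land in the same slot size (16 or 32) either way,
which is why the worst case is confined to that range.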

The disadvantage of leaving max_align_t alone is that we have to
(continue to) consider _Float128 an unsupported extension type whose
use would be outside the scope of any guarantees we make, and that
would need memalign to use. This is largely viable at present because
it's a fringe thing, but we don't know if that will continue to be
true far in the future.
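
For illustration, this is roughly what a caller would have to do if
malloc only guaranteed 8-byte alignment. The sketch uses C11
aligned_alloc rather than memalign, and over_aligned_t is a
hypothetical 16-byte-aligned stand-in for a type like _Float128, so
that it does not depend on compiler support for that type.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a type needing 16-byte alignment. */
typedef struct { _Alignas(16) unsigned char bytes[16]; } over_aligned_t;

/* Allocate an array of 'count' over-aligned objects without relying on
 * malloc's default alignment. Returns NULL on overflow or failure.
 * The result is released with free(). */
static over_aligned_t *alloc_over_aligned(size_t count)
{
    if (count > SIZE_MAX / sizeof(over_aligned_t))
        return NULL;
    /* The size is a multiple of the alignment, as C11 requires. */
    return aligned_alloc(_Alignof(over_aligned_t),
                         count * sizeof(over_aligned_t));
}
```

Every allocation site for such a type would need this treatment,
including ones inside third-party code that simply calls malloc.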

I'm somewhat leaning towards following the ABI change, because (1) we
have a good opportunity to do it now that we won't get again, and (2)
I'm worried we'll eventually get into a mess by not doing it. But I
want to offer the opportunity for discussion before finalizing
anything, especially in case there are considerations I'm missing.

Rich


[1] https://ftp.barfooze.de/pub/sabotage/tmp/oopsla2007-bloat.pdf