|
Message-ID: <20191214151932.GW1666@brightrain.aerifal.cx>
Date: Sat, 14 Dec 2019 10:19:32 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: max_align_t mess on i386

In researching how much memory could be saved, and how practical it would be, for the new malloc to align only to 8-byte boundaries instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty much all 32-bit archs), I discovered that GCC quietly changed its idea of i386 max_align_t to 16-byte alignment in GCC 7, to better accommodate the new _Float128 access via SSE. Presumably (I haven't checked) the change is reflected in the psABI document to make it "official".

This is a somewhat ABI-breaking change (for example, it would break struct layout for any third-party library that uses max_align_t to align part of a public type), but the GCC folks seem to have done the research at the time indicating that nothing in known published code was affected in practice.

The big question now is: should we change musl's i386 max_align_t to follow? One of the advantages of not using compiler-provided headers is that we don't get this kind of silent ABI change happening out from under us, or have ABI depend on whether you used GCC <=6 vs GCC >=7 to compile (which is a rather awful property). But it also means we have to make conscious decisions about following.

I was thinking of trying to make this decision in the next release cycle (1.2.1) along with merging the new malloc, so that we don't potentially have a single release that drops i386 to 8-byte alignment followed by one increasing it right back, creating further combinatoric compat problems. But I realized just now that, with time64 already being a hit to ABI-compat between pairs of libc consumers, changing max_align_t at the same time, if we're going to do it, would probably be better. FWIW I think this change is *far* less impactful than time64 in terms of compat.
The disadvantage of changing max_align_t is that we shut out the possibility of using 8-byte allocation granularity (on i386), which looks like it could save something like 10-15% memory usage in typical programs with small allocated objects (see also: "The Causes of Bloat, The Limits of Health" paper[1]), but even up to 33% where the choice is between 24- and 32-byte allocation slots for a 13-20 byte structure or string. (Note: average waste tends to be half the maximum if requested sizes are uniform, but at such small sizes they may tend to be non-uniform.) However, whatever we do with i386, the option of using 8-byte granularity remains open for all the other 32-bit archs, most of which tend to be used on machines far more memory-constrained than i386.

The disadvantage of leaving max_align_t alone is that we have to (continue to) consider _Float128 an unsupported extension type whose use would be outside the scope of any guarantees we make, and which would need memalign to use. This is largely viable at present because it's a fringe thing, but we don't know if that will continue to be true far into the future.

I'm somewhat leaning towards following the ABI change, because (1) we have a good opportunity to do it now that we won't get again, and (2) I'm worried we'll eventually get into a mess by not doing it. But I want to offer the opportunity for discussion before finalizing anything, especially in case there are considerations I'm missing.

Rich

[1] https://ftp.barfooze.de/pub/sabotage/tmp/oopsla2007-bloat.pdf