musl - upstreaming mallocng

Message-ID: <20200510220915.GA17662@brightrain.aerifal.cx>
Date: Sun, 10 May 2020 18:09:15 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: upstreaming mallocng

Finishing initial development of mallocng and upstreaming it is the main
item for this release cycle, and I think we're nearly there. The
main question is whether there will be any serious performance or
memory usage regressions relative to oldmalloc, which may be
problematic for some users. With that in mind, I'm considering
supporting a build-time choice of oldmalloc for at least the first
couple releases with mallocng, so that there's an easy path back for
users who are hit by any such regression that can't be solved
immediately.

If offering oldmalloc as an option, I think I'd like to apply the
patch for the runaway heap expansion issue first:

https://www.openwall.com/lists/musl/2019/04/12/4

This does cost some level of performance, but the loss should be
well-understood. Arguably this should have been applied at the time I
wrote it, but I wasn't really excited about the performance hit
without having something better to offer in place of it.

With that said, here are some of the things users should expect to see
from mallocng:

- More predictable behavior. Availability of memory should be
  influenced a lot less by history of allocator activity within the
  process's lifetime, especially on 64-bit archs where virtual address
  space fragmentation is not an issue. Non-essential patterns of
  ballooning fragmentation, like in glibc issue
  https://sourceware.org/bugzilla/show_bug.cgi?id=14581 which musl
  oldmalloc also exhibited, should also not happen anymore.

- Memory getting returned to the system, aggressively unless it
  bounces back and forth enough that we back off to avoid
  pathologically bad performance. Software that builds then frees
  large data structures should expect to mostly return to usage levels
  from before the allocation after it's all freed, rather than getting
  stuck at near-peak usage, as happens with dlmalloc-type
  allocators.

- No more runaway heap expansion (see above thread).

- Differences in performance characteristics. Expected to perform
  better at high usage, and anywhere between noticeably better and
  noticeably worse at low usage levels. (Where it is severely worse,
  efforts will be made to mitigate that as much as possible, but only
  if it's reported.)

- Lower memory overhead at high usage. Ideally similar, but possibly
  somewhat higher, memory overhead at low usage. Tuning out all of the
  causes of gratuitous excess usage and over-eager preallocation when
  the nominal usage is low, without clobbering performance, has been
  the big focus of recent development.

- Byte-accurate malloc_usable_size. This is necessary so that accesses
  up to the malloc_usable_size offset do not produce UB in the eyes of
  the compiler/fortify, which may generate traps (e.g. for memset(p, 42,
  malloc_usable_size(p))).

Now would be a great time for feedback on large-scale use of mallocng,
if anyone's up for it. You can put it in LD_PRELOAD via .profile or
even in init for system-wide use.

Rich