Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200405022023.GI11469@brightrain.aerifal.cx>
Date: Sat, 4 Apr 2020 22:20:23 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: New malloc tuning for low usage

On Sat, Apr 04, 2020 at 02:19:48PM -0400, Rich Felker wrote:
> On Fri, Apr 03, 2020 at 10:55:54PM -0400, Rich Felker wrote:
> > In working on this, I noticed that it looks like the coarse size class
> > threshold (6) in top-level malloc() is too low. At that threshold, the
> > first fine-grained-class group allocation will be roughly a 100%
> > increase in memory usage by the class; I'd rather keep the relative
> > increase bounded by 50% or less. It should probably be something more
> > like 10 or 12 to achieve this. With 12, repeated allocations of 16k
> > first produce 7 individual 20k mmaps, then a 3-slot class-37
> > (21824-byte slots) group, then a 7-slot class-36 (18704-byte slots)
> > group.
> > 
> > One thing that's not clear to me is whether it's useful at all to
> > produce the 3-slot class-37 group rather than just going on making
> > more individual mmaps until it's time to switch to the larger group.
> > It's easy to tune things to do the latter, and seems to offer more
> > flexibility in how memory is used. It also allows slightly more
> > fragmentation, but the number of such objects is highly bounded to
> > begin with because we use increasingly larger groups as usage goes up,
> > so the contribution should be asymptotically irrelevant.
> 
> The answer is that it depends on where the sizes fall. At 16k,
> rounding up to page size produces 20k usage (5 pages) but the 3-slot
> class-37 group uses 5+1/3 pages, so individual mmaps are preferable.
> However if we requested 20k, individual mmaps would be 24k (6 pages)
> while the 3-slot group would still just use 5+1/3 page, and would be
> preferable to switch to. The condition seems to be just whether the
> rounded-up-to-whole-pages request size is larger than the slot size,
> and we should prefer individual mmaps if (1) it's smaller than the
> slot size, or (2) using a multi-slot group would be a relative usage
> increase in the class of more than 50% (or whatever threshold it ends
> up being tuned to).
> 
> I'll see if I can put together a quick implementation of this and see
> how it works.

This seems to be working very well with the condition:

	if (sc >= 35 && cnt<=3 && (size*cnt > usage/2 || ((req+20+pagesize-1) & -pagesize) <= size))
	    ^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	 at least ~16k    wanted to make a smaller        requested size
	                  group but hit lower cnt         rounded up to
	                  limit; see loop above           page <= slot size

at the end of the else clause for if (sc < 8) in alloc_group. Here req
is a new argument to expose the size of the actual request malloc
made, so that for single-slot groups (mmap serviced allocations) we
can allocate just the minimum needed rather than the nominal slot
size.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.