Message-ID: <20200516002912.GN21576@brightrain.aerifal.cx>
Date: Fri, 15 May 2020 20:29:13 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: mallocng progress and growth chart

On Sun, May 10, 2020 at 02:09:34PM -0400, Rich Felker wrote:
> 4668: 2x5440 2x5440 2x5440 2x5440 2x5440 5x4672 5x4672 5x4672 5x4672 5x4672 5x4672 7x4672 ...

This turns out to be just about the worst edge case we have, and in a
sense one that's fundamental. Sadly there are a number of
applications, including bash, that do a lot of malloc(4096). The ones
that just allocate and don't have any complex malloc/free patterns
will see somewhat higher usage with mallocng, and I don't think
there's any way around that. (Note: oldmalloc also has problems here
under certain patterns of alloc/free, due to the bin_index vs
bin_index_up discrepancy!)

I have some changes I'm about to push that help this somewhat. The
2x5440 count-reduction (this is 3x with proper-fit count) is overly
costly at this size, and imposes a 12.5% waste on top of the slack
from coarse size classing and the base slack from mapping 4096 into a
4672 size class. Getting rid of it, and accounting for existing coarse
size class usage when doing the 7->5 reduction, produce:

4668: 3x5440 3x5440 3x5440 3x5440 5x4672 7x4672 ...

which seems like about the best we can do. The initial allocation of
3x rather than 2x only uses one additional page to get an additional
slot that can be used before needing to mmap again, which is a big win
(essentially that third slot doesn't have any overhead) except in the
case where it's never used, and only a small loss (1 page) even then.

The same thing happens at the next doubling for malloc(8192), and the
same mitigation applies. However with that:

9340: 3x10912 3x10912 3x10912 3x10912 3x9344 7x9344 ...

the coarse size classing is dubious because the size is sufficiently
large that a 7->3 count reduction can be used, with the same count the
coarse class would have, but with a 28k rather than 32k mapping.
Unfortunately the decision here depends on knowing page size, which
isn't constant at the point where it needs to be made. For integration
with musl, page size is initially known even if it's variable, so we
could possibly make a decision not to use coarse sizing based on that
here, but standalone mallocng lacks that knowledge (page size isn't
known until after first alloc_meta). This could perhaps be reworked.

There's a fairly small range of sizes that would benefit (larger ones
are better off with individual mmap because page size quickly becomes
"finer than" size classes), but the benefit seems fairly significant
(not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
the sizes where it helps.
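
The page counts behind the figures above (one extra page for 3x5440 versus 2x5440, and 28k versus 32k for 3x9344 versus 3x10912) can be rechecked with a short calculation. The following is only a sketch, not mallocng's actual code: the helper names are hypothetical, and it assumes 4 KiB pages and a 16-byte header at the start of each group, neither of which is stated in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only, not taken from the message:
 * 4 KiB pages and a 16-byte header at the start of each group. */
#define ASSUMED_PAGE_SIZE    4096u
#define ASSUMED_GROUP_HEADER 16u

/* Bytes mapped for a group of `count` slots of `slot_size` bytes each,
 * rounded up to a whole number of pages. Hypothetical helper. */
size_t group_map_size(size_t count, size_t slot_size)
{
    size_t raw = count * slot_size + ASSUMED_GROUP_HEADER;
    size_t pages = (raw + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
    return pages * ASSUMED_PAGE_SIZE;
}

/* Mapped bytes charged to each slot of the group (integer division).
 * Hypothetical helper. */
size_t cost_per_slot(size_t count, size_t slot_size)
{
    assert(count > 0);
    return group_map_size(count, slot_size) / count;
}
```

Under those assumptions, group_map_size(2, 5440) is 12288 (3 pages) and group_map_size(3, 5440) is 16384 (4 pages), so the third slot costs one extra page. Likewise group_map_size(3, 9344) is 28672 and group_map_size(3, 10912) is 32768, a difference of 4096 bytes spread over 3 slots, or roughly 1.3k per malloc(8192).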