Message-ID: <20220919200744.GW9709@brightrain.aerifal.cx>
Date: Mon, 19 Sep 2022 16:07:45 -0400
From: Rich Felker <dalias@...c.org>
To: baiyang <baiyang@...il.com>
Cc: musl <musl@...ts.openwall.com>
Subject: Re: Re: The heap memory performance (malloc/free/realloc) is
 significantly degraded in musl 1.2 (compared to 1.1)

On Tue, Sep 20, 2022 at 03:45:35AM +0800, baiyang wrote:
> > The only correct value malloc_usable_size can return is the value
> > you passed to the allocator.
> 
> I don't think so, see:
> 
> Linux man page:
> https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html -
> "The value returned by malloc_usable_size() may be **greater than**
> the requested size of the allocation".
> 
> Mac OS X man page:
> https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc_size.3.html
> - "The memory block size is always at least as large as the
> allocation it backs, **and may be larger**."
> 
> FreeBSD man page:
> https://www.freebsd.org/cgi/man.cgi?query=malloc_usable_size&apropos=0&sektion=0&manpath=FreeBSD+7.1-RELEASE&format=html
> - "The return value **may be larger** than the size that was
> requested during allocation".
> 
> These official man pages clearly state that the return value of
> malloc_usable_size is the size of the memory block allocated
> internally, not the size submitted by the user.
> 
> Conversely, we have not found any documentation saying that the
> return value of malloc_usable_size must be the size submitted by the
> user in order to be correct. Please correct me if you know of such
> documentation.
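The contract all three quoted man pages share is only a lower bound: the reported size is at least the request. A minimal sketch of observing that on glibc or musl, where `malloc_usable_size` is declared in `<malloc.h>` (macOS offers `malloc_size` in `<malloc/malloc.h>` instead). The helper name `usable_size_for` is hypothetical and used only for illustration.

```c
#include <malloc.h>  /* malloc_usable_size: a glibc/musl extension */
#include <stdlib.h>

/* Hypothetical helper: allocate n bytes, record what
 * malloc_usable_size reports for the block, then free it.
 * Returns 0 if the allocation fails. */
static size_t usable_size_for(size_t n)
{
    void *p = malloc(n);
    if (!p)
        return 0;
    size_t usable = malloc_usable_size(p);  /* documented as >= n */
    free(p);
    return usable;
}
```

How much larger than `n` the result is, if at all, depends entirely on the allocator and the size class; nothing in the quoted documentation fixes the excess.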

OK, I didn't state that precisely. There are two conflicting claims
for what the malloc_usable_size contract is. If it's allowed to return
some value larger than the size you requested, then the size returned
is not actually "usable" and there's basically nothing useful you can
do with the function.

> > It's sounding more and more like you did premature optimization
> > without measuring any of this, since there is *no way* the possible
> > amount of excess copying a realloc implementation might make
> > internally could cost more than an extra external function call to
> > malloc_usable_size (even if it did nothing but return).
> 
> As I said before:
> > We have a real scenario where `malloc_usable_size` is called
> > frequently: we need to optimize the realloc experience. We add an
> > extra parameter to realloc, minimalCopyBytes, which represents the
> > actual size of data that needs to be copied after falling back to
> > malloc-copy-free mode. We decide whether to call realloc or to
> > perform malloc-memcpy-free ourselves based on factors such as
> > the size of the data that realloc needs to copy (obtained through
> > `malloc_usable_size`), the size that we actually need to copy when
> > we do malloc-memcpy-free ourselves (minimalCopyBytes), and the
> > chance of merging chunks (small blocks) or mremap (large blocks)
> > in the underlying realloc. So this is a real scenario in which we
> > need to call `malloc_usable_size` frequently.
> 
> Example: We allocate a block of 500KB (malloc actually allocates
> 512KB) and want to extend it to 576KB via realloc. At this point
> realloc may fall back to the inefficient malloc(576KB),
> memcpy(512KB), free(512KB) sequence.

Clearly you did not measure this, because with basically any
real-world malloc, it will call mremap and move the memory via
MMU-level remapping, with no copying involved whatsoever.
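A minimal Linux-only sketch of the mremap mechanism referred to here, assuming an anonymous mapping and a hypothetical helper name `grow_mapping`. Whether a given malloc's realloc actually takes this path for a particular block depends on that implementation.

```c
#define _GNU_SOURCE      /* mremap and MREMAP_MAYMOVE are Linux/GNU extensions */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: grow an anonymous mapping from old_size to
 * new_size bytes. With MREMAP_MAYMOVE the kernel may relocate the
 * mapping, but it does so by moving page-table entries for the
 * existing pages, not by copying their contents.
 * Returns the (possibly new) address, or NULL on failure. */
static void *grow_mapping(void *addr, size_t old_size, size_t new_size)
{
    void *q = mremap(addr, old_size, new_size, MREMAP_MAYMOVE);
    return q == MAP_FAILED ? NULL : q;
}
```

The old contents are visible at the returned address after the call, and the grown tail is zero-filled anonymous memory.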

> But in practice we may only need to keep the first 4KB of the
> 500KB, so we evaluate the overall cost (including the possibility
> of realloc merging blocks as in musl 1.1, and techniques like
> mremap that avoid copying) to decide whether performing
> malloc(576KB), memcpy(**4KB**), free(512KB) ourselves is more
> cost-effective.
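The minimalCopyBytes strategy described in the quoted text can be sketched roughly as follows. This is not the poster's actual code: `grow_block` is a hypothetical name, and the 1/8 threshold is an arbitrary stand-in for their cost evaluation. `old_size` is whatever the caller believes the block holds, either a tracked request size or a `malloc_usable_size` result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow p (currently old_size bytes) to new_size
 * bytes when only the first min_copy bytes must be preserved.
 * Returns the new pointer, or NULL on failure (p is then still valid). */
static void *grow_block(void *p, size_t old_size, size_t new_size,
                        size_t min_copy)
{
    if (min_copy > old_size)
        min_copy = old_size;
    if (min_copy > new_size)
        min_copy = new_size;

    /* Arbitrary heuristic: if realloc might copy far more than is
     * needed, do malloc + small memcpy + free ourselves. */
    if (min_copy < old_size / 8) {
        void *q = malloc(new_size);
        if (!q)
            return NULL;
        memcpy(q, p, min_copy);
        free(p);
        return q;
    }

    /* Otherwise let realloc attempt in-place growth or mremap. */
    return realloc(p, new_size);
}
```

With the numbers from the example (500KB block, grow to 576KB, keep 4KB), the sketch takes the malloc-memcpy-free branch and copies only 4KB.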

You could have achieved exactly the same thing by keeping your own
knowledge that you allocated 500kB. But it would still be
significantly slower, because mmap+memcpy+munmap (2 syscalls) is
slower than mremap (1 syscall).
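Keeping your own knowledge of the allocated size can be as simple as storing the requested size next to the pointer. A minimal sketch; `struct buf` and `buf_resize` are hypothetical names, not part of musl or any other libc.

```c
#include <stdlib.h>

/* Hypothetical buffer that records the size it asked for, so no
 * call to malloc_usable_size is needed to know how much is valid. */
struct buf {
    unsigned char *data;
    size_t size;  /* bytes most recently requested from realloc */
};

/* Resize b to new_size bytes. Returns 0 on success, -1 on failure,
 * in which case b is left unchanged. */
static int buf_resize(struct buf *b, size_t new_size)
{
    if (new_size == 0)
        return -1;  /* avoid realloc(p, 0), whose behavior varies */
    unsigned char *p = realloc(b->data, new_size);
    if (!p)
        return -1;
    b->data = p;
    b->size = new_size;
    return 0;
}
```

Starting from `{ NULL, 0 }` works because `realloc(NULL, n)` behaves like `malloc(n)`.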

> Such optimizations have measurable and significant effects on our
> practical applications in each of the above libc environments.
> 
> In this scenario, we need to get the 512KB actually allocated by
> malloc through malloc_usable_size instead of the 500KB length we
> saved ourselves.

No you don't. Either number works just as well (or rather just as
poorly).

Rich
