Message-ID: <2022092003453382350548@gmail.com>
Date: Tue, 20 Sep 2022 03:45:35 +0800
From: baiyang <baiyang@...il.com>
To: "Rich Felker" <dalias@...c.org>
Cc: musl <musl@...ts.openwall.com>
Subject: Re: Re: The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1)

> The only correct value malloc_usable_size can return is the value you passed to the allocator. 

I don't think so, see:

Linux man page: https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html - "The value returned by malloc_usable_size() may be **greater than** the requested size of the allocation".

Mac OS X man page: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/malloc_size.3.html - "The memory block size is always at least as large as the allocation it backs, **and may be larger**."

FreeBSD man page: https://www.freebsd.org/cgi/man.cgi?query=malloc_usable_size&apropos=0&sektion=0&manpath=FreeBSD+7.1-RELEASE&format=html - "The return value **may be larger** than the size that was requested during allocation".

These official man pages clearly state that the return value of malloc_usable_size is the size of the memory block allocated internally, not the size submitted by the user. 

Conversely, we have not found any documentation saying that the only correct return value of malloc_usable_size is the size submitted by the user. Please correct me if you have such documentation.
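
To make the difference concrete, here is a minimal sketch that compares the requested size with what malloc_usable_size reports. It assumes a Linux libc (glibc or musl) where the function is declared in <malloc.h>; the helper names usable_excess and report are hypothetical, and the numbers printed depend entirely on the allocator in use.

```c
#include <malloc.h>  /* malloc_usable_size: nonstandard; declared here on glibc and musl */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: bytes the allocator reports as usable beyond the
 * requested size. Returns 0 if the allocation fails. */
size_t usable_excess(size_t requested)
{
    void *p = malloc(requested);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable >= requested ? usable - requested : 0;
}

/* Hypothetical helper: print the comparison for one request size. */
void report(size_t requested)
{
    printf("requested %zu bytes, allocator reports %zu extra usable bytes\n",
           requested, usable_excess(requested));
}
```

Calling report(500 * 1024) shows how much slack a given allocator reports for a 500KB request; an allocator that returns exactly the requested size would print 0.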

> It's sounding more and more like you did premature optimization
> without measuring any of this, since there is *no way* the possible
> amount of excess copying a realloc implementation might make
> internally could cost more than an extra external function call to
> malloc_usable_size (even if it did nothing but return).

As I said before:
> We have a real scenario where `malloc_usable_size` is called frequently: we need to optimize the realloc experience. We add an extra parameter to realloc - minimalCopyBytes: it represents the actual size of the data that needs to be copied after falling back to malloc-copy-free mode. We decide whether to call realloc or to do malloc-memcpy-free ourselves based on factors such as the size of the data that realloc would need to copy (obtained through `malloc_usable_size`), the size that we actually need to copy when doing malloc-memcpy-free ourselves (minimalCopyBytes), and the chance of merging chunks (small blocks) or mremap (large blocks) in the underlying realloc. So this is a real scenario in which we need to call `malloc_usable_size` frequently.

Example: We allocate a block of 500KB (malloc actually allocates 512KB) and want to extend it to 576KB via realloc. At this point realloc may fall back to the inefficient malloc(576KB), memcpy(512KB), free(512KB) sequence. But the real situation may be that we only need to keep the first 4KB of the 500KB, so we evaluate the overall cost (including the possibility that realloc merges blocks, as in musl 1.1, or uses techniques like mremap to avoid copying) to decide whether doing malloc(576KB), memcpy(**4KB**), free(512KB) ourselves is more cost-effective.

Such optimizations have measurable and significant effects on our practical applications in each of the above libc environments.

In this scenario, we need to get the 512KB actually allocated by malloc through malloc_usable_size instead of the 500KB length we saved ourselves.
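
The decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not our production code: the name resize_keep and the "less than half" threshold are hypothetical, the possibility of in-place merging or mremap is not modeled, and malloc_usable_size is assumed to come from <malloc.h> as on glibc and musl. The caller must ensure keep_bytes does not exceed the size originally requested for p.

```c
#include <malloc.h>  /* malloc_usable_size: nonstandard; declared here on glibc and musl */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resize p to new_size while preserving only the first
 * keep_bytes bytes (the minimalCopyBytes idea). On failure it returns NULL
 * and leaves p valid, like realloc. */
void *resize_keep(void *p, size_t new_size, size_t keep_bytes)
{
    if (p == NULL)
        return malloc(new_size);

    /* Worst case: realloc falls back to malloc-memcpy-free and copies this much. */
    size_t worst_copy = malloc_usable_size(p);

    if (keep_bytes > new_size)
        keep_bytes = new_size;

    /* Assumed heuristic: let realloc handle the request when the block already
     * fits, or when we would have to copy at least half of what realloc might
     * copy anyway. A real policy would also weigh merging and mremap. */
    if (new_size <= worst_copy || keep_bytes >= worst_copy / 2)
        return realloc(p, new_size);

    /* Otherwise do malloc-memcpy-free ourselves, copying only what we need. */
    void *q = malloc(new_size);
    if (q == NULL)
        return NULL;
    memcpy(q, p, keep_bytes);
    free(p);
    return q;
}
```

For the example above, resize_keep(p, 576 * 1024, 4096) on a 500KB block would take the manual path and copy 4KB rather than up to 512KB.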

Thanks :-)

--

   Best Regards
  BaiYang
  baiyang@...il.com
  http://i.baiy.cn
**** < END OF EMAIL > **** 
 
 
From: Rich Felker
Date: 2022-09-20 03:18
To: baiyang
CC: musl
Subject: Re: Re: [musl] The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1)
On Tue, Sep 20, 2022 at 02:44:58AM +0800, baiyang wrote:
> > Is there a reason you're relying on an unreliable and nonstandard
> > function (malloc_usable_size) to do this rather than your program
> > keeping track of its own knowledge of the allocated size? This is what
> > the C language expects you to do. For example if you have a structure
> > that contains a pointer to a dynamically sized buffer, normally you
> > store the size in a size_t member right next to that pointer, allowing
> > you to make these kind of decisions without having to probe anything.
> 
> Yes, as I have said, by comparing the number of bytes that
> realloc needs to copy in the worst case (the return value of
> malloc_usable_size), and the number of bytes we actually need to
> copy, we can optimize the performance of realloc in real scenarios
> and avoid unnecessary memory copies.
 
You can do exactly the same keeping track of the size yourself. The
only correct value malloc_usable_size can return is the value you
passed to the allocator. If it's returning a different size, your
malloc implementation has a problem that will make you commit UB when
you use the result of malloc_usable_size. Many real-world ones do have
this problem.
 
> In fact, in scenarios including glibc, tcmalloc, windows crt, mac os
> x, uclibc and musl 1.1, we did achieve good optimization results.
> 
> On the other hand, of course we keep the number of bytes actually
> allocated, but it doesn't really reflect objectively the number of
> bytes to be copied by realloc when the memcpy actually occurs. And
> malloc_usable_size() more accurately reflects how many bytes realloc
> needs to copy when it degenerates back to malloc-memcpy-free mode.
 
I don't understand your claim here. The size you would store is
exactly the size that realloc would have to copy. If it copies more,
that's just realloc being inefficient, but the difference is not going
to be material anyway.
 
> So our expectation is as mentioned in the man page for linux, mac os
> or windows: "The value returned by malloc_usable_size() may be
> **greater than** the requested size of the allocation" or "The
> memory block size is always at least as large as the allocation it
> backs, **and may be larger**." - We expect to get its internal size
> to evaluate the cost of memory copying.
 
It's sounding more and more like you did premature optimization
without measuring any of this, since there is *no way* the possible
amount of excess copying a realloc implementation might make
internally could cost more than an extra external function call to
malloc_usable_size (even if it did nothing but return).
 
Rich
