musl - Re: malloc implementation survey: omalloc

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180731142507.GF1392@brightrain.aerifal.cx>
Date: Tue, 31 Jul 2018 10:25:07 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: malloc implementation survey: omalloc

On Tue, Jul 31, 2018 at 09:49:30AM +0200, Markus Wichmann wrote:
> > It ran well on everything from a arm cortex-m0 to an intel core i7. The
> > code compiled to 504 .text bytes, on cortex-m0, iirc.
> > 
> > I wrote it originally for using on the kernel-side of an rtos, but it could
> > easily be extended to make a syscall when a process runs out of ram.
> > 
> > Obviously, a shortcoming is that the memory blocks must be PoT and there is
> > the potential for fragmentation. Otherwise though, the meta-information is
> > intrinsic within the pointer, which obliviates the need for external
> > storage.
> > 
> 
> That does sound interesting, but I don't really follow. The meta-info
> which we need to save is the length of the object returned. How can you
> encode that in the pointer without any external storage? Using internal
> storage? Yeah, that is where we are right now, and that means there's
> only one buffer overflow between us and oblivion at any given time.

This isn't the case with a minimally reasonable design, including
musl's current one. The chunk header and footer must match; if they
don't, it's proof of UB and the allocator terminates the program. The
current implementation is not hardened against replacing real headers
with apparently valid, matching ones, but this hardening can be
achieved by xor'ing them with a canary. One could even make an array
of canaries, indexed by some bits or a hash of the pointer, so that
stealing the canary for one chunk doesn't let you forge another one.

In many ways, internal storage of metadata is *more hardened* because
the allocator is able to catch overflows this way. With no internal
data for the allocator to consistency-check, attackers can just target
application data in adjacent chunks and don't have to worry that the
allocator will catch the attack and terminate.

> What I am getting at is this: We can't just benchmark a few algorithms
> and stick with the fastest one. Nor is a simple algorithmic analysis
> enough, since often enough, simple lists can be replaced with more
> complicated containers yielding better performance for some loads. It's
> complicated...

Because fast and not wasting memory or producing catastrophic
fragmentation are usually mutually exclusive. Eliminating runaway
waste and fragmentation without a major performance regression is the
primary goal. Being faster is a secondary goal.

I know the "main source" of bad behavior in the current design can be
eliminated by getting rid of the fine-grained locks and switching to a
global lock. It can probably also be solved by holding multiple locks
at once and a complex lock order protocol, but it's not clear that
wouldn't be slower than just using a global lock. Major new direction
in design has a better chance of finding something where locality of
access needed for allocator operations is not a significant tradeoff
with fragmentation or over-allocation.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.