Message-ID: <20230125055323.GK4163@brightrain.aerifal.cx>
Date: Wed, 25 Jan 2023 00:53:23 -0500
From: Rich Felker <dalias@...c.org>
To: Dominique MARTINET <dominique.martinet@...ark-techno.com>
Cc: musl@...ts.openwall.com
Subject: Re: infinite loop in mallocng's try_avail

On Wed, Jan 25, 2023 at 09:33:52AM +0900, Dominique MARTINET wrote:
> > If this code is being reached, either the allocator state has been
> > corrupted by some UB in the application, or there's a logic bug in
> > mallocng. The sequence of events that seem to have to happen to get
> > there are:
> > 
> > 1. Previously active group has no more available slots (line 120).
> 
> Right, that one has already likely been dequeued (or at least
> traversed), so I do not see how to look at it but that sounds possible.
> 
> > 2. Freed mask of newly activating group (line 131 or 138) is either
> >    zero (line 145) or the active_idx (read from in-band memory
> >    susceptible to application buffer overflows etc) is wrong and
> >    produces zero when its bits are anded with the freed mask (line
> >    145).
> 
> m->freed_mask looks like it is zero from values below; I cannot tell if
> that comes from a corruption outside of musl or not.
> 
> > > (gdb) p __malloc_context            
> > > $94 = {
> > >   secret = 15756413639004407235,
> > >   init_done = 1,
> > >   mmap_counter = 135,
> > >   free_meta_head = 0x0,
> > >   avail_meta = 0x18a3f70,
> > >   avail_meta_count = 6,
> > >   avail_meta_area_count = 0,
> > >   meta_alloc_shift = 0,
> > >   meta_area_head = 0x18a3000,
> > >   meta_area_tail = 0x18a3000,
> > >   avail_meta_areas = 0x18a4000 <error: Cannot access memory at address 0x18a4000>,
> > >   active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0, 0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0, 0x0, 0x0, 0x18a3d90, 0x0, 
> > >     0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28, 0x0, 0x0, 0x0, 0x18a3238, 0x0 <repeats 18 times>},
> > >   usage_by_class = {2580, 600, 10, 7, 0 <repeats 11 times>, 96, 0, 0, 0, 20, 0, 3, 0, 8, 0, 3, 0, 0, 0, 3, 0 <repeats 18 times>},
> > >   unmap_seq = '\000' <repeats 31 times>,
> > >   bounces = '\000' <repeats 18 times>, "w", '\000' <repeats 12 times>,
> > >   seq = 1 '\001',
> > >   brk = 25837568
> > > }
> > > (gdb) p *__malloc_context->active[0]
> > > $95 = {
> > >   prev = 0x18a3f40,
> > >   next = 0x18a3e80,
> > >   mem = 0xb6f57b30,
> > >   avail_mask = 1073741822,
> > >   freed_mask = 0,
> > >   last_idx = 29,
> > >   freeable = 1,
> > >   sizeclass = 0,
> > >   maplen = 0
> > > }
> > > (gdb) p *__malloc_context->active[0]->mem
> > > $97 = {
> > >   meta = 0x18a3e98,
> > >   active_idx = 29 '\035',
> > >   pad = "\000\000\000\000\000\000\000\000\377\000",
> > >   storage = 0xb6f57b40 ""
> > > }
> > 
> > This is really weird, because at the point of the infinite loop, the
> > new group should not yet be activated (line 163), so
> > __malloc_context->active[0] should still point to the old active
> > group. But its avail_mask has all bits set and active_idx is not
> > corrupted, so try_avail should just have obtained an available slot
> > from it without ever entering the block at line 120. So I'm confused
> > how it got to the loop.
> 
> try_avail's pm is `__malloc_context->active[0]`, which is overwritten by
> either dequeue(pm, m) or *pm = m (lines 123,128), so the original
> m->avail_mask could have been zero, with the next element having a zero
> freed mask?

No, avail_mask is only supposed to be able to be nonzero after
activate_group, which is only called on the head of an active list
(free.c:86 or malloc.c:163) and which atomically pulls bits off
freed_mask to move them to avail_mask. If we're observing avail_mask
nonzero at the point you saw it, some invariant seems to have been
violated.
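
To make the invariant concrete, here is a minimal sketch of the
freed_mask to avail_mask handoff. It is a simplified model, not the
mallocng source: struct meta_model and activate_group_model are
hypothetical names, active_idx is stored in the struct directly rather
than read through m->mem, and C11 atomics stand in for musl's internal
atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for mallocng's struct meta; only
 * the fields needed to illustrate the invariant are modelled. */
struct meta_model {
    _Atomic uint32_t freed_mask; /* slots freed but not yet reusable */
    uint32_t avail_mask;         /* slots ready to be handed out */
    unsigned active_idx;         /* highest slot index that may be activated */
};

/* Move every freed bit at or below active_idx into avail_mask.
 * Precondition (the invariant under discussion): avail_mask is zero. */
static uint32_t activate_group_model(struct meta_model *m)
{
    assert(m->avail_mask == 0);
    uint32_t act = (2u << m->active_idx) - 1; /* bits 0..active_idx */
    uint32_t mask = atomic_load(&m->freed_mask);
    /* On failure the CAS reloads mask, and the desired value is
     * recomputed from it on the next iteration. */
    while (!atomic_compare_exchange_weak(&m->freed_mask, &mask, mask & ~act))
        ;
    return m->avail_mask = mask & act;
}
```

In this model avail_mask can only become nonzero as the return value of
an activation, and an activation with freed_mask == 0 yields 0. For
reference, the avail_mask value 1073741822 in the dump is 0x3FFFFFFE,
i.e. bits 1 through 29.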

> > One odd thing I noticed is that the backtrace pm=0xb6f692e8 does not
> > match the __malloc_context->active[0] address. Were these from
> > different runs?
> 
> These were from the same run, I've only observed this single occurrence
> first-hand.
> 
> pm is &__malloc_context->active[0], so it's not 0x18a3e98 (first value
> of active) but its address (e.g. __malloc_context+48 as per gdb symbol
> resolution in the backtrace)
> I didn't print __malloc_context but I don't see why gdb would have
> gotten that wrong.

Ah, I forgot I was looking at an additional level of indirection here.
It would be nice to know if m is the same active[0] as at entry; that
would help figure out where things went wrong...
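
The extra level of indirection can be sketched as follows. This is only
an illustrative model with hypothetical names (group_meta, ctx_model,
advance_head, ACTIVE_SLOTS); it is not the try_avail code, and it only
mimics the "*pm = m" style of head replacement mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define ACTIVE_SLOTS 4 /* arbitrary size for the illustration */

/* Hypothetical stand-ins for the group metadata and the global context. */
struct group_meta { struct group_meta *prev, *next; };
struct ctx_model { struct group_meta *active[ACTIVE_SLOTS]; };

/* pm is the address of a slot in active[]; *pm is the group pointer
 * stored there. Returns the head as seen at entry and advances the
 * slot to the next group in the circular list, if there is one. */
static struct group_meta *advance_head(struct group_meta **pm)
{
    struct group_meta *m = *pm;   /* head at entry */
    if (m && m->next != m)
        *pm = m->next;            /* slot now names a different group */
    return m;
}
```

After such a call, printing active[0] in a debugger shows the new head,
while pm itself is unchanged. Comparing the saved entry value of m
against *pm is what would tell the two cases apart.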

Rich
