Message-ID: <20230125055323.GK4163@brightrain.aerifal.cx>
Date: Wed, 25 Jan 2023 00:53:23 -0500
From: Rich Felker <dalias@...c.org>
To: Dominique MARTINET <dominique.martinet@...ark-techno.com>
Cc: musl@...ts.openwall.com
Subject: Re: infinite loop in mallocng's try_avail

On Wed, Jan 25, 2023 at 09:33:52AM +0900, Dominique MARTINET wrote:
> > If this code is being reached, either the allocator state has been
> > corrupted by some UB in the application, or there's a logic bug in
> > mallocng. The sequence of events that seem to have to happen to get
> > there are:
> >
> > 1. Previously active group has no more available slots (line 120).
>
> Right, that one has already likely been dequeued (or at least
> traversed), so I do not see how to look at it but that sounds possible.
>
> > 2. Freed mask of newly activating group (line 131 or 138) is either
> >    zero (line 145) or the active_idx (read from in-band memory
> >    susceptible to application buffer overflows etc) is wrong and
> >    produces zero when its bits are anded with the freed mask (line
> >    145).
>
> m->freed_mask looks like it is zero from values below; I cannot tell if
> that comes from a corruption outside of musl or not.
>
> > > (gdb) p __malloc_context
> > > $94 = {
> > >   secret = 15756413639004407235,
> > >   init_done = 1,
> > >   mmap_counter = 135,
> > >   free_meta_head = 0x0,
> > >   avail_meta = 0x18a3f70,
> > >   avail_meta_count = 6,
> > >   avail_meta_area_count = 0,
> > >   meta_alloc_shift = 0,
> > >   meta_area_head = 0x18a3000,
> > >   meta_area_tail = 0x18a3000,
> > >   avail_meta_areas = 0x18a4000 <error: Cannot access memory at address 0x18a4000>,
> > >   active = {0x18a3e98, 0x18a3eb0, 0x18a3208, 0x18a3280, 0x0, 0x0, 0x0, 0x18a31c0, 0x0, 0x0, 0x0, 0x18a3148, 0x0, 0x0, 0x0, 0x18a3dd8, 0x0, 0x0, 0x0, 0x18a3d90, 0x0,
> > >     0x18a31f0, 0x0, 0x18a3b68, 0x0, 0x18a3f28, 0x0, 0x0, 0x0, 0x18a3238, 0x0 <repeats 18 times>},
> > >   usage_by_class = {2580, 600, 10, 7, 0 <repeats 11 times>, 96, 0, 0, 0, 20, 0, 3, 0, 8, 0, 3, 0, 0, 0, 3, 0 <repeats 18 times>},
> > >   unmap_seq = '\000' <repeats 31 times>,
> > >   bounces = '\000' <repeats 18 times>, "w", '\000' <repeats 12 times>,
> > >   seq = 1 '\001',
> > >   brk = 25837568
> > > }
> > > (gdb) p *__malloc_context->active[0]
> > > $95 = {
> > >   prev = 0x18a3f40,
> > >   next = 0x18a3e80,
> > >   mem = 0xb6f57b30,
> > >   avail_mask = 1073741822,
> > >   freed_mask = 0,
> > >   last_idx = 29,
> > >   freeable = 1,
> > >   sizeclass = 0,
> > >   maplen = 0
> > > }
> > > (gdb) p *__malloc_context->active[0]->mem
> > > $97 = {
> > >   meta = 0x18a3e98,
> > >   active_idx = 29 '\035',
> > >   pad = "\000\000\000\000\000\000\000\000\377\000",
> > >   storage = 0xb6f57b40 ""
> > > }
> >
> > This is really weird, because at the point of the infinite loop, the
> > new group should not yet be activated (line 163), so
> > __malloc_context->active[0] should still point to the old active
> > group. But its avail_mask has all bits set and active_idx is not
> > corrupted, so try_avail should just have obtained an available slot
> > from it without ever entering the block at line 120. So I'm confused
> > how it got to the loop.
>
> try_avail's pm is `__malloc_context->active[0]`, which is overwritten by
> either dequeue(pm, m) or *pm = m (lines 123,128), so the original
> m->avail_mask could have been zero, with the next element having a zero
> freed mask?

No, avail_mask is only supposed to be able to be nonzero after
activate_group, which is only called on the head of an active list
(free.c:86 or malloc.c:163) and which atomically pulls bits off
freed_mask to move them to avail_mask. If we're observing avail_mask
nonzero at the point you saw it, some invariant seems to have been
violated.

> > One odd thing I noticed is that the backtrace pm=0xb6f692e8 does not
> > match the __malloc_context->active[0] address. Were these from
> > different runs?
>
> These were from the same run, I've only observed this single occurrence
> first-hand.
>
> pm is &__malloc_context->active[0], so it's not 0x18a3e98 (first value
> of active) but its address (e.g. __malloc_context+48 as per gdb symbol
> resolution in the backtrace)
> I didn't print __malloc_context but I don't see why gdb would have
> gotten that wrong.

Ah, I forgot I was looking at an additional level of indirection
here. It would be nice to know if m is the same active[0] as at entry;
that would help figure out where things went wrong...

Rich
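
For readers following the thread, the invariant described above (avail_mask
only becomes nonzero through activate_group, which atomically moves bits
from freed_mask into avail_mask) can be modelled roughly as below. This is
a simplified sketch using C11 atomics, not the musl source: the names
meta_sketch and activate_group_sketch are hypothetical, active_idx is
stored directly in the struct here rather than read from in-band group
memory, and locking is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a group's out-of-band metadata. */
struct meta_sketch {
    uint32_t avail_mask;         /* slots ready to hand out */
    _Atomic uint32_t freed_mask; /* slots freed since the last activation */
    unsigned active_idx;         /* highest slot index in use, 0..31 */
};

/* Move the freed bits covered by active_idx into avail_mask.
 * Precondition modelled on the thread: avail_mask must be zero on entry. */
static uint32_t activate_group_sketch(struct meta_sketch *m)
{
    assert(m->avail_mask == 0);

    /* Bits 0..active_idx; unsigned wraparound gives all ones for 31. */
    uint32_t act = (2u << m->active_idx) - 1;

    /* Atomically clear the covered bits from freed_mask, keeping the
     * value that was observed when the exchange succeeded. */
    uint32_t mask = atomic_load(&m->freed_mask);
    while (!atomic_compare_exchange_weak(&m->freed_mask, &mask, mask & ~act))
        ; /* on failure, mask is reloaded with the current value */

    m->avail_mask = mask & act;
    return m->avail_mask;
}
```

In this model a group whose freed_mask is zero, or whose active_idx masks
off every freed bit, yields an empty result. That corresponds to the
"produces zero" condition at line 145 mentioned earlier in the thread.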