Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240917121241.GQ32249@brightrain.aerifal.cx>
Date: Tue, 17 Sep 2024 08:12:41 -0400
From: Rich Felker <dalias@...c.org>
To: Lukas Zeller <luz@...n44.ch>
Cc: musl@...ts.openwall.com, alice <alice@...ya.dev>
Subject: Re: SIGSEGV/stack overflow in pthread_create - race condition?

On Tue, Sep 17, 2024 at 12:37:01PM +0200, Lukas Zeller wrote:
> Hello Rich,
> 
> bottom line: problem solved, you were right :-)
> 
> It was a use-after-free problem in libpagekite.
> 
> I created PRs on libpagekite [1] and openwrt packages [2] to fix it.
> 
> I was mostly mislead into "stack overflow" thinking by the gdb backtrace
> artifacts looking like __clone() recursion and using that red herring
> to search for other similar problems, some of which actually *were*
> stack overflow related.
> 
> Thanks for the help and sorry for wasting some of your time.

No problem!

> Details for the record:
> 
> > On 13 Sep 2024, at 21:54, Rich Felker <dalias@...c.org> wrote:
> > 
> > [...]
> > Can you dump the disassembly (disasm command) at the point of crash?
> > That will show what's attempting to be accessed and what "type of
> > segfault" it is.
> 
> Did that, eventually revealed the faulty access, which looks like this in C
> (pkproto.c:765):
> 
>   /* Cleanup */
>   free(copy);
> 
>   /* Pass judgement */
>   if ('\0' == *public_domain) return pk_err_null(ERR_PARSE_NO_KITENAME);
> 
> Unfortunately, public_domain was a pointer into the `copy` buffer freed
> on the line before. A very very short race window but apparently
> together with the fact that 16 of these threads were fired up in rapid
> succession it could obviously happen that the page got mapped away in
> between.

I don't think you were racing with the freeing (which would happen
deterministically in the same thread) but rather had a
nondeterministic order of allocations, where sometimes the freed
object would be the last one in its group (thereby resulting in free
unmapping it) and other times would have other live objects still in
the same allocation group (which would prevent unmapping it).

> Regarding the gdb bt artifact - before last night I had never thougt
> about how difficult stack backtracing gets with the ARM link register
> compared to traditional return-address-on-stack. And learned about
> ..cfi. Would it be possible to instrument clone.s with .cfi such
> that gdb would realize that __clone is the origin of a new thread
> on a new stack and there is nothing to backtrack beyond?
> Probably a puzzle for another time...

Yes, getting minimal annotation needed for the debugger to handle this
right is an open todo item. Ideally it would be contingent on whether
you enabled debugging so that it wouldn't end up in no-debug-info
builds. Slimming down the asm source files to the minimum that
actually have to be asm source files (like clone, I think), which is
also a wishlist item, would make it less of an invasive change, which
is probably why I've held off on digging into it.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.