musl - Re: aio_cancel segmentation fault for in progress write requests


Message-ID: <20181207202114.GI23599@brightrain.aerifal.cx>
Date: Fri, 7 Dec 2018 15:21:14 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: aio_cancel segmentation fault for in progress write
 requests

On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> Okay, it's a race of some kind:
> 
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so
> musl libc (powerpc64)
> Version 1.1.20-git-156-gb1c58cb9
> Dynamic Program Loader
> Usage: lib/libc.so [options] [--] pathname [args]
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> 
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault  lib/libc.so ~/aioWrite
> 
> 
> So, my best theory is that running inside a debugger (gdb, valgrind)
> makes it slow enough that it no longer races.

OK, here's a theory. Based on my reply just now to Florian, the signal
context would have to get really big to make the expected code path
overflow -- io_thread_func() has a very small stack frame and so does
cleanup(). However, early in io_thread_func, it calls
__aio_get_queue(), which calls calloc() if the tables at each level
don't already exist, which is certainly the case for the first call.
During this call, the margin will be somewhat smaller, and maybe it's
enough to make kernels that break the MINSIGSTKSZ contract cause an
overflow.

The right action here is probably calling __aio_get_queue with the fd
number *before* calling pthread_create, so that it's guaranteed that
__aio_get_queue takes the fast path in the io thread and doesn't call
calloc. This is especially important in light of the newish allowance
that malloc be interposed, where we would be running
application-provided malloc code in a thread with a tiny stack.
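
A minimal sketch of that ordering, assuming a simplified one-level
table; this is not the actual musl code. The names get_queue, worker,
submit, MAX_FD and the allocs counter are hypothetical and exist only
for illustration. The point is that the caller populates the table
before pthread_create, so the lookup in the new thread finds
everything in place and never reaches calloc:

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_FD 64  /* hypothetical bound, for this sketch only */

struct queue { int fd; };

static struct queue **table;  /* allocated lazily on first use */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int allocs;            /* number of calloc() calls made so far */

/* Look up the queue for fd; allocate missing levels only if need != 0. */
static struct queue *get_queue(int fd, int need)
{
    struct queue *q = NULL;
    if (fd < 0 || fd >= MAX_FD) return NULL;
    pthread_mutex_lock(&lock);
    if (!table && need) {
        table = calloc(MAX_FD, sizeof *table);       /* slow path */
        allocs++;
    }
    if (table) {
        if (!table[fd] && need) {
            table[fd] = calloc(1, sizeof *table[fd]); /* slow path */
            allocs++;
            if (table[fd]) table[fd]->fd = fd;
        }
        q = table[fd];
    }
    pthread_mutex_unlock(&lock);
    return q;
}

struct args { int fd; int allocs_in_thread; };

/* Stand-in for the io thread: records how many allocations it triggered. */
static void *worker(void *p)
{
    struct args *a = p;
    int before = allocs;
    struct queue *q = get_queue(a->fd, 1);
    a->allocs_in_thread = q ? allocs - before : -1;
    return NULL;
}

/* Returns the number of allocations done on the worker thread, or -1
 * on failure. With prepopulate set, the caller fills in the table
 * before creating the thread, so the worker takes the fast path. */
static int submit(int fd, int prepopulate)
{
    struct args a = { fd, -1 };
    pthread_t td;
    if (prepopulate && !get_queue(fd, 1)) return -1;
    if (pthread_create(&td, NULL, worker, &a)) return -1;
    pthread_join(td, NULL);
    return a.allocs_in_thread;
}
```

In this sketch, submit(fd, 0) on a fresh table makes the worker
perform the allocations itself, while submit(fd, 1) moves them onto
the caller's stack and leaves the worker with none.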

I'm still not sure this is the source of the reported crash, but I
think it needs to be changed either way.

Rich
