Message-ID: <20181207202114.GI23599@brightrain.aerifal.cx>
Date: Fri, 7 Dec 2018 15:21:14 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: aio_cancel segmentation fault for in progress write requests

On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> Okay, it's a race of some kind:
>
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so
> musl libc (powerpc64)
> Version 1.1.20-git-156-gb1c58cb9
> Dynamic Program Loader
> Usage: lib/libc.so [options] [--] pathname [args]
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
>
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> aio_write/1-1.c cancelationStatus : 2
> Test PASSED
> awilcox on gwyn [pts/7 Fri 7 13:12] musl: lib/libc.so ~/aioWrite
> zsh: segmentation fault lib/libc.so ~/aioWrite
>
>
> So, my best theory is that running inside a debugger (gdb, valgrind)
> makes it slow enough that it no longer races.

OK, here's a theory. Based on my reply just now to Florian, the signal
context would have to get really big to make the expected code path
overflow -- io_thread_func() has a very small stack frame and so does
cleanup(). However, early in io_thread_func, it calls
__aio_get_queue(), which calls calloc() if the tables at each level
don't already exist, which is certainly the case for the first call.
During this call, the margin will be somewhat smaller, and maybe it's
enough to make kernels that break the MINSIGSTKSZ contract cause an
overflow.

The right action here is probably calling __aio_get_queue with the fd
number *before* calling pthread_create, so that it's guaranteed that
__aio_get_queue takes the fast path in the io thread and doesn't call
calloc. This is especially important in light of the newish allowance
that malloc be interposed, where we would be running
application-provided malloc code in a thread with tiny stack.

I'm still not sure this is the source of the reported crash but I
think it needs to be changed either way.

Rich