Message-ID: <20180530005009.GM1392@brightrain.aerifal.cx>
Date: Tue, 29 May 2018 20:50:09 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: pthread cancel cleanup and pthread_mutex_lock

On Wed, May 30, 2018 at 10:06:17AM +1000, Patrick Oppenlander wrote:
> I accidentally hit send before I finished typing..
>
> > I've recently been running some of the open posix testsuite tests from
> > the linux test project.
> >
> > One particular test has been giving me headaches:
> > https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_mutex_init/1-2.c
> >
> > There are a couple of different tests in there but the most
> > interesting one is the deadlock test which does the following:
> >
> > Thread A:                   Thread B:
> > pthread_create
> >                             pthread_cleanup_push(...)
> >                             pthread_mutex_lock(M)
> >                             pthread_setcanceltype(ASYNC)
> >                             pthread_setcancelstate(ENABLE)
>                               pthread_mutex_lock(M) <-- blocks here
> pthread_cancel(B)
> pthread_join(B)
>
> The test then expects the cleanup handler to run and unlock mutex M
> allowing thread B to run to completion and the join to succeed.

This test is invalid. pthread_mutex_lock is not async-cancel-safe and
cannot legally be called while cancel type is async.

FYI something like 50% of the "Open POSIX Test Suite" tests are
invalid; in the majority of cases they're testing some property after
undefined behavior has been invoked like here.

> I've run this test with musl, glibc and on some different platforms
> with varying results:
>
> x86_64 linux 4.16.11, glibc: test runs to completion
> x86_64 linux 4.16.11, musl: deadlock (cleanup handler doesn't run)
> arm linux 4.16.5, musl: test runs to completion

The test is invalid in other ways too, involving races. It attempts to
use sched_yield to ensure that the test thread enters
pthread_mutex_lock a second time, but there's no reason to expect that
to do anything, especially if there are sufficiently many cores (as
many or more than running threads).

I suspect the different behaviors come down to just different
scheduling properties due to performance differences, or something
like that. Naively, I would expect the test to "work" despite being
invalid.

> I'm not even sure that this test is valid -- I can't find any
> documentation which says that pthread_mutex_lock is a cancellation
> point, or that you're allowed to call pthread_mutex_unlock from an
> async cancel handler.

You can call anything you want from an async cancel handler, but you
can't call any libc functions except the ones controlling cancel state
while cancel type is async. Basically, all you can do in async cancel
state is pure computation.

> However, it's still concerning to see different results on different platforms.
>
> What's the expected behaviour here?

Nothing meaningful.

Rich
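
For illustration only, here is a minimal sketch of how the scenario
quoted above might be made well-defined under the rules described in
the reply. Thread B takes the mutex while its cancel type is still
deferred, and switches to asynchronous cancellation only around pure
computation, so the cleanup handler can unlock M from a consistent
state. The names unlock_m, thread_b and run_cancel_demo are
hypothetical, and replacing the second pthread_mutex_lock with a busy
loop is an assumption made for the example, not something the test or
the reply proposes.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t M = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler (hypothetical name): runs in the cancelled thread,
 * which still owns M, and releases it. */
static void unlock_m(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/* Thread B (hypothetical name). */
static void *thread_b(void *unused)
{
    volatile unsigned long spins = 0;
    int old_type;

    (void)unused;
    pthread_cleanup_push(unlock_m, &M);

    /* The libc call is made while the cancel type is still deferred.
     * pthread_mutex_lock is not a cancellation point, so a cancel
     * request that arrives early simply stays pending. */
    pthread_mutex_lock(&M);

    /* From here on, only cancel-control functions may be called. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);

    for (;;)
        spins++;            /* pure computation; cancel may land anywhere */

    pthread_cleanup_pop(0); /* never reached; pairs with the push macro */
    return NULL;
}

/* Thread A side (hypothetical name). Returns 0 if B was cancelled and
 * the cleanup handler left M unlocked, -1 otherwise. */
int run_cancel_demo(void)
{
    pthread_t b;
    void *result;

    if (pthread_create(&b, NULL, thread_b, NULL) != 0)
        return -1;
    if (pthread_cancel(b) != 0)
        return -1;
    if (pthread_join(b, &result) != 0)
        return -1;
    if (result != PTHREAD_CANCELED)
        return -1;

    /* The handler released M, so trylock should succeed at once. */
    if (pthread_mutex_trylock(&M) != 0)
        return -1;
    pthread_mutex_unlock(&M);
    return 0;
}
```

Calling run_cancel_demo() from a main function and building with
-pthread should yield 0 on a conforming implementation. Unlike the
original test, the outcome does not depend on sched_yield or on how
the threads happen to be scheduled.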