|
Message-ID: <20140806093529.GT1674@brightrain.aerifal.cx> Date: Wed, 6 Aug 2014 05:35:29 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: PATCH: don't call cleanup handlers after a regular return from the thread start function On Wed, Aug 06, 2014 at 09:15:26AM +0200, Jens Gustedt wrote: > > > > - In the case of longjmp, it's already dangerous to longjmp across > > > > arbitrary code. Unless you know what code you're jumping over is > > > > doing or have a contract that it's valid to jump over it, you can't > > > > use longjmp. > > > > > > sure, but just consider the nice tool-library that implements > > > try-catch sort of things with setjmp/longjmp. > > > > It's not possible to mix this with cancellation (or with C++ > > exceptions, for that matter). But as long as neither "unwinds" across > > the other, you're fine. This is a restriction you need to follow and > > be aware of anyway. It's not something we can "work around" except by > > using the same underlying implementation for all three, and that's > > undesirable for various reasons (one of which is that people should > > not be relying on it). > > > > Even if you did want to support this, making it work is a lot harder > > than allowing longjmp out of pthread_once/call_once. The latter are > > special cases where there's at most one cancellation cleanup layer and > > it's implementation-internal. The general case has arbitrary > > cancellation cleanup buffers at different call frame contexts, and > > they're most certainly not static. > > Perhaps I wouldn't like to support this, but make the library more > robust against such issues as said above. Give diagnostics ("cleanup > handler stack not empty after thread termination") and ease debugging. > > If, as we seem both to intend, all this ends up to be clearly UB, > nothing prevents us from being nice if this happens. What do you think > of my initial patch (which is not very intrusive), but instead of > clearing `cancelbuf` to call abort? Your initial patch only covered the case where the illegal return is from the start function itself, which I think is probably the least likely to happen in practice. It doesn't cover the case where it happens from other functions. If this is deemed useful to catch and translate into a predictable crash, I would prefer a check for the cancellation buffer pointer against the stack pointer, crashing (via a_crash() like other code that does light checks for UB such as the code in malloc) if the cancellation buffer is below the stack pointer. However some consideration is needed (like: do we want to preclude cancellation from crossing signal handler boundaries when it's done "safely", e.g. by pushing a cleanup handler before unblocking signals, when the signal runs on an alternate signal stack?) so I'm a bit hesitant to add such checks that might not always be valid. Of course split-stack would also invalidate this strategy, but I think split-stack is already hopelessly broken anyway (there's a good past mailing list thread on the topic of split-stack and why it's a solution in search of a problem :). BTW one "safety" we currently have is that pthread_cleanup_pop does not just "pop one cleanup context"; it resets the context to the one that was in effect at the time the corresponding push was called, potentially popping multiple contexts if one or more pop was skipped via illegal returns or similar. In principle a check-and-crash could be added asserting that self->cancelbuf == cb before doing this, but I'm mildly hesitant to add conditionals to something that should be a fast-path. > > > (There is even less interaction between calls to call_once with > > > different objects, than we have for the current implementation of > > > pthread_once. That one uses a `static int waiters` that is shared > > > between all calls.) > > > > Yes, that's an unfortunate issue which I should fix. But making the > > once_flag a big structure rather than a single int (which will surely > > disagree with the glibc ABI) is not the solution. The solution is just > > to have a potential-waiters flag in the single int, rather than a > > waiter count. The incidence of contention is so low that it doesn't > > matter if it results in a rare excess syscall, especially since once > > calls are once-per-program-invocation. > > Sure, as I said changing the ABI for pthread_once is out of the > question. > > Yes, I agree that this is a minor "problem", if at all. > > You could even have a counter in there, the state only needs 2 bit, > the rest could be used as a wait counter. > > One idea would be to have that int as a counter, and to have special > values of the counter reflect the state. 0 would be unitialized, -1 > completely initialized, all other would be the number of waiters +1. Yes, that's probably a slightly better design. As long as our target is just a basically-linux-compatible system, we have at least 2 bits free anyway (tids are limited below 1<<30 since the upper bits are used for special purposes in the robust futex API) but of course a design that doesn't encode this knowledge is more general and cleaner. > > > > call_once does not have specified behavior if the callback is > > > > cancelled (because, as far as C11 is concerned, cancellation does not > > > > even exist). Thus there is no need for a cleanup function at all. > > > > > > After digging into it I don't agree. In a mixed pthread/C thread > > > environment cancellation will eventually happen, so we need a cleanup > > > function to avoid deadlocks. > > > > No it won't. Cancellation is unusable with almost all existing code > > (and almost all new code). To use cancellation, the target thread must > > be running code which was written to be aware of the possibility of > > being cancelled. Thus, it's not going to "just happen". > > With "will happen" I mean that code will just do it, perhaps by some > tool implemented with pthreads shooting at a C thread. Yes. But my point is that crashing in libc code is probably the least of your worries here. It's more likely that the crash (or at least deadlock) occurs in application code from cancelling a thread while it's in an inconsistent state, holds locks, etc. without a chance to do cleanup (since the target thread was not designed for cancellation). One "safe" approach might be for call_once to simply block cancellation for the duration of the call. In fact C11 threads could even start with cancellation blocked, unless of course POSIX mandates otherwise in the next issue. This should really all be a topic for discussion with the Austin Group for how POSIX should be aligned with C11 threads, though, and I'm hesitant to make any public behaviors for musl that look like API guarantees (rather than "this is the way it works for now, but don't depend on it, since it's going to change when the next issue of POSIX is published") at this time. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.