Message-ID: <4408deeb62fe668bf720d3c6c8bedda2@ispras.ru>
Date: Sat, 11 Feb 2023 23:14:33 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] mq_notify: fix close/recv race on failure path

On 2023-02-11 22:49, Rich Felker wrote:
> On Sat, Feb 11, 2023 at 10:28:20PM +0300, Alexey Izbyshev wrote:
>> On 2023-02-11 21:35, Rich Felker wrote:
>> >On Sat, Feb 11, 2023 at 09:08:53PM +0300, Alexey Izbyshev wrote:
>> >>On 2023-02-11 20:59, Rich Felker wrote:
>> >>>On Sat, Feb 11, 2023 at 08:50:15PM +0300, Alexey Izbyshev wrote:
>> >>>>On 2023-02-11 20:13, Markus Wichmann wrote:
>> >>>>>On Sat, Feb 11, 2023 at 10:06:03AM -0500, Rich Felker wrote:
>> >>>>>>--- a/src/thread/pthread_detach.c
>> >>>>>>+++ b/src/thread/pthread_detach.c
>> >>>>>>@@ -5,8 +5,12 @@ static int __pthread_detach(pthread_t t)
>> >>>>>> {
>> >>>>>> 	/* If the cas fails, detach state is either already-detached
>> >>>>>> 	 * or exiting/exited, and pthread_join will trap or cleanup. */
>> >>>>>>-	if (a_cas(&t->detach_state, DT_JOINABLE, DT_DETACHED) != DT_JOINABLE)
>> >>>>>>+	if (a_cas(&t->detach_state, DT_JOINABLE, DT_DETACHED) != DT_JOINABLE) {
>> >>>>>>+		int cs;
>> >>>>>>+		__pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
>> >>>>>> 		return __pthread_join(t, 0);
>> >>>>>                ^^^^^^ I think you forgot to rework this.
>> >>>>>>+		__pthread_setcancelstate(cs, 0);
>> >>>>>>+	}
>> >>>>>> 	return 0;
>> >>>>>> }
>> >>>>>>
>> >>>>>
>> >>>>>I see no other obvious missteps, though.
>> >>>>>
>> >>>>Same here. Apart from this and the misspelled "pthred_detach" in
>> >>>>the commit message, the patches look good to me.
>> >>>>
>> >>>>Regarding the POSIX requirement to run sigev_notify_function in the
>> >>>>context of a detached thread, while it's possible to observe the
>> >>>>wrong detachstate for a short while via pthread_getattr_np after
>> >>>>these patches, I'm not sure there is a standard way to do that. Even
>> >>>>if it exists, this minor issue may not be worth caring about.
>> >>>
>> >>>Would this just be if the notification callback executes before
>> >>>mq_notify returns in the parent?
>> >>
>> >>Yes, it seems so.
>> >>
>> >>>I suppose we could have the newly
>> >>>created thread do the work of making the syscall, handling the error
>> >>>case, detaching itself on success and reporting back to the
>> >>>mq_notify function whether it succeeded or failed via the
>> >>>semaphore/args structure. Thoughts on that?
>> >>>
>> >>Could we just move the pthread_detach call into the worker thread,
>> >>to the point after pthread_cleanup_pop?
>> >
>> >I thought that sounded dubious, in that it might lead to an attempt to
>> >join a detached thread, but maybe it's safe to assume recv will never
>> >return if the mq_notify syscall failed...?
>> >
>> Actually, because app signals are not blocked when the worker thread
>> is created, recv can indeed return early with EINTR. But this looks
>> like just a bug.
> 
> Yes. While it's not a conformance bug to run with signals unblocked
> ("The signal mask of this thread is implementation-defined.") it's a
> functional bug to ever introduce threads that don't block all
> application signals, since these interfere with sigwait & other
> application control of where signals are delivered. This is an
> oversight. I'll make it mask all signals.
> 
>> Otherwise, mq_notify already assumes that recv can't return before
>> SYS_mq_notify (if it did, the syscall would try to register a closed
>> fd). I haven't tried to prove it (e.g. maybe recv may need to
>> allocate something before blocking and hence can fail with ENOMEM?),
>> but if it's true, I don't see how a failed SYS_mq_notify could cause
>> recv to return, so joining a detached thread should be impossible if
>> we make pthread_detach follow recv.
> 
> I'm thinking for now maybe we should just drop the joining on error,
> and leave it starting out detached. While recv should not fail, it's
> obviously possible to make it fail in a seccomp sandbox, and you don't
> want that to turn into UB inside the implementation. If it does fail,
> the thread should still exit, but we have no way to synchronize with
> the mq_notify parent to decide whether it's being joined or not in
> this case without extra sync machinery...
> 
By dropping pthread_join we'd avoid introducing a new UB case if recv 
fails unexpectedly, but the existing case that I mentioned 
(SYS_mq_notify trying to register a closed fd) would remain. It seems to
me that moving SYS_mq_notify into the worker thread, as you suggested
earlier, is the cleanest option if we're worried about recv failing.
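
To make the idea concrete, here is a rough sketch of the handshake, not
a patch. Everything in it is an assumption for illustration: struct
args, worker and start_notifier are hypothetical names, and the
do_register callback stands in for the real SYS_mq_notify call so that
the example is self-contained. The worker registers, publishes the
result, and detaches itself only on success; the parent joins only when
registration failed, so it never joins a detached thread.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical handshake block shared by parent and worker. */
struct args {
	sem_t sem;                  /* posted once err is valid */
	int err;                    /* 0 on success, else an errno value */
	int (*do_register)(void *); /* stand-in for the SYS_mq_notify call */
	void *ctx;
};

static void *worker(void *p)
{
	struct args *a = p;
	int err = a->do_register(a->ctx);

	a->err = err;
	sem_post(&a->sem);  /* *a lives on the parent's stack: don't touch it again */
	if (err) return 0;  /* stay joinable; the parent will join us */

	pthread_detach(pthread_self());
	/* ... recv on the notification socket and the callback would go here ... */
	return 0;
}

/* Returns 0 on success, or -1 with errno set if registration failed. */
static int start_notifier(int (*do_register)(void *), void *ctx)
{
	struct args a = { .do_register = do_register, .ctx = ctx };
	pthread_t td;
	int r, cs;

	sem_init(&a.sem, 0, 0);
	r = pthread_create(&td, 0, worker, &a);
	if (r) {
		sem_destroy(&a.sem);
		errno = r;
		return -1;
	}

	/* Don't let cancellation leave the worker pointing at a dead stack frame. */
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
	while (sem_wait(&a.sem) == -1 && errno == EINTR);
	if (a.err) pthread_join(td, 0); /* failure path: worker is still joinable */
	sem_destroy(&a.sem);
	pthread_setcancelstate(cs, 0);

	if (a.err) {
		errno = a.err;
		return -1;
	}
	return 0;
}

/* Demo stand-ins for a registration that succeeds or fails. */
static int reg_ok(void *ctx) { (void)ctx; return 0; }
static int reg_fail(void *ctx) { (void)ctx; return EBADF; }
```

In this arrangement the registration happens strictly before the worker
reaches recv, so the parent never has to assume anything about when recv
returns, and closing the socket on failure can happen after the join.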

Alexey
