Message-ID: <63c0897d647936c946268f5a967a5e4d@ispras.ru>
Date: Sat, 11 Feb 2023 17:45:14 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] mq_notify: fix close/recv race on failure path

On 2023-02-10 19:29, Rich Felker wrote:
> On Wed, Dec 14, 2022 at 09:49:26AM +0300, Alexey Izbyshev wrote:
>> On 2022-12-14 05:26, Rich Felker wrote:
>> >On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
>> >>In case of failure mq_notify closes the socket immediately after
>> >>sending a cancellation request to the worker thread that is going to
>> >>call or have already called recv on that socket. Even if we don't
>> >>consider the kernel behavior when the only descriptor to an object that
>> >>is being used in a system call is closed, if the socket descriptor is
>> >>closed before the kernel looks at it, another thread could open a
>> >>descriptor with the same value in the meantime, resulting in recv
>> >>acting on a wrong object.
>> >>
>> >>Fix the race by moving pthread_cancel call before the barrier wait to
>> >>guarantee that the cancellation flag is set before the worker thread
>> >>enters recv.
>> >>---
>> >>Other ways to fix this:
>> >>
>> >>* Remove the racing close call from mq_notify and surround recv
>> >>  with pthread_cleanup_push/pop.
>> >>
>> >>* Make the worker thread joinable initially, join it before closing
>> >>  the socket on the failure path, and detach it on the happy path.
>> >>  This would also require disabling cancellation around join/detach
>> >>  to ensure that mq_notify itself is not cancelled in an inappropriate
>> >>  state.
>> >
>> >I'd put this aside for a while because of the pthread barrier
>> >involvement I kinda didn't want to deal with. The fix you have sounds
>> >like it works, but I think I'd rather pursue one of the other
>> >approaches, probably the joinable thread one.
>> >
>> >At present, the implementation of barriers seems to be buggy (I need
>> >to dig back up the post about that), and they're also a really
>> >expensive synchronization tool that goes both directions where we
>> >really only need one direction (notifying the caller we're done
>> >consuming the args). I'd rather switch to a semaphore, which is the
>> >lightest and most idiomatic (at least per present-day musl idioms) way
>> >to do this.
>> >
>> This sounds good to me. The same approach can also be used in
>> timer_create (assuming it's acceptable to add dependency on
>> pthread_cancel to that code).
>> 
>> >Using a joinable thread also lets us ensure we don't leave around
>> >threads that are waiting to be scheduled just to exit on failure
>> >return. Depending on scheduling attributes, this probably could be
>> >bad.
>> >
>> I also prefer this approach, though mostly for aesthetic reasons (I
>> haven't thought about the scheduling behavior). I didn't use it only
>> because I felt it's a "logically larger" change than simply moving
>> the pthread_barrier_wait call. And I wasn't aware that barriers are
>> buggy in musl.
> 
> Finally following up on this. How do the attached commits look?
> 
The first and third patches add calls to sem_wait, pthread_join, and 
pthread_detach, which are cancellation points in musl, so cancellation 
needs to be disabled across those calls. I mentioned that in my initial 
mail.
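
To illustrate what I mean, here is a minimal sketch of the pattern, not
the actual musl code. The names start_args, worker, start_worker, and
the simulate_failure flag are made up for the example, and read on a
pipe stands in for recv on the netlink socket:

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical argument block; not the structure used in musl. */
struct start_args {
	sem_t sem;  /* posted by the worker once it has copied the args */
	int fd;     /* stand-in for the notification socket */
};

static void *worker(void *p)
{
	struct start_args *args = p;
	int fd = args->fd;
	char c;
	ssize_t n;

	/* After this post the caller may return, so args is off limits. */
	sem_post(&args->sem);

	/* read() is a cancellation point; it stands in for recv(). */
	n = read(fd, &c, 1);
	(void)n;
	return 0;
}

/* Returns 0 on success, -1 on failure (real or simulated). */
int start_worker(int fd, int simulate_failure)
{
	struct start_args args = { .fd = fd };
	pthread_t td;
	int cs, ret = 0;

	if (sem_init(&args.sem, 0, 0)) return -1;

	/* sem_wait, pthread_join and pthread_detach may act on a pending
	 * cancellation request, so block cancellation of the caller until
	 * the worker has been either joined or detached. */
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

	if (pthread_create(&td, 0, worker, &args)) {
		ret = -1;
		goto out;
	}

	/* Wait until the worker no longer needs args; retry on signals. */
	while (sem_wait(&args.sem) && errno == EINTR);

	if (simulate_failure) {
		/* Failure path: cancel and join first, so the caller can
		 * close fd only after the worker is guaranteed to be gone. */
		pthread_cancel(td);
		pthread_join(td, 0);
		ret = -1;
	} else {
		/* Happy path: the worker runs on its own from here. */
		pthread_detach(td);
	}
out:
	sem_destroy(&args.sem);
	pthread_setcancelstate(cs, 0);
	return ret;
}
```

The caller closes fd only after start_worker has returned -1, at which
point the worker has already been joined and cannot touch a reused
descriptor number.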

Also, I wasn't sure whether it's fine to just remove the 
pthread_attr_setdetachstate call, so I checked POSIX[1] and found the 
following:

"The function shall be executed in an environment as if it were the 
start_routine for a newly created thread with thread attributes 
specified by sigev_notify_attributes. If sigev_notify_attributes is 
NULL, the behavior shall be as if the thread were created with the 
detachstate attribute set to PTHREAD_CREATE_DETACHED. Supplying an 
attributes structure with a detachstate attribute of 
PTHREAD_CREATE_JOINABLE results in undefined behavior."

This language seems to forbid calling sigev_notify_function in the 
context of a joinable thread. And even if musl wants to ignore this, 
PTHREAD_CREATE_JOINABLE must still be set manually if 
sigev_notify_attributes is not NULL, because the caller-supplied 
attributes may specify PTHREAD_CREATE_DETACHED.
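
A rough sketch of that last point follows. The helper name
make_joinable_attr is hypothetical. Copying pthread_attr_t by plain
assignment is an assumption that holds inside an implementation such as
musl, but POSIX does not promise it to applications:

```c
#include <pthread.h>

/* Build the attributes actually used for the worker thread.
 * user_attr plays the role of sigev_notify_attributes and may be NULL.
 * Returns 0 on success or an error number. */
int make_joinable_attr(pthread_attr_t *out, const pthread_attr_t *user_attr)
{
	if (user_attr) {
		/* Assumption: a shallow copy is valid for this libc. */
		*out = *user_attr;
	} else {
		int r = pthread_attr_init(out);
		if (r) return r;
	}
	/* Override whatever detach state the caller asked for, so that
	 * the failure path can still join the thread. */
	return pthread_attr_setdetachstate(out, PTHREAD_CREATE_JOINABLE);
}
```

The caller's own attribute object is left untouched; only the copy is
forced to joinable, and the thread would be detached later on the happy
path.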

Otherwise, the patches look good to me.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_02

Alexey
