Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190527134305.GI23599@brightrain.aerifal.cx>
Date: Mon, 27 May 2019 09:43:05 -0400
From: Rich Felker <dalias@...c.org>
To: pengyuanhong <pengyuanhong@...wei.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>,
	leijitang <leijitang@...wei.com>,
	"Huangqiang (H)" <h.huangqiang@...wei.com>
Subject: Re: thread which is blocked by mutex cannot be canceled

On Mon, May 27, 2019 at 01:17:41PM +0000, pengyuanhong wrote:
> Hello,
> 
> I am doing test for pthread_cancel(), and the following test can
> pass when compiled with gcc while fail compiled with musl-gcc:
> 
> ```
> $ cat test_pthread_cancel.c
> #include <pthread.h>
> #include <stdio.h>
> #include <errno.h>
> #include <unistd.h>
> # define INTHREAD 0     /* Control going to or is already for Thread */
> # define INMAIN 1       /* Control going to or is already for Main */
> # define TIMEOUT 10     /* Time out time in seconds */
> 
> int sem1;               /* Manual semaphore */
> int cleanup_flag;       /* Flag to indicate the thread's cleanup handler was called */
> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;      /* Mutex */
> 
> 
> /* Cleanup function that the thread executes when it is canceled.  So if
> * cleanup_flag is 1, it means that the thread was canceled. */
> static void a_cleanup_func(void *args)
> {
>         cleanup_flag = 1;
>         return;
> }
> 
> /* Function that the thread executes upon its creation */
> static void *a_thread_func(void)
> {
>         pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
>         pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
> 
>         pthread_cleanup_push(a_cleanup_func, NULL);
> 
>         /* Indicate to main() that the thread has been created. */
>         sem1 = INMAIN;
> 
>         /* Lock the mutex. It should have already been locked in main, so the thread
>          * should block. */
>         if (pthread_mutex_lock(&mutex) != 0) {
>                 printf("Error in pthread_mutex_lock()\n");
>                 pthread_exit((void *) -1);
>                 return (void *) -1;
>         }
> 
>         /* Shouldn't get here if the cancel request was honored immediately
>          * like it should have been. */
>         cleanup_flag = -1;
>         pthread_cleanup_pop(0);
>         pthread_exit(0);
>         return NULL;
> }
> 
> int main(void)
> {
>         pthread_t new_th;
>         int i = 0;
> 
>         /* Initializing values */
>         sem1 = INTHREAD;
>         cleanup_flag = 0;
> 
>         /* Lock the mutex */
>         if (pthread_mutex_lock(&mutex) != 0) {
>                 printf("Error in pthread_mutex_lock()\n");
>                 return -1;
>         }
> 
>         /* Create a new thread. */
>         if (pthread_create(&new_th, NULL, (void *)a_thread_func, NULL) != 0) {
>                 printf("Error creating thread\n");
>                 return -1;
>         }
> 
>         /* Make sure thread is created before we cancel it. (wait for
>          * a_thread_func() to set sem1=INMAIN.) */
>         while (sem1 == INTHREAD)
>                 sleep(1);
> 
>         /* Send cancel request to the thread.  */
>         if (pthread_cancel(new_th) != 0) {
>                 printf("Test FAILED: Error in pthread_cancel()\n");
>                 return -1;
>         }
>         /* Wait for the thread to either cancel immediately (as it should do) and call it's
>          * cleanup handler, or for TIMEOUT(10) seconds if the cancel request was not honored
>          * immediately. */
>         while ((cleanup_flag == 0) && (i != TIMEOUT)) {
>                 sleep(1);
>                 i++;
>         }
> 
>         /* Unlock the mutex */
>         pthread_mutex_unlock(&mutex);
>         pthread_join(new_th, NULL);
> 
>         /* This means that the cleanup function wasn't called, so the cancel
>          * request was not honord immediately like it should have been. */
>         if (cleanup_flag <= 0) {
>                 printf("Test FAILED: Cancel request timed out\n");
>                 return -1;
>         }
> 
>         printf("Test PASSED\n");
>         return 0;
> }
> $
> $ gcc -pthread test_pthread_cancel.c
> $ ./a.out
> Test PASSED
> $
> $ /usr/local/musl/bin/musl-gcc test_pthread_cancel.c
> $ ./a.out
> ====new->tid is 70450, self->tid is 70449, pid is 70449=====
> Test FAILED: Cancel request timed out
> ```
> 
> Digging into musl's codes, I find that in __timedwait(), PTHREAD_CANCEL_DISABLE
> is set before thread is waiting for the futex.
> 
> In the above test, when child thread is blocked by futex, it is canceled
> by main thread, then SIGCANCEL is signaled, and child thread is interrupted by
> this signal. At this point, canceldisable is true in child, so the child just return from
> cancel_handler() and is blocked again by mutex.
> 
> So I wonder,
> 
> 1.       Why PTHREAD_CANCEL_DISABLE is set before thread is waiting for the futex?
> 
> 2.       Is pthread_mutex_lock() a possible cancellation point?

No, pthread_mutex_lock is not a cancellation point. The list of
cancellation points is here:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_02

The test above, as I understand it, is trying to assert that
*asynchronous* cancellation works on a thread blocked at
pthread_mutex_lock. Since pthread_mutex_lock is not one of the three
functions defined as async-cancel-safe:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_04

this has undefined behavior. Moreover, there is no possible way to
define it which would actually be usable, because with async
cancellation, your program could not distinguish between the state of
being cancelled during or before pthread_mutex_lock (while not holding
the mutex) from being called after pthread_mutex_lock succeeded but
before it returned/reached some subsequent point in the caller.

For the most part, async cancellation is completely unusable. As noted
above, there are only three legal standard functions you can call
while it's enabled. Formally you can't even call things like strlen()
or cos(). It only makes sense when you want to enable cancellation of
a long-running *pure computation* that you write without use of any
std functions.

If you need a cancellable lock wait, leave async cancellation off and
use a semaphore or a construct involving a condvar (pthread_cond_wait
is cancellable and has semantics compatible with cancellation, without
the above-mentioned race condition).

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.