|
Message-ID: <d352f85c0aab78a5777b0857a12e1de3@ispras.ru> Date: Mon, 21 Nov 2022 08:14:50 +0300 From: Alexey Izbyshev <izbyshev@...ras.ru> To: musl@...ts.openwall.com Subject: sem_post() can miss waiters Hi, While reading musl's POSIX semaphore implementation I realized that its accounting of waiters is incorrect (after commit 88c4e720317845a8e01aee03f142ba82674cd23d) due to combination of the following factors: * sem_post relies on a potentially stale waiters count when deciding whether to wake a waiter. * Presence of waiters is not reflected in the semaphore value when the semaphore is unlocked. Here is an example scenario with 4 threads: * initial state - val = 1, waiters = 0 * T1 - enters sem_post - reads val = 1, waiters = 0, is preempted - val = 1, waiters = 0 * T2 - sem_wait - another sem_wait (blocks) - val = -1, waiters = 1 * T3 - sem_wait (blocks) - val = -1, waiters = 2 * T4 - sem_post (wakes T2) - val = 1, waiters = 2 * T1 - continues in sem_post - a_cas(1, 2) succeeds - T3 is NOT woken because sem_post thinks there are no waiters - val = 2, waiters = 1 * T2 - wakes and completes sem_wait - val = 1, waiters = 1 So in the end T3 remains blocked despite that the semaphore is unlocked. Here is two ideas how this can be fixed without reintroducing the problem with accessing a potentially destroyed semaphore after it's unlocked. * Idea 1 - use a proper WAITERS_BIT in the semaphore state instead of just a single value and preserve it in sem_post by default - in sem_post, when we notice that waiters count is zero but WAITERS_BIT is set, reset WAITERS_BIT, but wake all potential waiters afterwards. This ensures that any new waiters that could arrive after we read waiters count are taken into account (and they will proceed to lock the semaphore and potentially set WAITERS_BIT again after they are woken). The sem_post body would look like this (the full patch is attached): int val, new, priv = sem->__val[2], sync = 0; do { val = sem->__val[0]; waiters = sem->__val[1]; if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) { errno = EOVERFLOW; return -1; } new = val + 1; if (!waiters) new &= ~0x80000000; } while (a_cas(sem->__val, val, new) != val); if (val<0) __wake(sem->__val, waiters ? 1 : -1, priv); The main downside of this approach is that a FUTEX_WAKE system call will be performed each time the semaphore transitions from "maybe have waiters" to "no waiters" state, and in practice it will be useless in most cases. * Idea 2 - use a proper WAITERS_BIT as above - also add a new DETECT_NEW_WAITERS_BIT - in sem_post, when we notice that waiters count is zero, WAITERS_BIT is set and DETECT_NEW_WAITERS_BIT is not set, atomically transition into "waiters detection" state (DETECT_NEW_WAITERS_BIT is set, WAITERS_BIT is unset) *without unlocking the semaphore*. Then recheck waiters count again, and if it's still zero, try to atomically transition to "no waiters" state (both bits are unset) while also unlocking the semaphore. If this succeeds, we know that no new waiters arrived before we unlocked the semaphore and hence no wake up is needed. Regardless of whether we succeeded, we always leave "waiters detection" state before exiting the CAS loop. - in sem_post, wake a single waiter if at least one of WAITERS_BIT or DETECT_NEW_WAITERS_BIT is set. This ensures no waiters are missed in sem_post calls racing with the sem_post currently responsible for waiters detection. The sem_post body would look like this (the full patch is attached): int val, new, priv = sem->__val[2], sync = 0; for (;;) { int cnt; val = sem->__val[0]; cnt = val & SEM_VALUE_MAX; if (cnt < SEM_VALUE_MAX) { if (!sync && val < 0 && !(val & 0x40000000) && !sem->__val[1]) { new = cnt | 0x40000000; if (a_cas(sem->__val, val, new) != val) continue; val = new; sync = 1; } new = val + 1; } else { new = val; } if (sync) { if (sem->__val[1]) new |= 0x80000000; new &= ~0x40000000; } if (a_cas(sem->__val, val, new) == val) break; } if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) { errno = EOVERFLOW; return -1; } if (new < 0 || (new & 0x40000000)) __wake(sem->__val, 1, priv); Compared to the first approach, there is just an extra a_cas instead of a system call, though we lose one bit in SEM_VALUE_MAX. Hope this helps, Alexey View attachment "sem-post-detect-new-waiters.diff" of type "text/x-diff" (3680 bytes) View attachment "sem-post-wake-all.diff" of type "text/x-diff" (2795 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.