musl - Inherent race condition in linux robust

Message-ID: <20150410033154.GA27410@brightrain.aerifal.cx>
Date: Thu, 9 Apr 2015 23:31:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: libc-alpha@...rceware.org
Subject: Inherent race condition in linux robust_list system

While working on some of the code handling robust_list for robust (and
other owner-tracked) mutexes in musl, I've come across a race
condition that's inherent in the kernel's design for robust_list.
There is no way to eliminate it with the current API, and I see no way
to eliminate it without requiring a syscall to unlock robust mutexes.

The procedure for unlocking a robust_list tracked mutex looks like
this:

1. Store the address of the mutex to be unlocked in the robust_list
   "pending" slot.

2. Remove the mutex from the robust_list linked list.

3. Unlock the mutex.

4. Clear the "pending" slot in the robust_list.
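
As a rough illustration, the four steps above can be modeled in C as
follows. This is a simplified sketch, not musl's or glibc's actual
code: the struct names, the doubly linked list layout, and
model_unlock() are assumptions made up for the example, and the futex
wake that a real unlock would perform for waiters is omitted.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real robust_list structures. */
struct model_mutex {
    struct model_mutex *prev, *next; /* links in the owner's robust list */
    atomic_int lock;                 /* 0 = unlocked, otherwise owner tid */
};

struct model_robust_head {
    struct model_mutex *first;            /* mutexes currently held */
    _Atomic(struct model_mutex *) pending; /* the "pending" slot */
};

void model_unlock(struct model_robust_head *h, struct model_mutex *m)
{
    /* Step 1: publish the mutex in the pending slot. */
    atomic_store(&h->pending, m);

    /* Step 2: remove the mutex from the linked list. */
    if (m->prev) m->prev->next = m->next;
    else h->first = m->next;
    if (m->next) m->next->prev = m->prev;
    m->prev = m->next = NULL;

    /* Step 3: unlock. From here on, other processes may take,
       destroy, and reuse the memory that m points to. */
    atomic_store(&m->lock, 0);

    /* Step 4: clear the pending slot. If the process dies between
       step 3 and this store, the pending slot still points at m. */
    atomic_store(&h->pending, NULL);
}
```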

The purpose of the pending slot is so that the kernel can handle the
case where a process dies asynchronously after removing the mutex from
the linked list but before it's unlocked; in this case it treats the
mutex like it's still in the list. But the kernel has no way of
knowing whether such asynchronous process death occurs before or after
step 3; a pending slot that is still set only tells it that the death
occurred somewhere between steps 1 and 4. This is very bad.

As soon as step 3 takes place, another process can take ownership of
the mutex, and if it knows it's the last user, it can unlock and
destroy the mutex and then reuse the same memory for a new purpose
(imagine a shared-memory heap managed by a malloc-like allocator,
which would be a good application for robust mutexes). Now, if the new
use happens to store a value matching the tid of the thread whose
process is dying at the offset where the mutex owner would be stored,
the kernel misinterprets the new data stored there as a mutex
belonging to the dying process, and happily proceeds to corrupt it!
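
The sequence of events can be replayed deterministically in a small
single-threaded model. This is only a sketch under assumptions: the
MODEL_* constants mirror the tid mask and owner-died bit of the futex
word, model_kernel_exit() is a simplified stand-in for the kernel's
exit-time handling of the pending slot (it ignores the waiters bit and
wakeups), and the tid value 1234 is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_TID_MASK   0x3fffffffu /* low bits of the word: owner tid */
#define MODEL_OWNER_DIED 0x40000000u /* bit set when the owner has died */

/* Simplified model of what happens to the pending slot at exit: if
   the word there appears to be owned by the dying tid, mark it. */
void model_kernel_exit(uint32_t *pending, uint32_t dying_tid)
{
    if (pending && (*pending & MODEL_TID_MASK) == dying_tid)
        *pending = MODEL_OWNER_DIED;
}

uint32_t model_race(void)
{
    uint32_t shared[1];        /* one word of a shared-memory heap */
    uint32_t tid = 1234;       /* tid of the thread that will die */
    uint32_t *pending;

    shared[0] = tid;           /* the mutex is held by tid */
    pending = &shared[0];      /* step 1: set the pending slot */
                               /* step 2: unlink (not modeled) */
    shared[0] = 0;             /* step 3: unlock */

    /* Another process locks, unlocks, and destroys the mutex, then
       reuses the memory for unrelated data that happens to equal tid. */
    shared[0] = 1234;

    /* The first process is killed before step 4. */
    model_kernel_exit(pending, tid);

    return shared[0];          /* the unrelated data has been overwritten */
}
```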

Fixing this does not look easy. The obvious way is to make clearing
the pending slot of the robust_list effectively atomic with unlocking
the mutex by doing them together in a (futex) syscall, but that would
require a syscall every time a robust mutex is unlocked. An alternate
approach would be enlarging the robust_list to have a PC range during
which the pending slot is valid. This would avoid a syscall but would
require the atomic unlock to be performed in asm (to provide labels
for the PC range). I do not see any way to fix it without kernel
changes.
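
To make the PC-range idea concrete, here is one way such an extension
might look. Everything in this sketch is hypothetical: no such kernel
interface exists, and the struct, its field names, and
model_pending_valid() are invented purely for illustration. The
assumption is that the range ends just past the atomic unlock
instruction, so a thread that dies after the unlock has executed falls
outside it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enlarged robust_list head (not a real kernel ABI). */
struct model_robust_head_pc {
    void *pending;      /* the "pending" slot */
    uintptr_t pc_start; /* first PC at which the pending slot is valid */
    uintptr_t pc_end;   /* PC just past the atomic unlock instruction */
};

/* Model of the check the kernel could perform at thread death: honor
   the pending slot only if the thread died inside [pc_start, pc_end),
   that is, before the unlock instruction completed. */
bool model_pending_valid(const struct model_robust_head_pc *h,
                         uintptr_t dying_pc)
{
    return h->pending != NULL
        && dying_pc >= h->pc_start
        && dying_pc < h->pc_end;
}
```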

Please note that this issue is distinct from glibc bug #14485, which
is easily fixable and does not affect musl. The issue I'm describing
here is much harder to fix because it involves legal reuse of memory
within the same shared mapping that held the robust mutex, rather than
reuse of the same virtual address range for a new mapping.

Rich
