Message-ID: <20200522162142.GV1079@brightrain.aerifal.cx>
Date: Fri, 22 May 2020 12:21:42 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Serious missing synchronization bug between internal locks and
 threads_minus_1 1->0 transition

This was reported on #musl yesterday as a problem in malloc
(oldmalloc). When the second-to-last thread exits (leaving the process
single-threaded again), it decrements libc.threads_minus_1. The
remaining thread can then observe the new count with a relaxed-order
atomic load, which is fine in itself, but it then uses the observed
single-threadedness to skip locks. This means it also skips
synchronization with any changes the exiting thread made to memory.
The race goes like (L exiting, M remaining):

M accesses and caches data D
                                    L takes lock protecting D
                                    L modifies D
                                    L releases lock protecting D
                                    L calls pthread_exit
                                    L locks thread list
                                    L decrements threads_minus_1
                                    L exits & releases list
M observes threads_minus_1==0
M skips taking lock protecting D
M accesses D, using cached value
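
In code, the problematic fast path has roughly this shape. This is a
self-contained illustration using C11 atomics and hypothetical names
(read_d, d_lock), not musl's actual source:

#include <stdatomic.h>
#include <pthread.h>

/* Hypothetical stand-ins for the real internals, for illustration. */
static atomic_int threads_minus_1;   /* like libc.threads_minus_1 */
static pthread_mutex_t d_lock = PTHREAD_MUTEX_INITIALIZER;
static int d;                        /* the data D in the diagram */

int read_d(void)
{
	/* Relaxed load: if it observes 0, the lock is skipped, and
	 * with it the acquire that would make L's store to d visible. */
	if (atomic_load_explicit(&threads_minus_1,
	                         memory_order_relaxed)) {
		pthread_mutex_lock(&d_lock);
		int v = d;
		pthread_mutex_unlock(&d_lock);
		return v;
	}
	return d;	/* may use a stale, cached value of d */
}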

Note that while one might expect this not to happen on x86, with its
strong memory ordering, the lack of a memory barrier here is also a
lack of a *compiler barrier*; in the observed breakage it was actually
the compiler, not the CPU, caching D between the first and last lines
of the above example.
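
To make the compiler-barrier point concrete, here is a minimal sketch
(illustrative names, not musl code) where the compiler is free to fold
both reads of d into a single load, regardless of hardware ordering:

#include <stdatomic.h>

extern int d;                       /* shared data D, a plain object */
extern atomic_int threads_minus_1;

int read_d_twice(void)
{
	int cached = d;	/* earlier access; d now lives in a register */
	if (atomic_load_explicit(&threads_minus_1,
	                         memory_order_relaxed) == 0) {
		/* A relaxed load is neither a memory barrier nor a
		 * compiler barrier, so the compiler may service this
		 * second read of d from the same register. */
		return cached + d;
	}
	return cached;
}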

The simple, safe fix for this is to stop using libc.threads_minus_1 to
skip locking and instead use libc.threaded, which is permanent once
set. This is the first change I plan to commit, to have a working
baseline, but it will be something of a performance regression for
mostly-single-threaded programs that only occasionally use threads,
since they'll be stuck using locks all the time.
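
Continuing the earlier sketch, the shape of that first fix is to key
the fast path off a flag that is set once and never cleared (assuming
a flag analogous to libc.threaded; the actual commit may differ):

#include <stdatomic.h>
#include <pthread.h>

static atomic_bool threaded;	/* like libc.threaded: set at the
				   first pthread_create, never cleared */
static pthread_mutex_t d_lock = PTHREAD_MUTEX_INITIALIZER;
static int d;

int read_d(void)
{
	/* Once threaded is set, the lock is always taken, so a
	 * thread's exit can never retroactively disable the
	 * synchronization. */
	if (atomic_load_explicit(&threaded, memory_order_relaxed)) {
		pthread_mutex_lock(&d_lock);
		int v = d;
		pthread_mutex_unlock(&d_lock);
		return v;
	}
	return d;	/* safe: no other thread has ever existed */
}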

I've been trying to work out a better fix based on SYS_membarrier
issued from the exiting thread, but that's complicated by the fact
that membarrier can only impose memory barriers, not compiler
barriers.
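
For reference, the syscall in question can be issued like this on
kernels that support MEMBARRIER_CMD_GLOBAL (a sketch only; per the
above, it addresses the memory-ordering half of the problem but not
the compiler-caching half):

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel to execute a memory barrier on all CPUs currently
 * running threads of the process. This constrains the hardware, but
 * the remaining thread's compiler may still hold D in a register,
 * which is the complication described above. */
static int broadcast_barrier(void)
{
	return syscall(SYS_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}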

Anyway, first fix coming soon. This will be important for distros to
pick up.

Rich
