Message-ID: <1438207875.10742.3.camel@inria.fr>
Date: Thu, 30 Jul 2015 00:11:15 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: New optimized normal-type mutex?
Hello,
Am Mittwoch, den 29.07.2015, 14:09 +0200 schrieb Joakim Sindholt:
> So he went on and suggested that a CAS-less lock was possible with
> a_fetch_add; however, I can't make it work and I don't think he can
> either. His idea, nonetheless, is sound: the one who flips the sign
> bit takes the lock. Based on that I've cobbled together a different
> lock that will probably perform worse than this approach but
> nonetheless be correct as far as I can tell.
>
> The difference is that we consider the lock owner a waiter as well,
> thus requiring a CAS loop in the unlock function to remove itself, so
> to speak, from the waiter count. a_fetch_and also turns into a CAS
> loop, so I consider this fairly minor.
> This makes the wait loop a little simpler while still maintaining a
> waiter count and still using only one int.
Nice ideas!
After the recent discussion about the problems on x86_64 I was trying
to come up with a simple lock for the atomics, and I came to think
along the same lines.
I use "unsigned" as the base type and C11 atomics notation; that makes
the reasoning easier for me. The idea is that the HO bit is the lock
bit ("lockbit" is INT_MAX+1u). As in your ideas above, the counter
part of the total value counts the number of threads inside the
critical section (that is, the number of threads between the start of
lock and the end of unlock).
If we count it like that, we know exactly the contribution that a
successful locker makes to the total value, namely lockbit + 1. So an
unlock operation consists of an atomic_fetch_sub of that value. From
the return value of that fetch it is then easy to see whether someone
has to be woken up.
The lock procedure has a simple fast path: if the total value is 0 at
the start, set the value to lockbit+1 and you are done. This can be
done with one CAS, and could also be used for a trylock.
The slow path can be a little more complicated. First, account for the
thread being in the critical section by adding 1. Then, just as in the
current musl __wait function, first spin and then FUTEX_WAIT until the
lock bit is found to be cleared.
(Also, for the atomics we don't need to work with shared locks, so the
futex syscalls are straightforward.)
Jens
/* The HO bit. */
unsigned const lockbit = INT_MAX+1u;
/* The HO and LO bit. */
unsigned const contrib = INT_MAX+2u;

void __impl_mut_lock(_Atomic(unsigned)* loc)
{
        if (libc.threads_minus_1) {
                int spins;
                unsigned val = 0;
                /* fast path */
                if (atomic_compare_exchange_strong_explicit(loc, &val, contrib, memory_order_acq_rel, memory_order_consume))
                        return;
                /* account for this thread in the critical section */
                val = 1+atomic_fetch_add_explicit(loc, 1, memory_order_relaxed);
                /* Jump into the loop according to the initial state,
                   then iterate until the lock is acquired. */
                switch (val & lockbit) {
                for (;;) {
                        /* The lock bit isn't yet set. */
                case 0:
                        val = atomic_fetch_or_explicit(loc, lockbit, memory_order_acq_rel);
                        if (!(val & lockbit)) return;
                        /* The lock bit is set by someone else. */
                        val |= lockbit;
                default:
                        for (spins = 100; spins-- && (val & lockbit);) {
                                a_spin();
                                val = atomic_load_explicit(loc, memory_order_consume);
                        }
                        for (; val & lockbit;) {
                                __syscall(SYS_futex, loc, FUTEX_WAIT, val, 0);
                                val = atomic_load_explicit(loc, memory_order_consume);
                        }
                }
                }
        }
}
void __impl_mut_unlock(_Atomic(unsigned)* loc)
{
        if (libc.threads_minus_1) {
                /* Drop the lock bit and our count in one step; a
                   remainder means there is a waiter to wake. The
                   subtraction needs release ordering to publish the
                   critical section. */
                if (contrib != atomic_fetch_sub_explicit(loc, contrib, memory_order_release))
                        __syscall(SYS_futex, loc, FUTEX_WAKE, 1);
        }
}
--
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536 ::
:: :::::::::::::::::::::: gsm France : +33 651400183 ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::