|
Message-ID: <20170618150132.GV1627@brightrain.aerifal.cx> Date: Sun, 18 Jun 2017 11:01:32 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: [PATCH] a new lock algorithm with lock value and CS counts in the same atomic int On Fri, Jun 16, 2017 at 09:11:35AM +0200, Jens Gustedt wrote: > A variant of this new lock algorithm has been presented at SAC'16, see > https://hal.inria.fr/hal-01304108. A full version of that paper is > available at https://hal.inria.fr/hal-01236734. > > The main motivation of this is to improve on the safety of the basic lock > implementation in musl. This is achieved by squeezing lock value and > "waiter" count into a single int. Thereby an unlock operation does > exactly one memory transfer (a_fetch_add) and never touches the value > again, but still detects if a waiter has to be woken up. Thanks! In the process of reviewing. > Another effect could be improved performance, but don't take this too > seriously, these lock/unlock functions are probably never used in > performance critical parts of libc. An equivalent inlined/open-coded version of the lock is used in malloc, which is the only place I see where it's performance-critical. If it really does perform significantly better it would make sense to see if we can get the benefit in malloc too; that may be harder because it would sacrifice the inlining unless we wanted to try to get the larger new code inlined (which the compiler may not want to do, or which may affect other levels of inlining in the same file). > Depending on the architecture, the performance of the previous lock > strategy was not always so great under no or medium congestion. Now, in > case of no congestion, there are exactly two memory operations, one to > lock an one to unlock. So if we would start to use this same new strategy > also for user locks such as mutexes, semaphores or the locks used for > lockfull atomics, we might see a performance improvement. It definitely can't be used for mutexes that have to record an owner (recursive, error-checking, and/or robust), or for semaphores, unless we wanted to depend on double-word atomics. There's just too much state in all these cases to fit in 32 bits. Normal mutexes could do it though. Another case that would be very nice to optimize, but where I don't see a way, is stdio FILE locks. They're recursive and thus need to record an owner. The current implementation works more like an errorchecking mutex abused as a recursive one (see also: an old SO question on that topic), having the caller be responsible for knowing whether it needs to unlock from whether it was already owned by the calling thread at lock time. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.