|
Message-ID: <20170618202009.GX1627@brightrain.aerifal.cx> Date: Sun, 18 Jun 2017 16:20:09 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: [PATCH] a new lock algorithm with lock value and CS counts in the same atomic int On Sun, Jun 18, 2017 at 09:32:09PM +0200, Jens Gustedt wrote: > Hello Rich, > > On Sun, 18 Jun 2017 12:04:59 -0400 Rich Felker <dalias@...c.org> wrote: > > > > > Is there a reason __wait doesn't work? > > > > > > __wait doesn't fit here at all, for instance it manipulates a > > > separate int with waiters count. > > > > It accepts a null pointer for the waiters count, so that it can be > > used in contexts with or without one. Any cost of the additional > > conditional branches is dominated by syscall time. > > Looking into it, I don't agree with you, Rich. Even with waiters set > to 0 it would to 100 spins before going into the syscall. This is much > of a waste, here, because we are just comming out of a spin (at the > first iteration) or we did spin around as long as the value was > positive. I haven't reviewed that logic yet, so perhaps that's what I'm missing. I'll follow up with more after I understand that code better. > I don't see why the cost of 100 spins would be dominated by the > syscall. If I remember correctly, the benchmarks that I made showed > about 10 memory operations for an unsuccessful syscall. This is why > the magic number for the initial spin is set to 10. A syscall takes at least 500 cycles on an extremely fast system, and more like 1500-2000 on many. 100 spins was determined empirically and is probably near the break-even point. If it's a bad choice, it's probably also bad for other users of __wait. > It might be benefitial to do a_spin for a while, if we know that CS > that are protected by this lock are really short, just some > cycles. But 100 is a far too big number, and in my benchmarks I found > not much indication of a benefit for it. > > If we want code sharing with the rest of musl (which we should) I like > Alexander's idea of a __futexwait inline function much better. I don't think there's any value to making it inline. If it could be a single syscall, that would be one thing, but with the fallback for old systems that lack private futex, it's just a gratuitously large inline chunk that's likely to interfere with inlining/optimization of the caller, and certainly has no potential to improve performance (since there's a syscall involved). The original intent was that __wait be the "__futexwait" primitive Alexander had in mind. If it's too heavy as it is now, perhaps that's a mistake that affects other usage too. I was never very happy with the shaky theoretical foundations of the spinning, but I did feel like I at least got it to the point where it didn't have pathologically bad cases. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.