Message-ID: <20170618202009.GX1627@brightrain.aerifal.cx>
Date: Sun, 18 Jun 2017 16:20:09 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] a new lock algorithm with lock value and CS
 counts in the same atomic int

On Sun, Jun 18, 2017 at 09:32:09PM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Sun, 18 Jun 2017 12:04:59 -0400 Rich Felker <dalias@...c.org> wrote:
> 
> > > > Is there a reason __wait doesn't work?  
> > > 
> > > __wait doesn't fit here at all, for instance it manipulates a
> > > separate int with waiters count.  
> > 
> > It accepts a null pointer for the waiters count, so that it can be
> > used in contexts with or without one. Any cost of the additional
> > conditional branches is dominated by syscall time.
> 
> Looking into it, I don't agree with you, Rich. Even with waiters set
> to 0 it would do 100 spins before going into the syscall. This is
> quite a waste here, because we are just coming out of a spin (at the
> first iteration) or we did spin around as long as the value was
> positive.

I haven't reviewed that logic yet, so perhaps that's what I'm missing.
I'll follow up with more after I understand that code better.

> I don't see why the cost of 100 spins would be dominated by the
> syscall. If I remember correctly, the benchmarks that I made showed
> about 10 memory operations for an unsuccessful syscall. This is why
> the magic number for the initial spin is set to 10.

A syscall takes at least 500 cycles on an extremely fast system, and
more like 1500-2000 on many. 100 spins was determined empirically and
is probably near the break-even point. If it's a bad choice, it's
probably also bad for other users of __wait.
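
To make the trade-off concrete, here is a minimal sketch of the
spin-then-block pattern under discussion. It is an illustration only,
not musl's __wait: the names spin_then_wait, block_fn and fake_block
are hypothetical, the blocking step is a callback standing in for the
futex syscall, and C11 atomics stand in for musl's internal atomic
primitives. The spin bound is a parameter so that 100 and 10 can be
compared directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical callback standing in for the futex wait syscall. */
typedef void (*block_fn)(atomic_int *addr, int val);

/* Wait until *addr != val. Spins up to max_spins times first, but only
 * while no other waiter is registered; waiters may be NULL.
 * Returns the number of spin iterations actually used. */
int spin_then_wait(atomic_int *addr, atomic_int *waiters, int val,
                   int max_spins, block_fn block)
{
    int spun = 0;
    assert(max_spins >= 0);
    while (spun < max_spins && (!waiters || atomic_load(waiters) == 0)) {
        if (atomic_load(addr) != val) return spun; /* value changed */
        spun++;
    }
    /* Spinning did not help: register as a waiter and block. */
    if (waiters) atomic_fetch_add(waiters, 1);
    while (atomic_load(addr) == val) block(addr, val);
    if (waiters) atomic_fetch_sub(waiters, 1);
    return spun;
}

/* Test stand-in for the kernel: counts calls and "wakes" the caller
 * by changing the value it is waiting on. */
int block_calls;
void fake_block(atomic_int *addr, int val)
{
    block_calls++;
    atomic_store(addr, val + 1);
}
```

With a null waiters pointer and an unchanged value, the full max_spins
iterations are paid before the first blocking call, which is the cost
being questioned above.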

> It might be beneficial to do a_spin for a while, if we know that the
> CS protected by this lock are really short, just a few cycles. But
> 100 is far too big a number, and in my benchmarks I found not much
> indication of a benefit for it.
> 
> If we want code sharing with the rest of musl (which we should) I like
> Alexander's idea of a __futexwait inline function much better.

I don't think there's any value to making it inline. If it could be a
single syscall, that would be one thing, but with the fallback for old
systems that lack private futex, it's just a gratuitously large inline
chunk that's likely to interfere with inlining/optimization of the
caller, and certainly has no potential to improve performance (since
there's a syscall involved).
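
For illustration, a non-inline futex wait with the private-futex
fallback could look roughly like the sketch below. This is not musl's
code: futex_wait_sketch is a hypothetical name, it goes through the
public syscall() wrapper on Linux rather than musl's internal
__syscall, and it assumes the missing private-futex case is reported
as ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: block while *addr == val.
 * Returns 0 on wakeup, or a negated errno value (e.g. -EAGAIN if
 * *addr already differs from val when the kernel checks it). */
int futex_wait_sketch(volatile int *addr, int val, int priv)
{
    assert(addr != 0);
    int op = FUTEX_WAIT | (priv ? FUTEX_PRIVATE_FLAG : 0);
    long r = syscall(SYS_futex, addr, op, val, (void *)0);
    /* Assumed fallback: retry without the private flag on kernels
     * that reject it. */
    if (r == -1 && errno == ENOSYS && priv)
        r = syscall(SYS_futex, addr, FUTEX_WAIT, val, (void *)0);
    return r == -1 ? -errno : 0;
}
```

Even in this reduced form the body is two syscalls plus error
handling, so there is little for the compiler to gain by inlining it
into every caller.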

The original intent was that __wait be the "__futexwait" primitive
Alexander had in mind. If it's too heavy as it is now, perhaps that's
a mistake that affects other usage too. I was never very happy with
the shaky theoretical foundations of the spinning, but I did feel like
I at least got it to the point where it didn't have pathologically bad
cases.

Rich
