musl - Re: [PATCH] a new lock algorithm with lock value and CS counts in the same atomic int

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170618145339.GU1627@brightrain.aerifal.cx>
Date: Sun, 18 Jun 2017 10:53:39 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] a new lock algorithm with lock value and CS
 counts in the same atomic int

On Sun, Jun 18, 2017 at 04:40:56PM +0300, Alexander Monakov wrote:
> Some style suggestions below.
> 
> On Fri, 16 Jun 2017, Jens Gustedt wrote:
> > --- a/src/thread/__lock.c
> > +++ b/src/thread/__lock.c
> > @@ -1,15 +1,55 @@
> >  #include "pthread_impl.h"
> >  
> > +
> > +// INT_MIN for the lock, +1 for the count of this thread in the
> > +// critical section, has it that the difference between locked and
> > +// unlocked is ??INT_MAX.
> > +#if (INT_MIN+1) != -INT_MAX
> > +#error we need -INT_MAX to be INT_MIN+1
> > +#endif
> 
> I suggest dropping this and spelling 'INT_MIN + current' as 'current - 1 -
> INT_MAX' below.  (all you need is that INT_MIN <= -INT_MAX)

I think this can be dropped entirely for musl's purposes and
essentially for all practical purposes.

> > +
> >  void __lock(volatile int *l)
> >  {
> > -	if (libc.threads_minus_1)
> > -		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
> > +	/* This test is mostly useless, now. Leaving it to the first CAS is
> > +	   probably just as efficient. */
> > +	if (libc.threads_minus_1) {
> 
> I suggest 'if (!libc.threads_minus_1) return;' and unindenting the rest of the
> function.

Agree.

> > +		int current = 0;
> > +		/* A first spin lock acquisition loop, for the case of
> > +		   no or medium congestion. */
> > +		for (unsigned i = 0; i < 10; ++i) {
> > +			/* This is purely speculative. */
> 
> This comment, also duplicated below, doesn't look helpful.
> 
> > +			if (current < 0) current += INT_MAX;
> > +			// assertion: current >= 0
> > +			int val = a_cas(l, current, current - INT_MAX);
> > +			if (val == current) return;
> > +			current = val;
> 
> Might be nice to use a_spin here.

I think so. At one point (before we used volatile to model atomics) it
was probably necessary to keep the spin loop from being optimized out
entirely; now it's just likely to be lighter on cpu/bus.

> > +		}
> > +		// Spinning failed, so mark ourselves as being inside the CS.
> > +		current = a_fetch_add(l, 1) + 1;
> > +		/* The main lock acquisition loop for heavy congestion. The only
> > +		   change to the value performed inside that loop is a successful
> > +		   lock via the CAS that acquires the lock. */
> > +		for (;;) {
> > +			/* We can only go into wait, if we know that somebody holds the
> > +			   lock and will eventually wake us up, again. */
> > +			if (current < 0) {
> > +				__syscall(SYS_futex, l, FUTEX_WAIT|FUTEX_PRIVATE, current, 0) != -ENOSYS
> > +					|| __syscall(SYS_futex, l, FUTEX_WAIT, current, 0);
> 
> It's probably better to put this into a new static inline function in
> pthread_impl.h next to __wake (e.g. __futexwait) and use it here and in __wait.

Is there a reason __wait doesn't work?

> > +				/* This is purely speculative. */
> > +				current += INT_MAX;
> > +			}
> > +			/* assertion: current > 0, because the count
> > +			   includes us already. */
> > +			int val = a_cas(l, current, INT_MIN + current);
> > +			if (val == current) return;
> > +			current = val;
> > +		}
> > +	}
> >  }
> >  
> >  void __unlock(volatile int *l)
> >  {
> > -	if (l[0]) {
> > -		a_store(l, 0);
> > -		if (l[1]) __wake(l, 1, 1);
> > +	if (a_fetch_add(l, INT_MAX) != -INT_MAX) {
> > +		__syscall(SYS_futex, l, FUTEX_WAKE|FUTEX_PRIVATE, 1);
> 
> This change lost FUTEX_PRIVATE fallback handling; I don't see any reason not to
> continue using __wake here.

Agreed.

> > --- a/src/thread/pthread_detach.c
> > +++ b/src/thread/pthread_detach.c
> > @@ -6,7 +6,7 @@ int __pthread_join(pthread_t, void **);
> >  static int __pthread_detach(pthread_t t)
> >  {
> >  	/* Cannot detach a thread that's already exiting */
> > -	if (a_swap(t->exitlock, 1))
> > +	if (a_cas(t->exitlock, 0, -INT_MAX))
> >  		return __pthread_join(t, 0);
> >  	t->detached = 2;
> >  	__unlock(t->exitlock);
> 
> A good way to catch this would be to introduce a struct for the lock (out of
> scope of this patch, of course).

I see reasons for and against that. I'm not strongly against it but
just making a policy not to poke directly at these locks outside of
"lock implementation" files (perhaps adding a macro or inline function
to libc.h to do this instead?) seems like it would work just as well.
It was a poor choice to write the above direct a_cas to begin with, I
think.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.