Message-ID: <20150520145522.GW17573@brightrain.aerifal.cx>
Date: Wed, 20 May 2015 10:55:22 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: trouble spots for atomic access

On Wed, May 20, 2015 at 09:02:19AM +0200, Jens Gustedt wrote:
> On Tuesday, 19.05.2015, 19:15 -0400, Rich Felker wrote:
> > On Wed, May 20, 2015 at 12:47:44AM +0200, Jens Gustedt wrote:
> > > > > - pthread_once_t should always be volatile
> > > > > - pthread_spinlock_t should always be volatile
> > > >
> > > > These are C++ ABI changes. I'd like to make the change but I'm
> > > > concerned it might break things.
> > >
> > > Both are broken as they are now, if you fall into a compiler that
> > > "knows" that the object itself isn't volatile qualified, and by that
> > > excuse takes the liberty to optimize out loads. For both types there
> > > is one point where there is a cast to (volatile int*) followed by a
> > > load, that might not do what we want.
> > >
> > > (For pthread_once_t, this is on line 43 in pthread_once.c.)
> > >
> > > I think the safest would be to make the data types volatile. If that
> > > is not possible, do the load with some new function "a_load_rel" that
> > > is guaranteed to be volatile, atomic and with memory_order relaxed.
> >
> > This would require lots of changes and it would be easy to overlook
> > some.
>
> No, no, I just meant a special function that is applied for these two
> special cases. (With a big warning flag that we only have that because
> of ABI compatibility issues.)

What I'm saying is that there are additional loads for these two
special cases that take place inside the CAS loops for implementing
a_* in atomic.h; it's not just the loads in the musl source files that
are affected. So to make the problem go away, we'd need to use
a_load_rel inside those CAS loops in the implementation(s) of atomic.h
too, which would be a fairly invasive change.

> > The main reason is that lots of the implementations of a_*
> > functions perform loads via C code inside their CAS loops and need the
> > load to be volatile. They'd all need to be changed to use a_load_rel,
> > and a_load_rel would in turn need to be added for all archs.
> >
> > I have a slightly different approach: destroy the compiler's ability
> > to know the identity of the object being accessed by passing it
> > through:
> >
> > volatile int *make_volatile(volatile int *p)
> > {
> > 	__asm__ ( "" : "=r"(p) : "0"(p) );
> > 	return p;
> > }
>
> Really not so different, I was just thinking to have one line of
> assembler that does the load directly. So minor details.

The difference is that you can factor this out by 'converting' the
address once on function entry so that none of the later code working
with the object is affected.
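For concreteness, a minimal sketch of that "convert once at entry"
pattern applied to a pthread_once-style fast path; the state value and
the simplified function body are illustrative assumptions, not musl's
actual pthread_once.c:

#include <pthread.h>

static volatile int *make_volatile(volatile int *p)
{
	/* Empty asm with p as both input and output: the compiler can no
	 * longer prove which object p refers to, so it cannot cache or
	 * elide loads done through the returned pointer. */
	__asm__ ( "" : "=r"(p) : "0"(p) );
	return p;
}

int pthread_once(pthread_once_t *control, void (*init)(void))
{
	/* Launder the address once at entry; every later access goes
	 * through c, so none of the remaining code needs to change. */
	volatile int *c = make_volatile((volatile int *)control);

	if (*c == 2) return 0;	/* hypothetical "already done" state */

	/* ... slow path (CAS to claim, call init, futex wake) omitted ... */
	return 0;
}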
> > > > > - pthread_barrier needs atomic increment
> > > >
> > > > Where?
> > >
> > > I think all uses of count in pthread_barrier_wait should be
> > > atomic. The increments in lines 15 and 93 should be atomic_fetch_add.
> > >
> > > Probably most archs do a ++ on a volatile as one memory operation, but
> > > nothing enforces that.
> >
> > At line 93, the modification to inst->count happens with a lock held.
> > There is no concurrent access.
> >
> > At line 15, I think there may be an issue; I need to look closer. But
> > it's not concurrent writes that need atomicity of the ++ as a RMW
> > operation; those are excluded by a lock. It's just the possibility of
> > another thread reading concurrently with the writes that I'm concerned
> > may happen.
>
> I don't think that both of these are very performance critical; there
> are a lot of locks, descheduling, etc. in that function. So I would
> clearly go for security and simplicity here: make them all atomic.

Process-shared barriers have pretty bad performance anyway due to
trying to meet near-impossible constraints, but non-pshared ones are
intended to be fast. Since the access is clearly synchronized by the
lock in the non-pshared case I'd rather leave it alone. For pshared I
want to understand the issue (which I think is a real issue) before
deciding on an action, but your proposed action might be the cleanest
fix.

> > > > I don't know a really good solution to this. The vast majority of uses
> > > > of tid are self->tid, from which perspective tid is not volatile or
> > > > even mutable. We don't want to pessimize them with volatile access.
> > > > Probably the best approach is to split it out into separate tid and
> > > > exit_futex fields.
> > >
> > > Yes, this would probably be safer and cleaner.
> >
> > I'm not sure if it's easy to do, though. There seem to be race
> > conditions, since ->tid is accessed from other threads for things like
> > pthread_kill and pthread_cancel. If we set up ->tid from start(), it
> > might not be seen by a pthread_kill or pthread_cancel made by the
> > caller of pthread_create after pthread_create returns. If we set up
> > ->tid from pthread_create, it would not be seen by the new thread
> > immediately when using self->tid. If we set up ->tid from both places,
> > then we have a data race (concurrent writes) unless we make it atomic
> > (thus volatile) and then we're back to the original problem. (Note
> > that we don't have this race now because ->tid is set in the kernel
> > before clone returns in either the parent or the child.)
> >
> > I think the above make_volatile trick would solve the problem without
> > adding a new field.
>
> probably
>
> > Alternatively, instead of ->tid and ->exit_futex, we could have ->tid
> > (used by self) and ->volatile_tid (used by other threads acting on the
> > target).
>
> I would prefer something along that line, looks cleaner to me.

Yes, I mildly lean that way too, but I would probably prefer the
make_volatile solution if we need it for the above issues (i.e. if
it's not practical to change the types).

Rich
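A rough sketch of that two-field split; the member names and comments
here are hypothetical and do not reflect musl's actual struct pthread
layout:

#include <sys/types.h>

struct pthread {
	/* ... other members ... */

	/* Written before the thread starts running and read only as
	 * self->tid by the thread itself, so plain accesses suffice. */
	pid_t tid;

	/* Read by other threads (pthread_kill, pthread_cancel) and used
	 * as the exit futex; always accessed through this volatile
	 * lvalue or the atomic.h helpers. */
	volatile int volatile_tid;
};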