|
Message-ID: <CAF=dzRMjtCkZn0Vx31Bc14-+cnUDi_fP41=tf7eMShAchE-tNw@mail.gmail.com> Date: Sat, 12 Mar 2022 10:10:47 -0700 From: Gavin Howard <gavin.d.howard@...il.com> To: Rich Felker <dalias@...c.org> Cc: musl@...ts.openwall.com Subject: Re: Possible PEBKAC or Bug in musl Semaphores and/or Mutexes > This is why condition variables necessarily have an associated > predicate (in your case, involving your flag and possibly other > state). You can *never* just do pthread_cond_wait. The only correct > usage is: > > while (!predicate(...)) pthread_cond_wait(mtx,cnd); > > Correct use of condition variables ensures that you see any relevant > changes to the state the predicate is evaluated on, without requiring > you to explicitly work out a way to do the same thing with semaphores > or other lower-level primitives. It does not matter whatsoever whether > there are already waiters when you call signal. If not, they'll > necessarily see the state change *before* they wait. The state change > cannot happen between evaluation of the predicate and the wait because > the calling thread holds the mutex. Ah, I misunderstood. > > What this means is that if Main Thread #2 is blocked on waitpid(), then > > if another thread creates a child and signals the condition variable, > > then after Main Thread #2 returns from waitpid(), it will block on the > ^^^^^^^^^^^^^^^^^^^^ > > condition variable. If another thread creates another child, sure, it > ^^^^^^^^^^^^^^^^^^ > > No it won't, because it evaluates the predicate that tells it there's > something to do before calling wait. If you're not doing that, you're > using cond vars in a fundamentally wrong way. I have now implemented the system using the mutex and changing the semaphore to a condition variable with the flag and two other variables for how many children there are versus how many children have been reaped. We'll see if anything shows up while I run it over and over again. > No, this is absolutely not what's happening. Neither the processor nor > the compiler (in general the latter is at least as much of a concern > if not more when you have missing/incorrect synchronization) can > reorder things like that. Good to know, thank you. > On some level they're like any other external function call to a > function whose definition the layer optimizing the caller can't see: > they must be assumed to be able to store to or load from any memory > whose address has been exposed, and thus cannot be reordered > whatsoever. Okay, I wondered if that might be the case. > Your problem is not mysterious reordering. Your problem is in your > program's logic somewhere. Please use the debugging tools at your > disposal, especially strace which will probably reveal the problem to > you right away (by letting you see the exact sequence of events that > happened and trace through it to figure out where it mismatches your > expectation). I did use strace. It revealed nothing out of the ordinary. That's why I did not mention it, but I probably should have. I've also used GDB to inspect the core dumps. I did try. Perhaps while I'm learning and making a fool of myself, I'll mention my problem with rwlocks. The relevant code is: ``` do { bool rel = (strchr((char*) cmd->a, '/') != NULL); cont = false; // We only need to do something when the cmd is not a relative path. if (!rel) { s = y_strucon_rwlock_rdlock(&r->env.lock); if (y_err(s != y_STATUS_SUCCESS)) goto err; exists = y_map_existsStrings_v(&r->env.exec_map, (char*) cmd->a, &res); // We have to hold the lock until we have copied the result because it // could be moved by a write to the map. if (exists) { // Just move the value from res to cmd. I can do this because // the string in res is heap allocated and is not affected by // edits to the map. cmd->len = res->len; cmd->a = res->a; // Release the lock as soon as possible. y_strucon_rwlock_rdunlock(&r->env.lock); } else { // Release the lock as soon as possible. y_strucon_rwlock_rdunlock(&r->env.lock); <Find executable, error if non-existent, and prepare entry> s = y_strucon_rwlock_wrlock(&r->env.lock); if (y_err(s != y_STATUS_SUCCESS)) { y_str_free(&str); goto err; } // Make sure someone didn't get there first. if (!y_map_existsStrings(&r->env.exec_map, (char*) cmd->a)) { s = y_map_insertStrings(&r->env.exec_map, (char*) cmd->a, (char*) str.a); if (s == y_STATUS_ELEM_EXISTS) { y_panica("Element already exists"); } } y_strucon_rwlock_wrunlock(&r->env.lock); // Always free first. y_str_free(&str); y_stackpool_free(pool); if (y_err(s != y_STATUS_SUCCESS)) goto err; cont = true; } } } while (cont); err: <Do some error handling> ``` Besides initialization (before any other threads are created) and destruction (after all other threads are joined), these are the only references to the map in question. `y_strucon_rwlock_*` functions are just wrappers around POSIX rwlocks. In this case, I'm not doing anything fancy; it's just rwlocks. However, that abort about the element already existing will be triggered consistently within about 5 minutes, both on musl and glibc, so probably PEBKAC. If I change the read lock near the beginning with a write lock, I still get the same issue. However, if I change the rwlock for a mutex, and update all locking and unlocking to match, I don't get the issue. In this case, the strace shows nothing out of the ordinary that I can see. Yet I can't see how I am doing anything wrong. I've double checked that y_map_existsStrings{,_v} do not edit the map at all. So where's my stupidity on this one? Gavin Howard
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.