Message-ID: <70a52b448eef097b1dc1dd6f78f1921e@ispras.ru>
Date: Fri, 07 Oct 2022 11:18:36 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: Re: MT fork and key_lock in pthread_key_create.c

On 2022-10-06 22:50, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
>> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
>> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
>> > >Hi,
>> > >
>> > >I noticed that fork() doesn't take key_lock that is used to protect
>> > >the global table of thread-specific keys. I couldn't find mentions of
>> > >this lock in the MT fork discussion in the mailing list archive. Was
>> > >this lock overlooked?
>> > >
>> > >Also, I looked at how __aio_atfork() handles a similar case with
>> > >maplock, and it seems wrong. It takes the read lock and then simply
>> > >unlocks it both in the parent and in the child. But if there were
>> > >other holders of the read lock at the time of fork(), the lock won't
>> > >end up in the unlocked state in the child. It should probably be
>> > >completely nulled-out in the child instead.
>> > >
>> > Looking at aio further, I don't understand how it's supposed to work
>> > with MT fork at all. __aio_atfork() is called in _Fork() when the
>> > allocator locks are already held. Meanwhile another thread could be
>> > stuck in __aio_get_queue() holding maplock in exclusive mode while
>> > trying to allocate, resulting in deadlock.
>>
>> Indeed, this is messy and I don't think it makes sense to be doing
>> this at all. The child is just going to throw away the state so the
>> parent shouldn't need to synchronize at all, but if we walk the
>> multi-level map[] table in the child after async fork, it's possible
>> that the contents seen are inconsistent, even that the pointers are
>> only half-written or something.

Doesn't musl assume that pointer-sized memory accesses are atomic?

>> I see a few possible solutions:
>>
>> 1. Just set map = 0 in the child and leak the memory. This is not
>>    going to matter unless you're doing multiple generations of fork
>>    with aio anyway.
>>
>> 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
>>    the child, and if it succeeds, we know the map is consistent so we
>>    can just zero it out the same as now. Still "leaks" but only on
>>    contention to expand the map.
>>
>> 3. Getting a little smarter still: move the __aio_atfork for the
>>    parent side from _Fork to fork, outside of the critical section
>>    where malloc lock is held. Then proceed as in (2). Now, the
>>    tryrdlock is guaranteed to succeed in the child. Leak is only
>>    possible when _Fork is used (in which case the child context is an
>>    async signal one, and thus calling any aio_* that would allocate
>>    map[] again is UB -- note that in this case, the only reason we
>>    have to do anything at all in the child is to prevent close from
>>    interacting with aio).
>>
>> After writing them out, 3 seems like the right choice.
> I agree.
> Proposed patch attached.
> diff --git a/src/aio/aio.c b/src/aio/aio.c
> index fa24f6b6..4c3379e1 100644
> --- a/src/aio/aio.c
> +++ b/src/aio/aio.c
> @@ -401,11 +401,25 @@ void __aio_atfork(int who)
> 	if (who<0) {
> 		pthread_rwlock_rdlock(&maplock);
> 		return;
> +	} else if (!who) {
> +		pthread_rwlock_unlock(&maplock);
> +		return;
> 	}

It probably makes sense to reset "aio_fd_cnt" here, though it matters
only if so many nested fork() children each use aio that the counter
eventually overflows (breaking aio_close).
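Concretely, something along these lines on top of your patch might do it.
This is just a sketch, and the placement is only my reading of where it
belongs: at the top of the child (who>0) path, so the count is discarded
together with map; aio_fd_cnt being the counter whose overflow would break
aio_close, as noted above.

 	} else if (!who) {
 		pthread_rwlock_unlock(&maplock);
 		return;
 	}
+	/* Child: map is about to be thrown away, so the inherited fd
+	 * count is meaningless; reset it so repeated fork generations
+	 * can't accumulate and eventually overflow it. */
+	aio_fd_cnt = 0;
 	if (pthread_rwlock_tryrdlock(&maplock)) {
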
> -	if (who>0 && map) for (int a=0; a<(-1U/2+1)>>24; a++)
> +	if (pthread_rwlock_tryrdlock(&maplock)) {
> +		/* Obtaining lock may fail if _Fork was called nor via

s/nor/not/

> +		 * fork. In this case, no further aio is possible from
> +		 * child and we can just null out map so __aio_close
> +		 * does not attempt to do anything. */
> +		map = 0;
> +		return;
> +	}
> +	if (map) for (int a=0; a<(-1U/2+1)>>24; a++)
> 		if (map[a]) for (int b=0; b<256; b++)
> 			if (map[a][b]) for (int c=0; c<256; c++)
> 				if (map[a][b][c]) for (int d=0; d<256; d++)
> 					map[a][b][c][d] = 0;
> -	pthread_rwlock_unlock(&maplock);
> +	/* Re-initialize the rwlock rather than unlocking since there
> +	 * may have been more than one reference on it in the parent.
> +	 * We are not a lock holder anyway; the thread in the parent was. */
> +	pthread_rwlock_init(&maplock, 0);
> }
> diff --git a/src/process/_Fork.c b/src/process/_Fork.c
> index da063868..fb0fdc2c 100644
> --- a/src/process/_Fork.c
> +++ b/src/process/_Fork.c
> @@ -14,7 +14,6 @@ pid_t _Fork(void)
> 	pid_t ret;
> 	sigset_t set;
> 	__block_all_sigs(&set);
> -	__aio_atfork(-1);
> 	LOCK(__abort_lock);
> #ifdef SYS_fork
> 	ret = __syscall(SYS_fork);
> @@ -32,7 +31,7 @@ pid_t _Fork(void)
> 		if (libc.need_locks) libc.need_locks = -1;
> 	}
> 	UNLOCK(__abort_lock);
> -	__aio_atfork(!ret);
> +	if (!ret) __aio_atfork(1);
> 	__restore_sigs(&set);
> 	return __syscall_ret(ret);
> }
> diff --git a/src/process/fork.c b/src/process/fork.c
> index ff71845c..80e804b1 100644
> --- a/src/process/fork.c
> +++ b/src/process/fork.c
> @@ -36,6 +36,7 @@ static volatile int *const *const atfork_locks[] = {
> static void dummy(int x) { }
> weak_alias(dummy, __fork_handler);
> weak_alias(dummy, __malloc_atfork);
> +weak_alias(dummy, __aio_atfork);
> weak_alias(dummy, __ldso_atfork);
>
> static void dummy_0(void) { }
> @@ -50,6 +51,7 @@ pid_t fork(void)
> 	int need_locks = libc.need_locks > 0;
> 	if (need_locks) {
> 		__ldso_atfork(-1);
> +		__aio_atfork(-1);
> 		__inhibit_ptc();
> 		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
> 			if (*atfork_locks[i]) LOCK(*atfork_locks[i]);
> @@ -75,6 +77,7 @@ pid_t fork(void)
> 			if (ret) UNLOCK(*atfork_locks[i]);
> 			else **atfork_locks[i] = 0;
> 		__release_ptc();
> +		if (ret) __aio_atfork(0);
> 		__ldso_atfork(!ret);
> 	}
> 	__restore_sigs(&set);

Looks good to me otherwise.

Thanks,
Alexey