Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220627142109.GT7074@brightrain.aerifal.cx>
Date: Mon, 27 Jun 2022 10:21:09 -0400
From: Rich Felker <dalias@...c.org>
To: Nick Peng <pymumu@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: BUG: Calling readdir/dirfd after vfork will cause
 deadlock.

On Mon, Jun 27, 2022 at 12:05:41PM +0800, Nick Peng wrote:
> The feature I want to achieve is to close all file handles after fork to
> avoid file handles being inherited to child processes.

That's what opening them O_CLOEXEC is for. Closing *all* file
descriptors is a bug and will break correct usage like service
supervisors letting a whole process tree inherit a pipe fd so that
they can observe the extent of the lifetime of the whole tree.

> We have a component which is used in a process that uses too much memory
> (about 50BG) and uses too many file handles(about 100,000 files).
> In order to avoid file handles inheriting from child processes, we
> implemented a API like system,close all file handles after vfork.
> We can't close all file handles by iterating over all file handles number,
> because the maximum number of files limit is set very large, the
> performance of this method is too poor.
> And we can't use fork, fork will fail because of vm.overcommit_memory.
> In order to optimize this, the method we used is to read the file handes in
> /proc/self/fd directory and close the file handles one by one.

There's another way that's efficient to do this, though again it's a
bad pattern that breaks correct usage as described above: with poll()
you can query a very large set of file descriptors without requesting
any status bits, and get POLLNVAL for all the ones that are not valid.
This lets you enumerate thousands per syscall rather than just one at
a time. And it does not make use of any operations that need to
allocate memory or other resources.

> The component is based on glibc, currently this program runs for about 10
> years without deadlock.
> Recently, this component is used in musl-based embedded systems, and it
> hangs often.
> 
> After reading the vfork manual and the guidance you gave, i understand
> calling readdir after vfork is problematic. and I also learned that I can
> call getent64 to achieve this
> but what I want to say is that, based on glibc, after calling readdir after
> vfork, there is no deadlock problem at present.

musl follows the specification for vfork that *nothing* except execve
or _exit is supported after vfork. If something works, you got lucky
(or unlucky, in the sense that it might actually blow up later on
conditions you didn't anticipate, or on ugrade). For the most part,
vfork should not be used.

In practice, the above-mentioned poll method will poll will probably
work, but it is not supported.

What you're doing really calls for posix_spawn, and, if you need
further actions that it can't represent "between the fork and exec",
use of a helper program that performs those actions then execs the
final program image. This will give you the advantages of vfork
without the unsafety of it.

> more information: I googled and found this(https://lwn.net/Articles/789023/),
> probably the best solution, but it seems that the required kernel version
> is too new for my program to work with.
> 
> However, I think my problem is solved anyway, thank you all.
> 
> On Sun, Jun 26, 2022 at 2:49 AM Szabolcs Nagy <nsz@...t70.net> wrote:
> 
> > * Nick Peng <pymumu@...il.com> [2022-06-25 11:40:17 +0800]:
> > > Description:  After vfork, calling functions such as readdir/dirfd may
> > > cause deadlock. GNU C is OK.
> >
> > why do you think "GNU C is OK"? is this from some real software?
> >
> > opendir after vfork is documented to be invalid in glibc:
> >
> > https://www.gnu.org/software/libc/manual/html_mono/libc.html#Low_002dlevel-Directory-Access
> >
> > the standard is actually much stricter than the glibc manual:
> > posix conforming code must not call any libc api after vfork
> > other than _exit or the exec* familiy of functions.
> > (as-safe is not enough, but opendir is not even as-safe)
> >
> > since the example is multi-threaded even using fork would
> > be invalid, but i think both musl and glibc makes that work
> > (as an extension to the standard).
> >
> >
> > >                      Also tested on x86-64 with musl, no deadlock, but
> > > seems never exit, slower than GNU C.
> > > Version: latest, musl-1.2.3
> > > OS: debian bullseye 64bit OS. and asus router
> > > CPU: raspberrypi aarch64, mips32
> > > Reproduce Code:
> > >
> > > #include <dirent.h>
> > > #include <pthread.h>
> > > #include <stdint.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <sys/types.h>
> > > #include <sys/wait.h>
> > > #include <unistd.h>
> > > #include <string.h>
> > >
> > > pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
> > > pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
> > >
> > > struct tlog_log *logs = NULL;
> > > int do_exit = 0;
> > >
> > > void *test(void *arg)
> > > {
> > >     int i = 0;
> > >
> > >     for (i = 0; i < 10000000; i++) {
> > >         char *b = malloc(4096);
> > >         memset(b, 0, 4096);
> > >         free(b);
> > >     }
> > >     do_exit = 1;
> > >     return NULL;
> > > }
> > >
> > > void lockfunc()
> > > {
> > >     char path_name[4096];
> > >     DIR *dir = NULL;
> > >     struct dirent *ent;
> > >
> > >     snprintf(path_name, sizeof(path_name), "/proc/self/fd/");
> > >     dir = opendir(path_name);
> > >     if (dir == NULL) {
> > >         goto errout;
> > >     }
> > >
> > >     while ((ent = readdir(dir)) != NULL) {
> > >     }
> > >
> > >     closedir(dir);
> > >
> > >     return;
> > > errout:
> > >     if (dir) {
> > >         closedir(dir);
> > >     }
> > >
> > >     return;
> > > }
> > >
> > > void *test_fork(void *arg)
> > > {
> > >     int count = 0;
> > >     while (do_exit == 0) {
> > >         printf("test fork count %d\n", count++);
> > >         int pid = vfork();
> > >         if (pid < 0) {
> > >             return NULL;
> > >         } else if (pid == 0) {
> > >             lockfunc();
> > >             _exit(0);
> > >         }
> > >
> > >         int status;
> > >         waitpid(pid, &status, 0);
> > >     }
> > >
> > >     return NULL;
> > > }
> > >
> > > int main(int argc, char *argv[])
> > > {
> > >     pthread_attr_t attr;
> > >     pthread_t threads[10];
> > >     pthread_t fork_test;
> > >     int i;
> > >     int ret;
> > >
> > >     pthread_attr_init(&attr);
> > >
> > >     ret = pthread_create(&fork_test, &attr, test_fork, NULL);
> > >
> > >     for (i = 0; i < 10; i++) {
> > >         ret = pthread_create(&threads[i], &attr, test, NULL);
> > >         if (ret != 0) {
> > >             return 1;
> > >         }
> > >     }
> > >
> > >     for (i = 0; i < 10; i++) {
> > >         void *retval = NULL;
> > >         pthread_join(threads[i], &retval);
> > >     }
> > >
> > >     void *retval = NULL;
> > >     pthread_join(fork_test, &retval);
> > >     printf("exit\n");
> > >     getchar();
> > >     return 0;
> > > }
> > >
> > >
> > > Log:
> > > pi@...pberrypi:~/code/tinylog/test $ ./test
> > > test fork count 0
> > > test fork count 1   <-- lock here
> > > ^C
> > >
> > > gdb backtrace:
> > > x0000000000409524 in __lock ()
> > > (gdb) bt
> > > #0  0x0000000000409524 in __lock ()
> > > #1  0x0000000000406278 in __libc_malloc_impl ()
> >

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.