Message-ID: <20231005123858.GH4163@brightrain.aerifal.cx>
Date: Thu, 5 Oct 2023 08:39:03 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: Hung processes with althttpd web server

On Thu, Oct 05, 2023 at 05:37:41AM +0200, Markus Wichmann wrote:
> On Wed, Oct 04, 2023 at 09:41:41PM -0400, Carl Chave wrote:
> > Hello, I'm running the althttpd web server on Alpine Linux using a Ramnode VPS.
> >
> > I've been having issues for quite a while with "hung" processes. There
> > is a long lived parent process and then a short lived forked process
> > for each http request. What I've been seeing is that the forked
> > processes will sometimes get stuck:
> >
> > sod01:/srv/www/log$ sudo strace -p 11329
> > strace: Process 11329 attached
> > futex(0x7f5bdcd77900, FUTEX_WAIT_PRIVATE, 4294967295, NULL
> >
>
> I often see this system call hung when signal handlers are doing
> signal-unsafe things. Looking at the source code, that is exactly what
> happens if the process catches a signal at the wrong time. Try removing
> all calls to signal(); that should do what the designers intended
> better (namely quit the process). If you want to log when a process dies
> of unnatural causes, that's something the parent process can do.
>
> The signal handler will call MakeLogEntry(), and that will do
> signal-unsafe things such as call free(), localtime(), or fopen(). If
> the main process is currently using malloc() when that happens, you will
> get precisely this hang.
> >
> > Please see this forum thread for additional information:
> > https://sqlite.org/althttpd/forumpost/4dc31619341ce947
> >
>
> Seems like they haven't yet found the trail of the signal handler.

OK, this is almost surely the source of the problem. It would still be
interesting to know which lock is being hit here, since for the most
part, locks are skipped in single-threaded processes.
But even if the lock were skipped, the invalid calls to
async-signal-unsafe functions from async-signal context would be
corrupting the state those locks were meant to protect. That's
probably what's happening on glibc (meaning this code only appears to
work there, but is likely behaving dangerously).

Rich
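
For illustration, here is a minimal sketch of the approach suggested in the quoted text: install no handlers in the per-request child, let a fatal signal kill it with the default disposition, and have the parent do the logging from ordinary (non-signal) context. This is not althttpd's code. The names run_and_report(), work_ok() and work_killed() are hypothetical, and SIGTERM merely stands in for whatever signal would end a real request.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that runs work(), wait for it, and
 * describe in buf how it ended. Returns the number of the signal that
 * killed the child, 0 if it exited on its own, or -1 on error. */
int run_and_report(void (*work)(void), char *buf, size_t len)
{
    assert(work != NULL && buf != NULL && len > 0);

    fflush(NULL);               /* don't duplicate buffered stdio output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no signal handlers are installed here, so a fatal
         * signal simply terminates the process. */
        work();
        _exit(0);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)     /* retry only if interrupted */
            return -1;
    }

    /* Parent: this is normal context, so stdio, malloc(), localtime()
     * and similar functions are safe to use for logging. */
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        snprintf(buf, len, "child %ld killed by signal %d", (long)pid, sig);
        return sig;
    }
    snprintf(buf, len, "child %ld exited with status %d",
             (long)pid, WEXITSTATUS(status));
    return 0;
}

/* Hypothetical workloads standing in for request handling. */
void work_ok(void) { }
void work_killed(void) { raise(SIGTERM); }
```

Assuming SIGTERM is neither blocked nor ignored in the calling process, run_and_report(work_killed, buf, sizeof buf) returns SIGTERM and leaves a log-style message in buf, while run_and_report(work_ok, buf, sizeof buf) returns 0. The point of the design is that nothing async-signal-unsafe ever runs in signal context, because no handler exists.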