musl - Re: Do we need to enhance robustness in the signal mask?

Message-ID: <20241205062851.GO10433@brightrain.aerifal.cx>
Date: Thu, 5 Dec 2024 01:28:51 -0500
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: Do we need to enhance robustness in the signal mask?

On Thu, Dec 05, 2024 at 12:56:44AM -0500, Rich Felker wrote:
> On Tue, Dec 03, 2024 at 07:18:10PM +0100, Markus Wichmann wrote:
> > Am Tue, Dec 03, 2024 at 09:44:10AM -0500 schrieb Rich Felker:
> > > There are only a limited number of ways you can get a sigset_t whose
> > > use with sigprocmask etc. doesn't have undefined behavior. Either you
> > > initialize it with sigemptyset or sigfillset then modify with
> > > sigaddset/sigdelset, or you get it from some interface that reads back
> > > a sigset_t.
> > >
> > 
> > You can get a valid sigset_t lacking SIGTIMER (unconditionally) with
> > pthread_sigmask(), and then set it, e.g. like
> > 
> > sigset_t ss;
> > pthread_sigmask(SIG_SETMASK, 0, &ss);
> > pthread_sigmask(SIG_SETMASK, &ss, 0);
> > 
> > Honestly, this is close to the normal way to use pthread_sigmask(). The
> > same holds if the first call is used to block some other signal for some
> > reason and the second one then is supposed to restore the signal mask.
> > 
> > And you can do that in a timer notification function, where the signal
> > is supposed to be blocked. The timer thread will continue to execute
> > with the signal unblocked. And since the signal disposition is set to
> > SIG_DFL, and the default handling for RT signals is to terminate, the
> > next timer expiration will then kill the process.
> > 
> > > FWIW, SIGTIMER is not of special concern. It will never be generated
> > > except for timer threads, and these run with it explicitly blocked.
> > > This is very different from the situation for SIGCANCEL or SIGSYNCCALL
> > > where it's critical for arbitrary threads to be able to receive them.
> > > Also, at some point SIGTIMER is slated to be removed in favor of using
> > > clock_nanosleep for SIGEV_THREAD timers rather than kernelspace
> > > timers unless a strong reason not to do that is discovered.
> > 
> > I tried looking into doing that at some point and just gave up, because
> > of the overrun accounting. If I understand correctly, every timer
> > expiration between the notification being generated and being delivered
> > increases the overrun counter, and for each notification it is supposed
> > to be latched and then reset. So for SIGEV_THREAD, that would be every
> > expiration between the call and the return. And for SIGEV_SIGNAL, that
> > would be every expiration between the signal being sent and being
> > handled.
> > 
> > All of which was too complicated for me to wrap my head around, but
> > maybe you have more luck.
> 
> Counting the number of overruns is conceptually just a division:
> taking the difference between the current time and last notification
> time and dividing by the interval length. There is some complexity in
> handling changes to the interval length (special case: stopping and
> starting the timer), but it's not that hard a problem.

We're in luck! Per POSIX:

    "When a timer for which a signal is still pending expires, no
    signal shall be queued, and a timer overrun shall occur. When a
    timer expiration signal is delivered to or accepted by a process,
    the timer_getoverrun() function shall return the timer expiration
    overrun count for the specified timer. The overrun count returned
    contains the number of extra timer expirations that occurred
    between the time the signal was generated (queued) and when it was
    delivered or accepted, up to but not including an
    implementation-defined maximum of {DELAYTIMER_MAX}. If the number
    of such extra expirations is greater than or equal to
    {DELAYTIMER_MAX}, then the overrun count shall be set to
    {DELAYTIMER_MAX}. The value returned by timer_getoverrun() shall
    apply to the most recent expiration signal delivery or acceptance
    for the timer. If no expiration signal has been delivered for the
    timer, the return value of timer_getoverrun() is unspecified."

TL;DR: Overruns are only defined for SIGEV_SIGNAL. So we can just
return 0 if we like, or some reasonable interpretation.
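
As a rough illustration of what a "reasonable interpretation" could look
like, here is a minimal sketch of the division described in the quoted
text. The helper name, the nanosecond representation, and the explicit
cap parameter are all assumptions made for illustration; none of this is
musl code.

```c
#include <stdint.h>

/* Hypothetical helper (not part of musl): count how many whole timer
 * intervals elapsed between the last notification time and now, capped
 * at max (a caller would presumably pass DELAYTIMER_MAX).
 * All times are in nanoseconds on the timer's clock. */
static int overrun_estimate(int64_t last_ns, int64_t now_ns,
                            int64_t interval_ns, int max)
{
    /* One-shot timers (interval 0) and clock oddities yield no overruns. */
    if (interval_ns <= 0 || now_ns <= last_ns)
        return 0;

    /* Conceptually just a division. */
    int64_t n = (now_ns - last_ns) / interval_ns;

    /* Saturate rather than overflow the int return value. */
    return n > max ? max : (int)n;
}
```

For example, with a 10 ns interval, a last notification at time 0, and a
current time of 35, this returns 3. Whether one of those expirations
should be treated as the next notification rather than as an overrun is
a policy choice this sketch does not settle, and it ignores changes to
the interval length.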

> One short-term fix that might be worth exploring is adding back a
> signal handler for SIGTIMER so it doesn't kill the process. The
> handler would just increment an "extra overruns" counter for the
> thread. It could only run during execution of the function, if the
> function unblocked the signal, since we would re-block the signal each
> time before the next sigwaitinfo.

In particular this means no "extra overruns" fixup is required here.
The short-term fix is just adding back the no-op signal handler.
Arguably we even *should* unblock the signal during execution of the
notification function, so that an expiration that arrives while it is
still running doesn't cause the next invocation to run late, but is
instead skipped as an overrun and doesn't delay/impede subsequent
expirations.
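
A minimal sketch of the two pieces discussed above, assuming SIGRTMIN as
a stand-in for SIGTIMER (which is internal to musl and cannot be named
by applications). The helper names are hypothetical and this is not the
actual musl change, only an illustration using standard interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Assumption: SIGRTMIN stands in for musl's internal SIGTIMER. */
#define DEMO_SIGTIMER SIGRTMIN

/* No-op handler: its only purpose is to replace SIG_DFL, whose default
 * action for realtime signals is to terminate the process. */
static void noop_handler(int sig)
{
    (void)sig;
}

/* Install the no-op handler; returns 0 on success, -1 on error. */
static int install_noop_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, 0);
}

/* Block or unblock sig in the calling thread; returns 0 on success or
 * an error number, as pthread_sigmask does. */
static int set_blocked(int sig, int block)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, sig);
    return pthread_sigmask(block ? SIG_BLOCK : SIG_UNBLOCK, &set, 0);
}
```

In this sketch a timer thread would call set_blocked(DEMO_SIGTIMER, 0)
before running the notification function and set_blocked(DEMO_SIGTIMER, 1)
again before the next sigwaitinfo, so a signal arriving in between is
consumed by the no-op handler instead of killing the process.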

Rich