oss-security - Re: backtrace_symbols() misuse by Ceph and its supposedly-safe use


Message-ID: <6691B8FB.1040403@gmail.com>
Date: Fri, 12 Jul 2024 18:15:07 -0500
From: Jacob Bachmeyer <jcb62281@...il.com>
To: oss-security@...ts.openwall.com
Subject: Re: backtrace_symbols() misuse by Ceph and its supposedly-safe
 use

Alexander Patrakov wrote:
> [...]
> What would be a good solution (as in: something that does not convert
> crashes into deadlocks) here? I understand that, after memory
> corruption, we are already in the UB territory, but is there anything
> better possible than what is implemented?

I would suggest a monitor daemon that runs GDB to get the backtrace.
The simplest way to do this would require Ceph to have its own
supervisor and provide each daemon with a pipe back to it.  (This is
not unique; PostgreSQL has long had a "postmaster" process that manages
the worker "postgres" backend processes.)  The fatal error handler need
only write(2) to the pipe from a static string and/or fixed buffer (to
report a signal number) and then enter an infinite loop.  The supervisor
then kills the crashed process, possibly after attaching GDB and
collecting a backtrace.
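
A minimal sketch of such a handler, in C, might look like the
following.  This is only an illustration of the idea above, not Ceph
code: the names crash_pipe_fd, format_signo, fatal_signal_handler and
install_fatal_handlers are hypothetical, as are the message text and the
choice of signals.  The handler uses only a fixed stack buffer, write(2)
and pause(2), both of which are async-signal-safe, and it never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: write end of the pipe to the supervisor.
 * Set once, before any handler is installed. */
static volatile sig_atomic_t crash_pipe_fd = -1;

/* Format a non-negative integer as decimal digits without calling
 * into libc.  Returns the number of bytes stored in buf. */
size_t format_signo(int signo, char *buf, size_t cap)
{
    char tmp[12];
    size_t n = 0, len = 0;
    unsigned int v = signo < 0 ? 0u : (unsigned int)signo;

    do {
        tmp[n++] = (char)('0' + v % 10);   /* least significant first */
        v /= 10;
    } while (v != 0 && n < sizeof tmp);
    while (n > 0 && len < cap)
        buf[len++] = tmp[--n];             /* reverse into caller's buffer */
    return len;
}

/* Report the signal number over the pipe, then wait to be killed. */
void fatal_signal_handler(int signo)
{
    static const char prefix[] = "fatal signal ";
    char msg[sizeof prefix + 12];
    size_t len = 0;
    ssize_t r;

    for (size_t i = 0; i < sizeof prefix - 1; i++)
        msg[len++] = prefix[i];
    len += format_signo(signo, msg + len, 11);
    msg[len++] = '\n';

    if (crash_pipe_fd >= 0) {
        r = write((int)crash_pipe_fd, msg, len);
        (void)r;                           /* nothing useful to do on failure */
    }
    for (;;)
        pause();                           /* never return into the faulting code */
}

/* Install the handler for a (hypothetical) set of fatal signals.
 * Returns 0 on success, -1 if sigaction() fails. */
int install_fatal_handlers(int pipe_fd)
{
    static const int signals[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
    struct sigaction sa;

    sa.sa_handler = fatal_signal_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    crash_pipe_fd = pipe_fd;

    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

The supervisor would read the line from the other end of the pipe and
could then attach a debugger (for example with `gdb -p` and the PID of
the stuck process) before killing it.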

Alternately, simply run the Ceph daemons with `ulimit -c` nonzero and 
collect the core files.  The core files can be analyzed using GDB after 
the fact.  No dedicated supervisor needed here, only kernel facilities.
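
If changing the environment that launches the daemons is inconvenient,
the same effect as a nonzero `ulimit -c` can be approximated from inside
the process at startup.  The sketch below is an assumption about how one
might do that, with a hypothetical helper name enable_core_dumps; it
raises the soft RLIMIT_CORE limit to the hard limit using getrlimit(2)
and setrlimit(2).

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may go this far */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

This does not help if the hard limit is already zero, and on Linux
where the core file ends up still depends on the system's core_pattern
setting.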

The central problem here, as I understand it, is trying to do too much 
in a process that has gone into undefined behavior.  Attaching GDB or 
dumping a core file both sidestep that problem.

-- Jacob
