Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210717103338.3085f754@vostro>
Date: Sat, 17 Jul 2021 10:33:38 +0300
From: Timo Teras <timo.teras@....fi>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: option to enable eh_frame

On Fri, 16 Jul 2021 11:57:35 -0400
Rich Felker <dalias@...c.org> wrote:

> On Fri, Jul 16, 2021 at 12:16:25PM +0300, Timo Teras wrote:
> 
> The debugger already can do debugging/unwinding because it has access
> to the debug information (if you've installed it) and there is a
> clear, understood-by-users contract that this information is not an
> inherent part of the program but something optional for external
> debugging tools only.

This apparently the fundamental difference in thinking. The debugging
information is often huge, and people don't want to install it on
production or embedded devices. It ships ton of other features which
are not needed. However, the unwind/backtrace feature does provide
significant value for error reports and profiling.

> > > Arguments to have .eh_frame:
> >  - It allows debugging things even if musl-dbg is not or cannot be
> >    installed for some reason (e.g. gdb, valgrind), or is no longer
> >    available
> >  - libunwind/libexecinfo will start to work and give meaningful
> >    backtraces  
> 
> This is explicitly a reason not to. backtrace() considered harmful.

Please elaborate. I agree it's harmful from SEGV handler when program
might be targeted. But used in other situations where it is useful. Do
you have additional reasons why backtrace() is harmful?

Do note that libunwind --enable-debug-frame is enabled default by
upstream package on ARM and aarch64, and this would work if the debug
package is installed.

This gives inconsistent behaviour based on arch and whether -dbg is
installed. Developer might think the unwind stuff works, only to find
it breaks on different install, and make the application depend on -dbg.

> >  - Continuous kernel based profiling (e.g. perf record -g dwarf)
> > will work  
> 
> This already works if you have debug info.

As said, it was desire to make it work without debug info.

> > Given that the main arguments against are either making UB crash, or
> > not the best fix, and keeping eh_frame enables useful features to
> > work, I think it would make sense to allow enabling it.
> > 
> > Please consider the the attached patch to make it a configure
> > option to enable keeping eh_frame (defaulting still to not keeping
> > it).  
> 
> You can solve this problem just as well for the things you want to
> have work by including the (part of) the debug info you want in the
> main libc.so binary: .debug_frame. Of course I can't stop Alpine from
> doing it in a different way locally, but I would strongly recommend
> you do that rather than making a contract that diverges from musl.

The .debug_frame in main is a possibility, but no one else does it.
Well, apparently some ship the full debug build always to achieve some
of the same things. I think this also achieves the same divergence from
the normal expectation.

As small side note, the -fno-*unwind-tables flags are also in "tryflags"
so if compiler would not support it, you don't get it. So depending on
compiler you might still get .eh_frame ;)

The .eh_frame is a ELF abi feature normally turned on, and usually
needs good reasons to turn it off. Could you Rich please explain in
detail all the reasons why you think it's so evil? Mostly I heard
"feature X is evil and people should not use it" and I find that
your personal opinion, not a valid technical reason. But listening
more and earlier things, I've heard:

1. Having it makes it seem an application can jump through C library
   from e.g. qsort callback. Not having .eh_frame makes C++ exception
   fail, but does not help with e.g. setjmp/longjmp. So this is not
   complete solution.

2. Explicitly disabling backtrace() universally. I assume this is to
   disable to widespread usage of it from SEGV handler. But there
   seem to be also implementations reading .debug_frame, so this does
   not really fix that either fully. And I think there are valid use
   cases for backtrace() also, e.g. when having debug build and having
   internal debug printing trigger it.

3. Adds little bit if size to the binary. And also the runtime memory
   usage as it goes to LOAD segment. But 80kB is not a concern on
   modern environment even on embedded platforms.

4. You want to still enforce this so people don't do 1 or 2.
   For this, time has taught me, people need education, not enforcement.
   If you don't teach them, they'll still go and do the same stupid
   things even more stupid ways.

Please add in any reasons I may have missed. I would like to have your
complete list of reasons why to remove .eh_frame. So far it has been
coming out in pieces. I'd like constructive discussion if some of these
items could be implemented stronger in other means than removing
.eh_frame.

Please also explain why you think the .eh_frame is sufficient even
though it does not cover various situations as explained above.

Now it may have been little bit early for me to go ahead and merge the
change in Alpine before going through this discussion. The intention
was not upset anyone or be unilateral. This is why I made the PR and
waited for a good while for comments. Unfortunately, I did not find the
comments convincing, but granted I should have asked the above questions
before merging.

Currently, I still think reasons to have .eh_frame than not are more
weighty - it enables valid use cases in expected way. But I'm also
willing to revert if there's stronger reasons to do it. Either
technical, or just keeping peace. So far the technical reasons do not
convince me, but keeping peace and good relationship is something I
definitely want.

I personally did not see this as an ABI or a "contract" change, and this
is perhaps why I made the merge earlier than later.

While the "contract" may *seem* somewhat different. It was still
mentioned in the commit message that the change was not made to make
C++ exception through C library work. The "contract promise" has not
changed; though, granted, the user perceived "de facto contract" has
changed in this case.

As said, for me the primary reason is observability: being able to gdb
and get back traces, and continuous profiling work without debug info.
Those are precious things to have on system where debug info is simply
not available. As mentioned backtrace() working is a side effect, and I
think it has also valid use cases even if it's also often misused.

But fundamentally I think if we ship .debug_frame, all people wanting
do backtrace() and the silly stuff, will just enable .debug_frame
support in their code and still do the silly stuff in worse way, than
if .eh_frame was enabled. Since technically they are the same, even if
the intended use case is different. As seen libunwind does have
--enable-debug-frame. 

For me it's quite unusual to have .debug_frame but not other debug
sections. I assume it works. But the tooling certain does not endorse
this setup. I suppose it can be achieved with some objcopy magic, but
requires care.

Timo

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.