Message-ID: <CAFjbc8HkQzVQ+Wwxe8PyR5mX4OkdWvPUNZeQf7jsWjhPkEE3kA@mail.gmail.com>
Date: Sun, 8 May 2022 15:23:29 +0100
From: Pablo Galindo Salgado <pablogsal@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: Markus Wichmann <nullplan@....net>, musl@...ts.openwall.com
Subject: Re: Why the entries in the dynamic section are not always relocated?
Thanks for all the answers to this! Here are some clarifications
and context.
> It appears to me that whatever you are trying to do is not possible
> portably on Linux at this time. Could you fill us in?
As part of writing profiling and debugging tools, I am trying to rewrite
the PLT table to hook some symbols of shared libraries. This technique is
already used in a considerable number of debuggers, profilers and ELF
inspection tools. Currently, the way these tools handle dynamic section
entries that may or may not be relocated is either "not at all" or
"checking against the base address and heuristically assuming that the
value is an offset if it is less than the base", which is suboptimal.
This use case may sound "advanced" or "hacky", but it is quite common in
profilers, debuggers, state inspection tools and other related tooling.
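For illustration, the base-address heuristic described above can be
sketched in C roughly as follows. This is a minimal sketch and not code
taken from any of the tools mentioned; the helper name guess_dyn_address
and its parameters are hypothetical, and the rule "a value below the load
base is an offset" is exactly the assumption under discussion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of any libc): guess whether a d_ptr value
 * read from the dynamic section is already an absolute address or still an
 * offset from the load base.
 *
 * ASSUMPTION: any value below the load base is an unrelocated offset.
 * This is only a heuristic; it gives wrong answers for objects mapped at
 * low addresses or objects whose load base is 0.
 */
static uintptr_t guess_dyn_address(uintptr_t load_base, uintptr_t d_ptr)
{
    if (d_ptr < load_base) {
        /* Looks like an offset: add the load base ourselves. */
        return load_base + d_ptr;
    }
    /* Looks like an address that was already relocated: use it as-is. */
    return d_ptr;
}
```

With a load base of 0x40000000, the value 0x1000 is treated as an offset
and 0x40002000 is treated as an absolute address. Nothing in the value
itself proves which interpretation is right, which is why this approach
is fragile.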
Notice that the lack of anything predictable here makes these tools less
reliable across libc implementations (most people assume it "works" based
on what glibc does, but even old glibc versions seem to be inconsistent
about this).
Apart from some advanced profiling/debugger use cases, I think there are
several important use cases here that would benefit from some way to
handle this at runtime: for instance, inspecting the string tables, symbol
tables and other entries in the dynamic section.
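As a rough sketch of what such inspection involves, the C snippet below
scans a DT_NULL-terminated dynamic array for a given tag such as
DT_STRTAB or DT_SYMTAB. The helper name find_dyn_entry is hypothetical.
In a real tool the array would typically be located through the
PT_DYNAMIC program header seen in a dl_iterate_phdr() callback (an
assumption about the caller, not shown here).

```c
#include <assert.h>
#include <link.h>    /* ElfW(); also pulls in <elf.h> for the DT_* tags */
#include <stddef.h>

/*
 * Hypothetical helper: return the first entry in a DT_NULL-terminated
 * dynamic array whose tag matches, or NULL if there is none.
 *
 * The d_un.d_ptr of the returned entry may be an absolute address or an
 * offset from the load base, depending on the dynamic linker. This
 * function deliberately does not try to decide which.
 */
static const ElfW(Dyn) *find_dyn_entry(const ElfW(Dyn) *dyn, long tag)
{
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if ((long)dyn->d_tag == tag)
            return dyn;
    }
    return NULL;
}
```

A caller would look up DT_STRTAB and DT_SYMTAB this way and then still
has to decide how to interpret the d_ptr values it gets back, which is
the open question in this thread.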
Here are (some) examples of software dealing with this problem in the wild:
https://github.com/ClickHouse/ClickHouse/blob/8513f20cfded839032795978a2ffb8ef1fc6d61b/src/Common/SymbolIndex.cpp#L163
https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/dump.c#L850
There are many, many more examples of tools that are not aware of this
incompatibility and are doing it wrong. Just some examples of this:
https://github.com/kubo/plthook/blob/fa0267b29e989e310c2594afa095cf697ea09da0/plthook_elf.c#L548-L555
https://github.com/KDE/heaptrack/blob/d9c51f3f76d7a37348020d3aead651f5301f8ea7/src/track/heaptrack_inject.cpp#L317
https://gist.github.com/aeppert/0b1a38d4364e2863d27a8a0ce2c97dc8
https://course.ccs.neu.edu/cs7680sp17/elf-parser/util-plugin.c.txt
(and many more).
I think there is value in having some way to know, programmatically and
efficiently, how to interpret these addresses. At the very least, it would
allow these tools to work correctly on musl without even more hacks on
top.
Thanks for your consideration!
On Sun, 8 May 2022 at 14:54, Rich Felker <dalias@...c.org> wrote:
> On Sun, May 08, 2022 at 01:39:10PM +0200, Markus Wichmann wrote:
> > On Sun, May 08, 2022 at 08:48:29AM +0100, Pablo Galindo Salgado wrote:
> > > Why is this happening?
> >
> > The easy question first: This is happening because glibc finds some
> > value in writing the actual addresses into the dynamic section, and musl
> > does not. All of the addresses given in the dynamic section must
> > necessarily be offsets into the library itself (rather, the run-time map
> > of the library), so anyone who knows the base address of the library can
> > interpret these values, anyway.
>
> That's basically it. musl does not do this mainly because it's not
> possible in general -- on some archs _DYNAMIC is in read-only memory
> -- and we generally avoid arch-specific behavior in the dynamic
> linker. The only part of _DYNAMIC we modify, on archs where it's
> allowed, is DT_DEBUG, because that's a (nasty, should be replaced)
> interface with debuggers to let them find things.
>
> > See, you are accessing an implementation detail here. I am not aware of
> > any documentation of dl_iterate_phdr() which says whether the dynamic
> > section is relocated or not. Which leads directly to:
>
> It's not so much in the scope of dl_iterate_phdr, but in the runtime
> contents of ELF data structures. There are specs on *some* of that,
> but they are not among the list of standards musl purports to conform
> to (and for example some things like handling of RPATH/RUNPATH
> intentionally differ from legacy behaviors here).
>
> > > How can one programmatically know when the linker is
> > > going to place here offsets or full
> > > relocated addresses?
> >
> > In general, you cannot. You could reconstruct the length of the library
> > mapping from the LOAD headers, then heuristically assume that any value
> > below that is an offset, and any value above it probably a pointer.
> > Doesn't help you far, though, since you also need the base address.
> > Though I suppose you could assume that the start of the page the PHDRs
> > start on is likely the base of the library mapping.
> >
> > Also, the heuristic will fail for libraries mapped to a low address. In
> > theory, all address space after the zero page is fair game, right? But
> > libraries can take more space than that.
> >
> > And God help you if you ever run into an FDPIC architecture.
> >
> > It appears to me that whatever you are trying to do is not possible
> > portably on Linux at this time. Could you fill us in?
>
> Indeed, this is probably either an XY problem with a simple portable
> way to achieve whatever the underlying goal is, or a glorious hack
> that's making a lot more assumptions about implementation internals
> and not something you'd be able to rely on continuing to work in the
> future, even if you got it working.
>
> Rich
>