musl - Re: Initial xtensa/fdpic port review

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMo8BfKS=jcOkcZFEFkgwJJ2oJ4jMODn-Mg-38MmH2FOzzR1kg@mail.gmail.com>
Date: Wed, 28 Feb 2024 13:28:39 -0800
From: Max Filippov <jcmvbkbc@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Initial xtensa/fdpic port review

On Wed, Feb 28, 2024 at 10:30 AM Rich Felker <dalias@...c.org> wrote:
>
> On Wed, Feb 28, 2024 at 09:20:33AM -0800, Max Filippov wrote:
> > Hi Rich,
> >
> > On Tue, Feb 27, 2024 at 4:12 PM Rich Felker <dalias@...c.org> wrote:
> > >
> > > First of all, I'm excited to see new work on this and especially the
> > > FDPIC ABI. Not only is xtensa a very relevant arch I've been hoping we
> > > could support for some time now, but having FDPIC will help getting
> > > the parts of musl's FDPIC support that are currently making
> > > sh-specific assumptions worked out so that we can get other new FDPIC
> > > ports added too (particularly, arm and riscv).
> > >
> > > I'm not up-to-date on what toolchain status/stability is or how close
> > > this is to being intended as upstreamable, but since I was pointed to
> > > it and took a look, I wanted to record my initial review thoughts here
> > > so they don't get lost and to hopefully move this closer to ready for
> > > upstream.
> >
> > Thanks for your review.
> > I expect to get toolchain components into shape for upstreaming during
> > this year.
> >
> > > Review follows inline:
> > >
> > > On Tue, Feb 27, 2024 at 06:24:31PM -0500, Rich Felker wrote:
> > > > Attached patches were pulled from
> > > > https://github.com/jcmvbkbc/musl-xtensa/commits/xtensa-1.2.4-fdpic/
> > > >
> > > > I'll reply to comment inline.
> > >
> > > > >From 868b34272f414323f10528d5b9988886541cb1c0 Mon Sep 17 00:00:00 2001
> > > > From: Max Filippov <jcmvbkbc@...il.com>
> > > > Date: Tue, 22 Mar 2016 02:35:58 +0300
> > > > Subject: [PATCH 1/2] xtensa: add port
> > > >
> > > > Signed-off-by: Max Filippov <jcmvbkbc@...il.com>
> > > > ---
> > > >  arch/xtensa/arch.mak                  |   1 +
> > > >  arch/xtensa/atomic_arch.h             |  27 ++
> > > >  arch/xtensa/bits/alltypes.h.in        |  27 ++
> > > >  arch/xtensa/bits/float.h              |  16 ++
> > > >  arch/xtensa/bits/ioctl.h              | 219 ++++++++++++++
> > > >  arch/xtensa/bits/limits.h             |   1 +
> > > >  arch/xtensa/bits/mman.h               |  68 +++++
> > > >  arch/xtensa/bits/posix.h              |   2 +
> > > >  arch/xtensa/bits/reg.h                |   2 +
> > > >  arch/xtensa/bits/setjmp.h             |   5 +
> > > >  arch/xtensa/bits/signal.h             |  88 ++++++
> > > >  arch/xtensa/bits/stat.h               |  24 ++
> > > >  arch/xtensa/bits/stdint.h             |  20 ++
> > > >  arch/xtensa/bits/syscall.h.in         | 396 ++++++++++++++++++++++++++
> > > >  arch/xtensa/bits/termios.h            | 167 +++++++++++
> > > >  arch/xtensa/bits/user.h               |   4 +
> > > >  arch/xtensa/bits/xtensa-config.h      |   8 +
> > > >  arch/xtensa/crt_arch.h                |  48 ++++
> > > >  arch/xtensa/kstat.h                   |  18 ++
> > > >  arch/xtensa/pthread_arch.h            |  11 +
> > > >  arch/xtensa/reloc.h                   |  38 +++
> > > >  arch/xtensa/syscall_arch.h            | 104 +++++++
> > > >  configure                             |   1 +
> > > >  crt/xtensa/crti.S                     |  29 ++
> > > >  crt/xtensa/crtn.S                     |  23 ++
> > > >  include/elf.h                         |  70 +++++
> > > >  src/internal/xtensa/syscall.S         |  25 ++
> > > >  src/ldso/xtensa/tlsdesc.S             |  36 +++
> > > >  src/process/xtensa/vfork.S            |  21 ++
> > > >  src/setjmp/xtensa/longjmp.S           |  22 ++
> > > >  src/setjmp/xtensa/setjmp.S            |  25 ++
> > > >  src/signal/xtensa/restore.s           |  10 +
> > > >  src/signal/xtensa/sigsetjmp.S         |  24 ++
> > > >  src/thread/xtensa/__set_thread_area.S |  16 ++
> > > >  src/thread/xtensa/__unmapself.S       |  13 +
> > > >  src/thread/xtensa/clone.S             |  46 +++
> > > >  src/thread/xtensa/syscall_cp.S        |  38 +++
> > > >  37 files changed, 1693 insertions(+)
> > > >  create mode 100644 arch/xtensa/arch.mak
> > > >  create mode 100644 arch/xtensa/atomic_arch.h
> > > >  create mode 100644 arch/xtensa/bits/alltypes.h.in
> > > >  create mode 100644 arch/xtensa/bits/float.h
> > > >  create mode 100644 arch/xtensa/bits/ioctl.h
> > > >  create mode 100644 arch/xtensa/bits/limits.h
> > > >  create mode 100644 arch/xtensa/bits/mman.h
> > > >  create mode 100644 arch/xtensa/bits/posix.h
> > > >  create mode 100644 arch/xtensa/bits/reg.h
> > > >  create mode 100644 arch/xtensa/bits/setjmp.h
> > > >  create mode 100644 arch/xtensa/bits/signal.h
> > > >  create mode 100644 arch/xtensa/bits/stat.h
> > > >  create mode 100644 arch/xtensa/bits/stdint.h
> > > >  create mode 100644 arch/xtensa/bits/syscall.h.in
> > > >  create mode 100644 arch/xtensa/bits/termios.h
> > > >  create mode 100644 arch/xtensa/bits/user.h
> > > >  create mode 100644 arch/xtensa/bits/xtensa-config.h
> > > >  create mode 100644 arch/xtensa/crt_arch.h
> > > >  create mode 100644 arch/xtensa/kstat.h
> > > >  create mode 100644 arch/xtensa/pthread_arch.h
> > > >  create mode 100644 arch/xtensa/reloc.h
> > > >  create mode 100644 arch/xtensa/syscall_arch.h
> > > >  create mode 100644 crt/xtensa/crti.S
> > > >  create mode 100644 crt/xtensa/crtn.S
> > > >  create mode 100644 src/internal/xtensa/syscall.S
> > > >  create mode 100644 src/ldso/xtensa/tlsdesc.S
> > > >  create mode 100644 src/process/xtensa/vfork.S
> > > >  create mode 100644 src/setjmp/xtensa/longjmp.S
> > > >  create mode 100644 src/setjmp/xtensa/setjmp.S
> > > >  create mode 100644 src/signal/xtensa/restore.s
> > > >  create mode 100644 src/signal/xtensa/sigsetjmp.S
> > > >  create mode 100644 src/thread/xtensa/__set_thread_area.S
> > > >  create mode 100644 src/thread/xtensa/__unmapself.S
> > > >  create mode 100644 src/thread/xtensa/clone.S
> > > >  create mode 100644 src/thread/xtensa/syscall_cp.S
> > > >
> > > > diff --git a/arch/xtensa/arch.mak b/arch/xtensa/arch.mak
> > > > new file mode 100644
> > > > index 00000000..aa4d05ce
> > > > --- /dev/null
> > > > +++ b/arch/xtensa/arch.mak
> > > > @@ -0,0 +1 @@
> > > > +COMPAT_SRC_DIRS = compat/time32
> > > > diff --git a/arch/xtensa/atomic_arch.h b/arch/xtensa/atomic_arch.h
> > > > new file mode 100644
> > > > index 00000000..34bf0704
> > > > --- /dev/null
> > > > +++ b/arch/xtensa/atomic_arch.h
> > > > @@ -0,0 +1,27 @@
> > > > +#include "bits/xtensa-config.h"
> > >
> > > This is not what bits headers are for; they are public-installed
> > > headers that provide parts of the standard public headers that vary by
> > > arch, and are never directly #include'able.
> > >
> > > If you really need arch-internal headers like this just put them in
> > > the top-level dir of the arch. But in the case of these macros, either
> > > just use the compiler-provided, __-prefixed ones directly, or if
> > > they're not actually variable, don't inspect them at all.
> >
> > I carry this header from the time when configuration-specific header
> > file was the only available configuration option. __XCHAL_* macros
> > only appeared in gcc-13, but since FDPIC support will only be
> > available on toolchains that have these macros I guess I'll drop the
> > header and use __XCHAL_* macros directly.
> >
> > > > +
> > > > +#if XCHAL_HAVE_S32C1I
> > > > +#define a_cas a_cas
> > > > +static inline int a_cas(volatile int *p, int t, int s)
> > > > +{
> > > > +     __asm__ __volatile__ (
> > > > +             "       wsr     %2, scompare1\n"
> > > > +             "       s32c1i  %0, %1\n"
> > > > +             : "+a"(s), "+m"(*p)
> > > > +             : "a"(t)
> > > > +             : "memory" );
> > > > +        return s;
> > > > +}
> > > > +#endif
> > >
> > > For example this is not a useful test, because there's nothing you can
> > > do in the case where it's not defined;
> >
> > This test is in the spirit of xtensa: it covers one possible
> > configuration option.
> > There's at least one other hardware option (exclusive access instructions)
> > available for this functionality and a fallback to syscalls. They're not listed
> > here because I tried to keep this initial implementation to a bare minimum.
> >
> > > it's just not supportable
> > > (unless there's some kernel fallback mechanism like ARM and SH have;
> > > then it would make sense to hard-code the cas instruction as above if
> > > the ISA level is known to have it, and otherwise do conditional
> > > runtime selection (again, like ARM and SH).
> >
> > Xtensa is quite limited on the runtime selection, because opcodes belonging
> > to the options not enabled for a particular core just would not assemble.
>
> That's not really a problem. On sh, we use .word to encode the
> instructions that might not be present at the ISA level targeted,
> which will only be conditionally executed. On ARM, I believe we just
> asm directives to suppress the ISA level tagging of the object file so
> that it can use the mnemonics for the conditionally-available
> instructions. Generally, we find a way to work around whatever the
> tooling is doing to make things difficult here. :-)

On xtensa there's no guarantee that a particular byte sequence represents
the same instruction in different configurations.

> > > > diff --git a/arch/xtensa/bits/alltypes.h.in b/arch/xtensa/bits/alltypes..h.in
> > > > new file mode 100644
> > > > index 00000000..21221ce5
> > > > --- /dev/null
> > > > +++ b/arch/xtensa/bits/alltypes.h.in
> > > > @@ -0,0 +1,27 @@
> > > > +#define _REDIR_TIME64 1
> > > > +#define _Addr int
> > > > +#define _Int64 long long
> > > > +#define _Reg int
> > > > +
> > > > +#if __XTENSA_EB__
> > > > +#define __BYTE_ORDER 4321
> > > > +#elif __XTENSA_EL__
> > > > +#define __BYTE_ORDER 1234
> > > > +#else
> > > > +#error Unknown endianness
> > > > +#endif
> > > > +
> > > > +#define __LONG_MAX 0x7fffffffL
> > > > +
> > > > +#ifndef __cplusplus
> > > > +#ifdef __WCHAR_TYPE__
> > > > +TYPEDEF __WCHAR_TYPE__ wchar_t;
> > > > +#else
> > > > +TYPEDEF unsigned wchar_t;
> > > > +#endif
> > > > +#endif
> > > > +
> > > > +TYPEDEF float float_t;
> > > > +TYPEDEF double double_t;
> > > > +
> > > > +TYPEDEF struct { long __l __attribute__((aligned(__BIGGEST_ALIGNMENT__))); } max_align_t;
> > >
> > > It's best to avoid macros like __BIGGEST_ALIGNMENT__ here that could
> > > change out from under us in the future, thereby quietly breaking the
> > > ABI. Whatever the ABI is, that's what the alignment here should be,
> > > and just using a type that naturally has the alignment if possible
> > > rather than attributes.
> >
> > The thing is: we don't know the name of the type with the widest
> > alignment, as indeed in the future someone may come up with
> > a configuration with an arbitrarily named type with an alignment
> > as big as allowed physically. But then its their responsibility to
> > keep the __BIGGEST_ALIGNMENT__ consistent across the
> > updates of that configuration that intend to stay compatible with
> > each other.
>
> max_align_t is only a C-language contract for the standard C types.
> There is no need for it to cover any sort of extension types, vectors,
> etc. So it should be defined in terms of the C ABI types and needs to
> be stable. If long long or double is already 8-byte aligned and long
> double is the same as double, it should probably just be defined as
> long long or a union of long long and long double or similar.

Ok, somehow I missed that this name comes from the standard.

> > > > diff --git a/arch/xtensa/reloc.h b/arch/xtensa/reloc.h
> > > > new file mode 100644
> > > > index 00000000..03482f18
> > > > --- /dev/null
> > > > +++ b/arch/xtensa/reloc.h
> > > > @@ -0,0 +1,38 @@
> > > > +#if __BYTE_ORDER == __BIG_ENDIAN
> > > > +#define ENDIAN_SUFFIX "eb"
> > > > +#else
> > > > +#define ENDIAN_SUFFIX ""
> > > > +#endif
> > > > +
> > > > +#if __FDPIC__
> > > > +#define ABI_SUFFIX "-fdpic"
> > > > +#else
> > > > +#define ABI_SUFFIX ""
> > > > +#endif
> > > > +
> > > > +#define LDSO_ARCH "xtensa" ENDIAN_SUFFIX ABI_SUFFIX
> > >
> > > Any actual incompatible ABI (as opposed to ISA level) needs its own
> > > LDSO name.
> >
> > There's no real ISA levels in the xtensa. Every CPU core configuration
> > is its own unique ISA.
>
> What does that practically mean? I think we're using different
> terminology which is making it hard to communicate.
>
> By "ISA level" I mean a node of a tree of possible extensions to a
> baseline ISA, where machine code targeted for one node in the tree can
> be executed on all descendants, and can be linked with machine code
> targeting a particular descendant to produce code that can be executed
> on that descendant.
>
> This applies both statically and dynamically. For example, an
> application built for a higher ISA level can be executed with
> libc/ldso built for the baseline ISA (note: this sometimes requires
> conditional logic in a few places like setjmp; see arm for example),
> and an application built for the baseline ISA can be executed using
> libc/ldso built for a higher ISA level.
>
> I'm guessing you mean which features are supported is combinatorically
> selectable as part of the core configuration, not a linear progression
> of levels like i386, i486, etc.

That's correct. There seems to be no intention that different xtensa core
configurations were binary compatible with each other. The development
model that I know only supports compatibility between a particular
configuration created with one version of the xtensa tools and rebuilds of
that configuration with later versions of the xtensa tools. I.e. each specific
configuration is its own ISA, binary software compatibility between the
different configurations is not guaranteed. Software tools for one
configuration need not generate or understand the correct binary code
for any other configuration.

> That's no problem. If I'm mistaken,
> plese help me try to understand what you mean.
>
> > > So if there are alternate (call0/windowed?) ABIs you want
> > > to support, different names need to be defined for them here and in
> > > configure.
> >
> > I understand that it means that I should drop the endianness suffix,
> > as the endianness of each core is fixed, but add windowed/call0
> > discriminator.
>
> If there are both le and be variants, then they need to have separate
> names.

They can never exist both at the same time for any particular core
configuration. Every core has fixed endianness, there's no standard
way to generate the "other endian" code for a given core.

> Same for any other mutually incompatible ABIs.
>
> > > Unless there's a good reason to support more than one ABI,
> > > it probably just makes sense to pick the preferred one and have
> > > configure error-out if the toolchain's ABI is incompatible.
> >
> > I still have hope to support windowed FDPIC and then non-FDPIC
> > variants.
>
> Is there a reason to support both windowed and non-windowed? Sorry if
> this is a dumb question; I don't know the background on it.

Windowed is a bit more performant, and it produces more compact
code, but it may not be available if the windowed registers option was
not enabled for the particular xtensa core and it's harder to deal with
from the software point of view.

> If it's
> possible to just pick one as better, and it can be supported on all
> cores, that's probably the better choice.

That might be call0, but then all transistors spent on windowed
registers were wasted.

> > > > diff --git a/src/ldso/xtensa/tlsdesc.S b/src/ldso/xtensa/tlsdesc.S
> > > > new file mode 100644
> > > > index 00000000..084d5f3e
> > > > --- /dev/null
> > > > +++ b/src/ldso/xtensa/tlsdesc.S
> > > > @@ -0,0 +1,36 @@
> > > > +.global __tlsdesc_static
> > > > +.hidden __tlsdesc_static
> > > > +.type __tlsdesc_static,@function
> > > > +.align 4
> > > > +__tlsdesc_static:
> > > > +#ifdef __XTENSA_WINDOWED_ABI__
> > > > +     entry   a1, 16
> > > > +#endif
> > > > +     rur     a3, threadptr
> > > > +     add     a2, a2, a3
> > > > +#if defined(__XTENSA_WINDOWED_ABI__)
> > > > +     retw
> > > > +#elif defined(__XTENSA_CALL0_ABI__)
> > > > +     ret
> > > > +#else
> > > > +#error Unsupported Xtensa ABI
> > > > +#endif
> > >
> > > Are you sure the entry step makes sense with tlsdesc? Is this honoring
> > > the ABI for what the tlsdesc function is allowed to clobber?
> >
> > But the tlsdesc function is just a normal function, it can do whatever
> > a regular function can do AFAIU. In addition there are no relaxations
> > that can replace non-call code with a variant with a call.
> >
> > 'entry' is the right way to begin a regular function in windowed xtensa ABI..
>
> OK but this is not a regular function. It's a code fragment with
> special ABI constraints and is intended to execute as fast as
> possible, clobbering as little as possible.

I understand that, but I'm looking at the way it is implemented
in the xtensa gcc port and see an ordinary function call:
  https://github.com/jcmvbkbc/gcc-xtensa/blob/xtensa-14-8789-fdpic/gcc/config/xtensa/xtensa.md#L3026

> I don't understand exactly
> what "entry" does, but I suspect it only makes sense if you're
> actually introducing a "call frame" of some sort, like if you'll be
> spilling and restoring call-saved registers or something (or doing
> whatever the windowed-register-file equivalent is to achieve the same
> thing) which does not seem to be needed here.

Unfortunately windowed calls on xtensa LX can be made with three
different window sizes: 4, 8 or 12 registers, the first outgoing argument
goes in the register a6, a10 or a14 respectively. One thing that 'entry'
does is window rotation so that the first incoming argument appears
in the register a2. AFAIK there's no other way to access arguments in
the unprivileged code.

> > > > +.hidden __tls_get_new
> > > > +
> > > > +.global __tlsdesc_dynamic
> > > > +.hidden __tlsdesc_dynamic
> > > > +.type __tlsdesc_dynamic,@function
> > > > +.align 4
> > > > +__tlsdesc_dynamic:
> > > > +#if defined(__XTENSA_WINDOWED_ABI__)
> > > > +     entry   a1, 16
> > > > +     mov     a6, a2
> > > > +     call4   __tls_get_addr
> > > > +     mov     a2, a6
> > > > +     retw
> > > > +#elif defined(__XTENSA_CALL0_ABI__)
> > > > +     j       __tls_get_addr
> > > > +#else
> > > > +#error Unsupported Xtensa ABI
> > > > +#endif
> > >
> > > This approach is no longer a valid implementation for
> > > __tlsdesc_dynamic, because it calls C code which could clobber any
> > > call-clobbered registers, not just the ones the tlsdesc function is
> > > allowed to clobber.
> > >
> > > Instead, it needs to actually do the dtv lookup in asm. musl does not
> > > use any dynamic allocation here, so the lookup is guaranteed to just
> > > work using the indices found with no conditional branches. See the
> > > recently added riscv code for a clean example, in commit
> > > 407aea628af8c81d9e3f5a068568f2217db71bba.
> >
> > Yeah, this whole xtensa TLS code should be in the WIP pile.
> > I'll rewrite it in assembly.
>
> OK, just wanted you to be aware that this is something eventually
> needing rewrite.

Sure.

> > > > diff --git a/src/signal/xtensa/sigsetjmp.S b/src/signal/xtensa/sigsetjmp.S
> > > > new file mode 100644
> > > > index 00000000..e44bba88
> > > > --- /dev/null
> > > > +++ b/src/signal/xtensa/sigsetjmp.S
> > > > @@ -0,0 +1,24 @@
> > > > +.global sigsetjmp
> > > > +.global __sigsetjmp
> > > > +.type sigsetjmp,@function
> > > > +.type __sigsetjmp,@function
> > > > +.align 4
> > > > +sigsetjmp:
> > > > +__sigsetjmp:
> > > > +#if defined(__XTENSA_CALL0_ABI__)
> > > > +     bnez    a3, 1f
> > > > +.hidden ___setjmp
> > > > +     j       ___setjmp
> > > > +1:
> > > > +     mov     a3, a0
> > > > +     mov     a4, a2
> > > > +     call0   ___setjmp
> > > > +     mov     a0, a3
> > > > +     mov     a3, a2
> > > > +     mov     a2, a4
> > > > +
> > > > +.hidden __sigsetjmp_tail
> > > > +     j       __sigsetjmp_tail
> > > > +#else
> > > > +#error Unsupported Xtensa ABI
> > > > +#endif
> > >
> > > I don't think this code works. It does not seem to save the return
> > > address and argument address before calling ___setjmp, so they will be
> > > clobbered after it returns.
> >
> > It does:
> >
> > +     mov     a3, a0
> > +     mov     a4, a2
> >
> > do exactly this and ___setjmp is hidden, so it's always going to be
> > the implementation above that doesn't clobber a3 and a4.
>
> Nope, that's not how it works. setjmp returns twice, and the second
> time it returns, it's *whatever code has executed since the first
> return from setjmp, which most certainly will have clobbered
> everything but the call-saved registers.

Of course.

> The return address needs to be saved in the space past the end of the
> signal mask in the sigjmp_buf, and at least one call-saved register
> also needs to be saved there, so that you can store the address of the
> sigjmp_buf in a call-saved register and have it available after a
> subsequent return from setjmp.

I will.

> > > Eventually I'd like to move most functions like this, that are
> > > currently external asm but have no good reason to be (unlike setjmp,
> > > vfork, etc. which *do* have a good reason to be), to minimal inline
> > > asm in C. Some archs' versions of this file would even just become
> > > __syscall, no further asm.
> > >
> > > > diff --git a/src/thread/xtensa/__unmapself.S b/src/thread/xtensa/__unmapself.S
> > > > new file mode 100644
> > > > index 00000000..15bf2bdf
> > > > --- /dev/null
> > > > +++ b/src/thread/xtensa/__unmapself.S
> > > > @@ -0,0 +1,13 @@
> > > > +.global __unmapself
> > > > +.type   __unmapself,@function
> > > > +.align 4
> > > > +__unmapself:
> > > > +#if defined(__XTENSA_CALL0_ABI__)
> > > > +     mov     a6, a2
> > > > +     movi    a2, 81 # SYS_munmap
> > > > +     syscall
> > > > +     movi    a2, 118 # SYS_exit
> > > > +     syscall
> > > > +#else
> > > > +#error Unsupported Xtensa ABI
> > > > +#endif
> > >
> > > This is unsafe on fdpic/nommu if the kernel requires the task to
> > > always have a valid stack.
> >
> > How do I know if the kernel requires that?
>
> I don't think it's explicitly documented. For sh I was vaguely
> familiar at the ISA level that the sh1/sh2 cpu just pushes stuff to
> the (existing, userspace) stack pointer on interrupts, which would
> necessitate always having a valid stack.

Oh, I know that part well, xtensa doesn't have anything like that.

> I also read the entry.S code
> (or whatever file this part was split off into) for how kernelspace is
> entered from traps/interrupts on nommu hardware, and saw how it
> shuffled things around to make it look to the kernel more like a
> conventional mmu-ful entry to kernelspace.
>
> Maybe xtensa is better about this and always invokes the
> trap/interrupt handler with the saved state in special registers
> rather than pushing it to the stack. If so, this might not be a
> consideration; normally, Linux switches the stack pointer to the
> kernel stack before actually doing anything with it.

Yep, that's what we do.

> > > > >From f8c5953ecdb948282cc8e573b729c25db60a95a8 Mon Sep 17 00:00:00 2001
> > > > From: Max Filippov <jcmvbkbc@...il.com>
> > > > Date: Wed, 21 Feb 2024 08:27:37 -0800
> > > > Subject: [PATCH 2/2] WIP
> > > >
> > > > Signed-off-by: Max Filippov <jcmvbkbc@...il.com>
> > > > ---
> > > >  ldso/dlstart.c         | 7 +++++++
> > > >  ldso/dynlink.c         | 6 ++++--
> > > >  src/internal/dynlink.h | 5 +++--
> > > >  3 files changed, 14 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/ldso/dlstart.c b/ldso/dlstart.c
> > > > index 259f5e18..beca953f 100644
> > > > --- a/ldso/dlstart.c
> > > > +++ b/ldso/dlstart.c
> > > > @@ -90,12 +90,19 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
> > > >                               - segs[rel_addr[1]].p_vaddr
> > > >                               + syms[R_SYM(rel[1])].st_value;
> > > >                       rel_addr[1] = dyn[DT_PLTGOT];
> > > > +             } else if (R_TYPE(rel[1]) == REL_RELATIVE) {
> > > > +                     size_t val = *rel_addr;
> > > > +                     for (j=0; val-segs[j].p_vaddr >= segs[j].p_memsz; j++);
> > > > +                     *rel_addr += segs[j].addr - segs[j].p_vaddr;
> > >
> > > So xtensa has a "relative" reloc type that's just adjusted by the load
> > > offset of the segment the relocation lives in, rather than needing to
> > > use a symbolic relocation referencing a section symbol like other
> > > fdpic archs do?
> >
> > I was looking at the ARM BFD code while doing that and
> > my impression was that they do the same.
> > Regardless, I wonder why either relocation form might be preferrable?
>
> I think the relative type here is perfectly acceptable. At first I
> thought it was weaker (less capable of representing addresses of
> objects in different segments), but looking again, I don't think
> that's the case.

Ok.

> > > If so, this looks right and looks ok.
> > >
> > > >               } else {
> > > >                       size_t val = syms[R_SYM(rel[1])].st_value;
> > > >                       for (j=0; val-segs[j].p_vaddr >= segs[j].p_memsz; j++);
> > > >                       *rel_addr = rel[2] + segs[j].addr - segs[j].p_vaddr + val;
> > > >               }
> > > >       }
> > > > +#ifdef __xtensa__
> > > > +     ((unsigned long *)dyn[DT_PLTGOT])[3] = segs[0].addr - segs[0].p_vaddr;
> > > > +#endif
> > >
> > > Is this actually needed for anything? Generally musl doesn't use the
> > > reserved GOT slots itself, and on all the other archs I'm aware of,
> > > they're essentially reserved to the dynamic linker implementation so
> > > the dynamic linker is just free not to use them and not to set them
> > > up.
> >
> > xtensa doesn't have relative register jumps and calls, so local jumps
> > and calls to a far off locations need to use absolute target addresses.
> > One possible solution is to have the address in the GOT entry, the
> > other is to calculate the target address using the text segment load
> > offset at runtime. Both have the same instruction count, see
> >   http://wiki.osll.ru/doku.php/etc:users:jcmvbkbc:binutils-xtensa#local_call
> > for the details, but the latter doesn't waste GOT space and that saves
> > a noticeable amount of RAM.
>
> I see. Doesn't this limit you to a single text segment, though?

It does, yes. Also there's a hardcoded 0 as a load map index for
the text segment.

> In practice it might not matter, but it's more constraining than fdpic
> was designed to be.

I guess I could add a compiler switch that would choose how it's done,
but it seems that for a noMMU linux application a single text segment
is a reasonable expectation.

> Is there a reason local, relative jumps/calls can't be done by
> computing the address of a nearby label and using the offset relative
> to that? That's how they (at least some types; I forget the details)
> work on sh/fdpic.

IIUC that would require the actual PC knowledge and there's no direct
access to the PC on xtensa.

> > > >  #else
> > > >       /* If the dynamic linker is invoked as a command, its load
> > > >        * address is not available in the aux vector. Instead, compute
> > > > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > > > index ceca3c98..25563af3 100644
> > > > --- a/ldso/dynlink.c
> > > > +++ b/ldso/dynlink.c
> > > > @@ -1420,6 +1420,7 @@ static void reloc_all(struct dso *p)
> > > >               if (!DL_FDPIC)
> > > >                       do_relr_relocs(p, laddr(p, dyn[DT_RELR]), dyn[DT_RELRSZ]);
> > > >
> > > > +#if 0
> > > >               if (head != &ldso && p->relro_start != p->relro_end) {
> > > >                       long ret = __syscall(SYS_mprotect, laddr(p, p->relro_start),
> > > >                               p->relro_end-p->relro_start, PROT_READ);
> > > > @@ -1429,6 +1430,7 @@ static void reloc_all(struct dso *p)
> > > >                               if (runtime) longjmp(*rtld_fail, 1);
> > > >                       }
> > > >               }
> > > > +#endif
> > >
> > > Was this breaking something? Relro linking probably should have been
> > > disabled on fdpic, and we ignore ENOSYS anyway for nommu.
> >
> > Yeah, I do some of the testing in linux-user QEMU, it has MMU
> > and mprotect calls actually succeed. Failures were coming from the
> > relocation code, but my impression was that relro should be applied
> > after the relocation completion and it should just work, hence the
> > WIP pile.
>
> Yes, relro is only supposed to be applied after all relocations were
> done.

Yeah, I'll have a look.

-- 
Thanks.
-- Max
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.