Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 27 Feb 2024 19:13:04 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: Max Filippov <jcmvbkbc@...il.com>
Subject: Re: Initial xtensa/fdpic port review

First of all, I'm excited to see new work on this and especially the
FDPIC ABI. Not only is xtensa a very relevant arch I've been hoping we
could support for some time now, but having FDPIC will help getting
the parts of musl's FDPIC support that are currently making
sh-specific assumptions worked out so that we can get other new FDPIC
ports added too (particularly, arm and riscv).

I'm not up-to-date on what toolchain status/stability is or how close
this is to being intended as upstreamable, but since I was pointed to
it and took a look, I wanted to record my initial review thoughts here
so they don't get lost and to hopefully move this closer to ready for
upstream.

Review follows inline:

On Tue, Feb 27, 2024 at 06:24:31PM -0500, Rich Felker wrote:
> Attached patches were pulled from
> https://github.com/jcmvbkbc/musl-xtensa/commits/xtensa-1.2.4-fdpic/
> 
> I'll reply to comment inline.

> >From 868b34272f414323f10528d5b9988886541cb1c0 Mon Sep 17 00:00:00 2001
> From: Max Filippov <jcmvbkbc@...il.com>
> Date: Tue, 22 Mar 2016 02:35:58 +0300
> Subject: [PATCH 1/2] xtensa: add port
> 
> Signed-off-by: Max Filippov <jcmvbkbc@...il.com>
> ---
>  arch/xtensa/arch.mak                  |   1 +
>  arch/xtensa/atomic_arch.h             |  27 ++
>  arch/xtensa/bits/alltypes.h.in        |  27 ++
>  arch/xtensa/bits/float.h              |  16 ++
>  arch/xtensa/bits/ioctl.h              | 219 ++++++++++++++
>  arch/xtensa/bits/limits.h             |   1 +
>  arch/xtensa/bits/mman.h               |  68 +++++
>  arch/xtensa/bits/posix.h              |   2 +
>  arch/xtensa/bits/reg.h                |   2 +
>  arch/xtensa/bits/setjmp.h             |   5 +
>  arch/xtensa/bits/signal.h             |  88 ++++++
>  arch/xtensa/bits/stat.h               |  24 ++
>  arch/xtensa/bits/stdint.h             |  20 ++
>  arch/xtensa/bits/syscall.h.in         | 396 ++++++++++++++++++++++++++
>  arch/xtensa/bits/termios.h            | 167 +++++++++++
>  arch/xtensa/bits/user.h               |   4 +
>  arch/xtensa/bits/xtensa-config.h      |   8 +
>  arch/xtensa/crt_arch.h                |  48 ++++
>  arch/xtensa/kstat.h                   |  18 ++
>  arch/xtensa/pthread_arch.h            |  11 +
>  arch/xtensa/reloc.h                   |  38 +++
>  arch/xtensa/syscall_arch.h            | 104 +++++++
>  configure                             |   1 +
>  crt/xtensa/crti.S                     |  29 ++
>  crt/xtensa/crtn.S                     |  23 ++
>  include/elf.h                         |  70 +++++
>  src/internal/xtensa/syscall.S         |  25 ++
>  src/ldso/xtensa/tlsdesc.S             |  36 +++
>  src/process/xtensa/vfork.S            |  21 ++
>  src/setjmp/xtensa/longjmp.S           |  22 ++
>  src/setjmp/xtensa/setjmp.S            |  25 ++
>  src/signal/xtensa/restore.s           |  10 +
>  src/signal/xtensa/sigsetjmp.S         |  24 ++
>  src/thread/xtensa/__set_thread_area.S |  16 ++
>  src/thread/xtensa/__unmapself.S       |  13 +
>  src/thread/xtensa/clone.S             |  46 +++
>  src/thread/xtensa/syscall_cp.S        |  38 +++
>  37 files changed, 1693 insertions(+)
>  create mode 100644 arch/xtensa/arch.mak
>  create mode 100644 arch/xtensa/atomic_arch.h
>  create mode 100644 arch/xtensa/bits/alltypes.h.in
>  create mode 100644 arch/xtensa/bits/float.h
>  create mode 100644 arch/xtensa/bits/ioctl.h
>  create mode 100644 arch/xtensa/bits/limits.h
>  create mode 100644 arch/xtensa/bits/mman.h
>  create mode 100644 arch/xtensa/bits/posix.h
>  create mode 100644 arch/xtensa/bits/reg.h
>  create mode 100644 arch/xtensa/bits/setjmp.h
>  create mode 100644 arch/xtensa/bits/signal.h
>  create mode 100644 arch/xtensa/bits/stat.h
>  create mode 100644 arch/xtensa/bits/stdint.h
>  create mode 100644 arch/xtensa/bits/syscall.h.in
>  create mode 100644 arch/xtensa/bits/termios.h
>  create mode 100644 arch/xtensa/bits/user.h
>  create mode 100644 arch/xtensa/bits/xtensa-config.h
>  create mode 100644 arch/xtensa/crt_arch.h
>  create mode 100644 arch/xtensa/kstat.h
>  create mode 100644 arch/xtensa/pthread_arch.h
>  create mode 100644 arch/xtensa/reloc.h
>  create mode 100644 arch/xtensa/syscall_arch.h
>  create mode 100644 crt/xtensa/crti.S
>  create mode 100644 crt/xtensa/crtn.S
>  create mode 100644 src/internal/xtensa/syscall.S
>  create mode 100644 src/ldso/xtensa/tlsdesc.S
>  create mode 100644 src/process/xtensa/vfork.S
>  create mode 100644 src/setjmp/xtensa/longjmp.S
>  create mode 100644 src/setjmp/xtensa/setjmp.S
>  create mode 100644 src/signal/xtensa/restore.s
>  create mode 100644 src/signal/xtensa/sigsetjmp.S
>  create mode 100644 src/thread/xtensa/__set_thread_area.S
>  create mode 100644 src/thread/xtensa/__unmapself.S
>  create mode 100644 src/thread/xtensa/clone.S
>  create mode 100644 src/thread/xtensa/syscall_cp.S
> 
> diff --git a/arch/xtensa/arch.mak b/arch/xtensa/arch.mak
> new file mode 100644
> index 00000000..aa4d05ce
> --- /dev/null
> +++ b/arch/xtensa/arch.mak
> @@ -0,0 +1 @@
> +COMPAT_SRC_DIRS = compat/time32
> diff --git a/arch/xtensa/atomic_arch.h b/arch/xtensa/atomic_arch.h
> new file mode 100644
> index 00000000..34bf0704
> --- /dev/null
> +++ b/arch/xtensa/atomic_arch.h
> @@ -0,0 +1,27 @@
> +#include "bits/xtensa-config.h"

This is not what bits headers are for; they are public-installed
headers that provide parts of the standard public headers that vary by
arch, and are never directly #include'able.

If you really need arch-internal headers like this just put them in
the top-level dir of the arch. But in the case of these macros, either
just use the compiler-provided, __-prefixed ones directly, or if
they're not actually variable, don't inspect them at all.

> +
> +#if XCHAL_HAVE_S32C1I
> +#define a_cas a_cas
> +static inline int a_cas(volatile int *p, int t, int s)
> +{
> +	__asm__ __volatile__ (
> +		"	wsr	%2, scompare1\n"
> +		"	s32c1i	%0, %1\n"
> +		: "+a"(s), "+m"(*p)
> +		: "a"(t)
> +		: "memory" );
> +        return s;
> +}
> +#endif

For example this is not a useful test, because there's nothing you can
do in the case where it's not defined; it's just not supportable
(unless there's some kernel fallback mechanism like ARM and SH have;
then it would make sense to hard-code the cas instruction as above if
the ISA level is known to have it, and otherwise do conditional
runtime selection (again, like ARM and SH).

> diff --git a/arch/xtensa/bits/alltypes.h.in b/arch/xtensa/bits/alltypes.h.in
> new file mode 100644
> index 00000000..21221ce5
> --- /dev/null
> +++ b/arch/xtensa/bits/alltypes.h.in
> @@ -0,0 +1,27 @@
> +#define _REDIR_TIME64 1
> +#define _Addr int
> +#define _Int64 long long
> +#define _Reg int
> +
> +#if __XTENSA_EB__
> +#define __BYTE_ORDER 4321
> +#elif __XTENSA_EL__
> +#define __BYTE_ORDER 1234
> +#else
> +#error Unknown endianness
> +#endif
> +
> +#define __LONG_MAX 0x7fffffffL
> +
> +#ifndef __cplusplus
> +#ifdef __WCHAR_TYPE__
> +TYPEDEF __WCHAR_TYPE__ wchar_t;
> +#else
> +TYPEDEF unsigned wchar_t;
> +#endif
> +#endif
> +
> +TYPEDEF float float_t;
> +TYPEDEF double double_t;
> +
> +TYPEDEF struct { long __l __attribute__((aligned(__BIGGEST_ALIGNMENT__))); } max_align_t;

It's best to avoid macros like __BIGGEST_ALIGNMENT__ here that could
change out from under us in the future, thereby quietly breaking the
ABI. Whatever the ABI is, that's what the alignment here should be,
and just using a type that naturally has the alignment if possible
rather than attributes.

> diff --git a/arch/xtensa/bits/mman.h b/arch/xtensa/bits/mman.h
> new file mode 100644
> index 00000000..d2f58eb1
> --- /dev/null
> +++ b/arch/xtensa/bits/mman.h
> @@ -0,0 +1,68 @@
> +#define MAP_FAILED ((void *) -1)
> +
> +#define	PROT_NONE      0
> +#define	PROT_READ      1
> +#define	PROT_WRITE     2
> +#define	PROT_EXEC      4
> +#define	PROT_GROWSDOWN 0x01000000
> +#define	PROT_GROWSUP   0x02000000
> +
> +#define	MAP_SHARED     0x01
> +#define	MAP_PRIVATE    0x02
> +#define	MAP_FIXED      0x10
> +
> +#define MAP_TYPE       0x0f
> +#undef MAP_FILE
> +#define MAP_FILE       0x00
> +#undef MAP_ANON
> +#define MAP_ANON       0x800
> +#define MAP_ANONYMOUS  MAP_ANON
> +#undef MAP_NORESERVE
> +#define MAP_NORESERVE  0x0400
> +#undef MAP_GROWSDOWN
> +#define MAP_GROWSDOWN  0x1000
> +#undef MAP_DENYWRITE
> +#define MAP_DENYWRITE  0x2000
> +#undef MAP_EXECUTABLE
> +#define MAP_EXECUTABLE 0x4000
> +#undef MAP_LOCKED
> +#define MAP_LOCKED     0x8000
> +#undef MAP_POPULATE
> +#define MAP_POPULATE   0x10000
> +#undef MAP_NONBLOCK
> +#define MAP_NONBLOCK   0x20000
> +#undef MAP_STACK
> +#define MAP_STACK      0x40000
> +#undef MAP_HUGETLB
> +#define MAP_HUGETLB    0x80000
> +
> +#define POSIX_MADV_NORMAL       0
> +#define POSIX_MADV_RANDOM       1
> +#define POSIX_MADV_SEQUENTIAL   2
> +#define POSIX_MADV_WILLNEED     3
> +#define POSIX_MADV_DONTNEED     4
> +
> +#define MS_ASYNC        1
> +#define MS_INVALIDATE   2
> +#define MS_SYNC         4
> +
> +#define MCL_CURRENT     1
> +#define MCL_FUTURE      2
> +
> +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +#define MADV_NORMAL      0
> +#define MADV_RANDOM      1
> +#define MADV_SEQUENTIAL  2
> +#define MADV_WILLNEED    3
> +#define MADV_DONTNEED    4
> +#define MADV_REMOVE      9
> +#define MADV_DONTFORK    10
> +#define MADV_DOFORK      11
> +#define MADV_MERGEABLE   12
> +#define MADV_UNMERGEABLE 13
> +#define MADV_HUGEPAGE    14
> +#define MADV_NOHUGEPAGE  15
> +#define MADV_DONTDUMP    16
> +#define MADV_DODUMP      17
> +#define MADV_HWPOISON    100
> +#endif

A lot of these don't vary from the standard values, and including them
(especially since most will be duplicate/re- definitions) obscures
what actually is arch-specific here.

> diff --git a/arch/xtensa/bits/setjmp.h b/arch/xtensa/bits/setjmp.h
> new file mode 100644
> index 00000000..e1a6dd97
> --- /dev/null
> +++ b/arch/xtensa/bits/setjmp.h
> @@ -0,0 +1,5 @@
> +#if defined(__XTENSA_CALL0_ABI__)
> +typedef unsigned long __jmp_buf[6];
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

See notes later on reloc.h regarding ABIs.

> diff --git a/arch/xtensa/bits/signal.h b/arch/xtensa/bits/signal.h
> new file mode 100644
> index 00000000..545ffd36
> --- /dev/null
> +++ b/arch/xtensa/bits/signal.h
> @@ -0,0 +1,88 @@
> +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +
> +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +#define MINSIGSTKSZ 2048
> +#define SIGSTKSZ 8192
> +#endif
> +
> +typedef struct sigcontext
> +{
> +	unsigned long sc_pc;
> +	unsigned long sc_ps;
> +	unsigned long sc_lbeg;
> +	unsigned long sc_lend;
> +	unsigned long sc_lcount;
> +	unsigned long sc_sar;
> +	unsigned long sc_acclo;
> +	unsigned long sc_acchi;
> +	unsigned long sc_a[16];
> +	void *sc_xtregs;
> +} mcontext_t;

This needs a namespace-safe alternate definition when in
standards-conforming profile. See how it's done on most other archs.
(It can just be a dummy struct of the right size if you like.)

> diff --git a/arch/xtensa/bits/xtensa-config.h b/arch/xtensa/bits/xtensa-config.h
> new file mode 100644
> index 00000000..a2e95904
> --- /dev/null
> +++ b/arch/xtensa/bits/xtensa-config.h
> @@ -0,0 +1,8 @@
> +#undef XCHAL_NUM_AREGS
> +#define XCHAL_NUM_AREGS			__XCHAL_NUM_AREGS
> +
> +#undef XCHAL_HAVE_S32C1I
> +#define XCHAL_HAVE_S32C1I		__XCHAL_HAVE_S32C1I
> +
> +#undef XCHAL_HAVE_EXCLUSIVE
> +#define XCHAL_HAVE_EXCLUSIVE		__XCHAL_HAVE_EXCLUSIVE

See notes above on this header.

> diff --git a/arch/xtensa/reloc.h b/arch/xtensa/reloc.h
> new file mode 100644
> index 00000000..03482f18
> --- /dev/null
> +++ b/arch/xtensa/reloc.h
> @@ -0,0 +1,38 @@
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +#define ENDIAN_SUFFIX "eb"
> +#else
> +#define ENDIAN_SUFFIX ""
> +#endif
> +
> +#if __FDPIC__
> +#define ABI_SUFFIX "-fdpic"
> +#else
> +#define ABI_SUFFIX ""
> +#endif
> +
> +#define LDSO_ARCH "xtensa" ENDIAN_SUFFIX ABI_SUFFIX

Any actual incompatible ABI (as opposed to ISA level) needs its own
LDSO name. So if there are alternate (call0/windowed?) ABIs you want
to support, different names need to be defined for them here and in
configure. Unless there's a good reason to support more than one ABI,
it probably just makes sense to pick the preferred one and have
configure error-out if the toolchain's ABI is incompatible.

> diff --git a/crt/xtensa/crti.S b/crt/xtensa/crti.S
> new file mode 100644
> index 00000000..82a837e7
> --- /dev/null
> +++ b/crt/xtensa/crti.S
> @@ -0,0 +1,29 @@
> +.section .init
> +.global  _init
> +.type    _init, @function
> +_init:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	addi	sp, sp, -16
> +	s32i	a0, sp, 0
> +#ifdef __FDPIC__
> +	s32i	a12, sp, 4
> +	mov	a12, a11
> +#endif
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

Same thing regarding ABIs.

> diff --git a/src/internal/xtensa/syscall.S b/src/internal/xtensa/syscall.S
> new file mode 100644
> index 00000000..f09a46e8
> --- /dev/null
> +++ b/src/internal/xtensa/syscall.S
> @@ -0,0 +1,25 @@
> +.global __syscall
> +.hidden __syscall
> +.type   __syscall,@function
> +.align 4
> +__syscall:
> +#ifdef __XTENSA_WINDOWED_ABI__
> +	entry	a1, 16
> +#endif
> +	mov	a8, a3
> +	mov	a3, a4
> +	mov	a4, a5
> +	mov	a5, a6
> +	mov	a6, a8
> +	mov	a8, a7
> +#if defined(__XTENSA_WINDOWED_ABI__)
> +	l32i	a9, a1, 16
> +	syscall
> +	retw
> +#elif defined(__XTENSA_CALL0_ABI__)
> +	l32i	a9, a1, 0
> +	syscall
> +	ret
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

And here too.

> diff --git a/src/ldso/xtensa/tlsdesc.S b/src/ldso/xtensa/tlsdesc.S
> new file mode 100644
> index 00000000..084d5f3e
> --- /dev/null
> +++ b/src/ldso/xtensa/tlsdesc.S
> @@ -0,0 +1,36 @@
> +.global __tlsdesc_static
> +.hidden __tlsdesc_static
> +.type __tlsdesc_static,@function
> +.align 4
> +__tlsdesc_static:
> +#ifdef __XTENSA_WINDOWED_ABI__
> +	entry	a1, 16
> +#endif
> +	rur	a3, threadptr
> +	add	a2, a2, a3
> +#if defined(__XTENSA_WINDOWED_ABI__)
> +	retw
> +#elif defined(__XTENSA_CALL0_ABI__)
> +	ret
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

Are you sure the entry step makes sense with tlsdesc? Is this honoring
the ABI for what the tlsdesc function is allowed to clobber?

> +.hidden __tls_get_new
> +
> +.global __tlsdesc_dynamic
> +.hidden __tlsdesc_dynamic
> +.type __tlsdesc_dynamic,@function
> +.align 4
> +__tlsdesc_dynamic:
> +#if defined(__XTENSA_WINDOWED_ABI__)
> +	entry	a1, 16
> +	mov	a6, a2
> +	call4	__tls_get_addr
> +	mov	a2, a6
> +	retw
> +#elif defined(__XTENSA_CALL0_ABI__)
> +	j	__tls_get_addr
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

This approach is no longer a valid implementation for
__tlsdesc_dynamic, because it calls C code which could clobber any
call-clobbered registers, not just the ones the tlsdesc function is
allowed to clobber.

Instead, it needs to actually do the dtv lookup in asm. musl does not
use any dynamic allocation here, so the lookup is guaranteed to just
work using the indices found with no conditional branches. See the
recently added riscv code for a clean example, in commit
407aea628af8c81d9e3f5a068568f2217db71bba.

> diff --git a/src/process/xtensa/vfork.S b/src/process/xtensa/vfork.S
> new file mode 100644
> index 00000000..25478b87
> --- /dev/null
> +++ b/src/process/xtensa/vfork.S
> @@ -0,0 +1,21 @@
> +.global vfork
> +.type vfork,@function
> +vfork:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	movi	a2, 116 # __NR_clone
> +	movi	a3, 0
> +	movi	a6, 0x4111 # CLONE_VM | CLONE_VFORK | SIGCHLD
> +	syscall
> +
> +#ifdef __FDPIC__
> +.hidden __syscall_ret
> +	movi	a9, __syscall_ret@GOT
> +	add	a9, a9, a11
> +	l32i	a9, a9, 0
> +	jx	a9
> +#else
> +#error Unsupported Xtensa ABI
> +#endif
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

Same re: ABIs. But maybe there's intent to support a non-FDPIC ABI
too?

> diff --git a/src/setjmp/xtensa/longjmp.S b/src/setjmp/xtensa/longjmp.S
> new file mode 100644
> index 00000000..602093c9
> --- /dev/null
> +++ b/src/setjmp/xtensa/longjmp.S
> @@ -0,0 +1,22 @@
> +.global _longjmp
> +.global longjmp
> +.type _longjmp,@function
> +.type longjmp,@function
> +.align 4
> +_longjmp:
> +longjmp:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	l32i	a0, a2, 0
> +	l32i	a1, a2, 4
> +	l32i	a12, a2, 8
> +	l32i	a13, a2, 12
> +	l32i	a14, a2, 16
> +	l32i	a15, a2, 20
> +
> +	movi	a2, 1
> +	movnez	a2, a3, a3
> +
> +	ret
> +#else
> +#error Unsupported Xtensa ABI
> +#endif
> diff --git a/src/setjmp/xtensa/setjmp.S b/src/setjmp/xtensa/setjmp.S
> new file mode 100644
> index 00000000..3a735a6a
> --- /dev/null
> +++ b/src/setjmp/xtensa/setjmp.S
> @@ -0,0 +1,25 @@
> +.global ___setjmp
> +.hidden ___setjmp
> +.global __setjmp
> +.global _setjmp
> +.global setjmp
> +.type __setjmp,@function
> +.type _setjmp,@function
> +.type setjmp,@function
> +.align 4
> +___setjmp:
> +__setjmp:
> +_setjmp:
> +setjmp:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	s32i	a0, a2, 0
> +	s32i	a1, a2, 4
> +	s32i	a12, a2, 8
> +	s32i	a13, a2, 12
> +	s32i	a14, a2, 16
> +	s32i	a15, a2, 20
> +	movi	a2, 0
> +	ret
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

Ditto.

> diff --git a/src/signal/xtensa/sigsetjmp.S b/src/signal/xtensa/sigsetjmp.S
> new file mode 100644
> index 00000000..e44bba88
> --- /dev/null
> +++ b/src/signal/xtensa/sigsetjmp.S
> @@ -0,0 +1,24 @@
> +.global sigsetjmp
> +.global __sigsetjmp
> +.type sigsetjmp,@function
> +.type __sigsetjmp,@function
> +.align 4
> +sigsetjmp:
> +__sigsetjmp:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	bnez	a3, 1f
> +.hidden ___setjmp
> +	j	___setjmp
> +1:
> +	mov	a3, a0
> +	mov	a4, a2
> +	call0	___setjmp
> +	mov	a0, a3
> +	mov	a3, a2
> +	mov	a2, a4
> +
> +.hidden __sigsetjmp_tail
> +	j	__sigsetjmp_tail
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

I don't think this code works. It does not seem to save the return
address and argument address before calling ___setjmp, so they will be
clobbered after it returns.

> diff --git a/src/thread/xtensa/__set_thread_area.S b/src/thread/xtensa/__set_thread_area.S
> new file mode 100644
> index 00000000..1f402125
> --- /dev/null
> +++ b/src/thread/xtensa/__set_thread_area.S
> @@ -0,0 +1,16 @@
> +.global __set_thread_area
> +.type   __set_thread_area,@function
> +.align 4
> +__set_thread_area:
> +#ifdef __XTENSA_WINDOWED_ABI__
> +	entry	a1, 16
> +#endif
> +	wur	a2, threadptr
> +	movi	a2, 0
> +#if defined(__XTENSA_WINDOWED_ABI__)
> +	retw
> +#elif defined(__XTENSA_CALL0_ABI__)
> +	ret
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

FWIW, few archs do this yet, but this function can be written in C
with inline asm to set the register, and that would avoid any ABI
conditionals if they're ever needed.

Eventually I'd like to move most functions like this, that are
currently external asm but have no good reason to be (unlike setjmp,
vfork, etc. which *do* have a good reason to be), to minimal inline
asm in C. Some archs' versions of this file would even just become
__syscall, no further asm.

> diff --git a/src/thread/xtensa/__unmapself.S b/src/thread/xtensa/__unmapself.S
> new file mode 100644
> index 00000000..15bf2bdf
> --- /dev/null
> +++ b/src/thread/xtensa/__unmapself.S
> @@ -0,0 +1,13 @@
> +.global __unmapself
> +.type   __unmapself,@function
> +.align 4
> +__unmapself:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	mov	a6, a2
> +	movi	a2, 81 # SYS_munmap
> +	syscall
> +	movi	a2, 118 # SYS_exit
> +	syscall
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

This is unsafe on fdpic/nommu if the kernel requires the task to
always have a valid stack. You might want to check how it's done on SH
and whether something similar is needed here. The idea is that it
temporarily switches to an alternate stack protected by the global
thread list lock (so only one thread can be using it at a time) in
order to make it safe to unmap the thread's own stack. The thread list
is then unlocked by the kernel when the thread dies (via the exit
futex).

> diff --git a/src/thread/xtensa/clone.S b/src/thread/xtensa/clone.S
> new file mode 100644
> index 00000000..8e8514d6
> --- /dev/null
> +++ b/src/thread/xtensa/clone.S
> @@ -0,0 +1,46 @@
> +# __clone(func, stack, flags, arg, ptid, tls, ctid)
> +#         a2,   a3,    a4,    a5,  a6,   a7,  [sp]
> +
> +# syscall(SYS_clone, flags, stack, ptid, tls, ctid)
> +#         a2,        a6,    a3,    a4,   a5,  a8
> +
> +.global __clone
> +.type   __clone,@function
> +.align 4
> +__clone:
> +#if defined(__XTENSA_CALL0_ABI__)
> +	# align stack and save func,arg
> +	srli	a3, a3, 4
> +	slli	a3, a3, 4
> +	addi	a3, a3, -16
> +	s32i	a2, a3, 0
> +	s32i	a5, a3, 4
> +
> +	# syscall
> +	mov	a2, a4
> +	mov	a4, a6
> +	mov	a6, a2
> +	mov	a5, a7
> +	l32i	a8, a1, 16
> +	movi	a2, 116 # SYS_clone
> +	syscall
> +
> +	beqz	a2, 1f
> +	# parent
> +	ret
> +
> +	# child
> +1:
> +	l32i	a0, a1, 0
> +	l32i	a2, a1, 4
> +#ifdef __FDPIC__
> +	l32i	a11, a0, 4
> +	l32i	a0, a0, 0
> +#endif
> +	callx0	a0
> +	mov	a6, a2
> +	movi	a2, 118 # SYS_exit
> +	syscall
> +#else
> +#error Unsupported Xtensa ABI
> +#endif

Same re: ABIs.

> diff --git a/src/thread/xtensa/syscall_cp.S b/src/thread/xtensa/syscall_cp.S
> new file mode 100644
> index 00000000..de7c3e66
> --- /dev/null
> +++ b/src/thread/xtensa/syscall_cp.S
> @@ -0,0 +1,38 @@
> +# __syscall_cp_asm(&self->cancel, nr, u, v, w, x, y, z)
> +#                  a2             a3  a4 a5 a6 a7 [sp] [sp+4]
> +
> +# syscall(nr, u, v, w, x, y, z)
> +#         a2  a6 a3 a4 a5 a8 a9
> +
> +.global __cp_begin
> +.hidden __cp_begin
> +.global __cp_end
> +.hidden __cp_end
> +.global __cp_cancel
> +.hidden __cp_cancel
> +.hidden __cancel
> +.global __syscall_cp_asm
> +.hidden __syscall_cp_asm
> +.type __syscall_cp_asm,@function
> +.align 4
> +#if defined(__XTENSA_CALL0_ABI__)
> +__syscall_cp_asm:
> +__cp_begin:
> +	l32i	a2, a2, 0
> +	bnez	a2, __cp_cancel
> +	mov	a2, a4
> +	mov	a4, a6
> +	mov	a6, a2
> +	mov	a2, a3
> +	mov	a3, a5
> +	mov	a5, a7
> +	l32i	a8, a1, 0
> +	l32i	a9, a1, 4
> +	syscall
> +__cp_end:
> +	ret
> +__cp_cancel:
> +	j	__cancel
> +#else
> +#error Unsupported Xtensa ABI
> +#endif
> -- 
> 2.21.0

And this one.


> >From f8c5953ecdb948282cc8e573b729c25db60a95a8 Mon Sep 17 00:00:00 2001
> From: Max Filippov <jcmvbkbc@...il.com>
> Date: Wed, 21 Feb 2024 08:27:37 -0800
> Subject: [PATCH 2/2] WIP
> 
> Signed-off-by: Max Filippov <jcmvbkbc@...il.com>
> ---
>  ldso/dlstart.c         | 7 +++++++
>  ldso/dynlink.c         | 6 ++++--
>  src/internal/dynlink.h | 5 +++--
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/ldso/dlstart.c b/ldso/dlstart.c
> index 259f5e18..beca953f 100644
> --- a/ldso/dlstart.c
> +++ b/ldso/dlstart.c
> @@ -90,12 +90,19 @@ hidden void _dlstart_c(size_t *sp, size_t *dynv)
>  				- segs[rel_addr[1]].p_vaddr
>  				+ syms[R_SYM(rel[1])].st_value;
>  			rel_addr[1] = dyn[DT_PLTGOT];
> +		} else if (R_TYPE(rel[1]) == REL_RELATIVE) {
> +			size_t val = *rel_addr;
> +			for (j=0; val-segs[j].p_vaddr >= segs[j].p_memsz; j++);
> +			*rel_addr += segs[j].addr - segs[j].p_vaddr;

So xtensa has a "relative" reloc type that's just adjusted by the load
offset of the segment the relocation lives in, rather than needing to
use a symbolic relocation referencing a section symbol like other
fdpic archs do?

If so, this looks right and looks ok.

>  		} else {
>  			size_t val = syms[R_SYM(rel[1])].st_value;
>  			for (j=0; val-segs[j].p_vaddr >= segs[j].p_memsz; j++);
>  			*rel_addr = rel[2] + segs[j].addr - segs[j].p_vaddr + val;
>  		}
>  	}
> +#ifdef __xtensa__
> +	((unsigned long *)dyn[DT_PLTGOT])[3] = segs[0].addr - segs[0].p_vaddr;
> +#endif

Is this actually needed for anything? Generally musl doesn't use the
reserved GOT slots itself, and on all the other archs I'm aware of,
they're essentially reserved to the dynamic linker implementation so
the dynamic linker is just free not to use them and not to set them
up.

>  #else
>  	/* If the dynamic linker is invoked as a command, its load
>  	 * address is not available in the aux vector. Instead, compute
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index ceca3c98..25563af3 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -1420,6 +1420,7 @@ static void reloc_all(struct dso *p)
>  		if (!DL_FDPIC)
>  			do_relr_relocs(p, laddr(p, dyn[DT_RELR]), dyn[DT_RELRSZ]);
>  
> +#if 0
>  		if (head != &ldso && p->relro_start != p->relro_end) {
>  			long ret = __syscall(SYS_mprotect, laddr(p, p->relro_start),
>  				p->relro_end-p->relro_start, PROT_READ);
> @@ -1429,6 +1430,7 @@ static void reloc_all(struct dso *p)
>  				if (runtime) longjmp(*rtld_fail, 1);
>  			}
>  		}
> +#endif

Was this breaking something? Relro linking probably should have been
disabled on fdpic, and we ignore ENOSYS anyway for nommu.

>  		p->relocated = 1;
>  	}
> @@ -1485,7 +1487,7 @@ void __libc_exit_fini()
>  		if (dyn[0] & (1<<DT_FINI_ARRAY)) {
>  			size_t n = dyn[DT_FINI_ARRAYSZ]/sizeof(size_t);
>  			size_t *fn = (size_t *)laddr(p, dyn[DT_FINI_ARRAY])+n;
> -			while (n--) ((void (*)(void))*--fn)();
> +			while (n--) fpaddr(p, *--fn)();

If this is fixable on the tooling side it really should be fixed
there. init/fini arrays should have actual language-level function
addresses (descriptor addresses on fdpic), not instruction addresses.

If it's not fixable at this point, I guess we need the arch's reloc.h
to define a macro signaling this difference in behavior.

I have no idea how this would be expected to work on static linked
programs...

> diff --git a/src/internal/dynlink.h b/src/internal/dynlink.h
> index 06f41d09..6485e883 100644
> --- a/src/internal/dynlink.h
> +++ b/src/internal/dynlink.h
> @@ -78,10 +78,11 @@ struct fdpic_dummy_loadmap {
>  	(R_TYPE(x) == REL_RELATIVE) || \
>  	(R_TYPE(x) == REL_SYM_OR_REL && !R_SYM(x)) )
>  #else
> -#define IS_RELATIVE(x,s) ( ( \
> +#define IS_RELATIVE(x,s) ( \
> +	(R_TYPE(x) == REL_RELATIVE) || ( ( \
>  	(R_TYPE(x) == REL_FUNCDESC_VAL) || \
>  	(R_TYPE(x) == REL_SYMBOLIC) ) \
> -	&& (((s)[R_SYM(x)].st_info & 0xf) == STT_SECTION) )
> +	&& (((s)[R_SYM(x)].st_info & 0xf) == STT_SECTION) ) )
>  #endif
>  
>  #ifndef NEED_MIPS_GOT_RELOCS
> -- 
> 2.21.0
> 

This looks ok assuming the above about relative relocs.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.