musl - Re: [PATCH] RISC-V: add riscv64 architecture support

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190526043144.GG23599@brightrain.aerifal.cx>
Date: Sun, 26 May 2019 00:31:44 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] RISC-V: add riscv64 architecture support

On Fri, May 24, 2019 at 10:46:08AM -0400, Drew DeVault wrote:
> Author: Alex Suykov <alex.suykov@...il.com>
> Author: Aric Belsito <lluixhi@...il.com>
> Author: Drew DeVault <sir@...wn.com>
> Author: Michael Clark <mjc@...ive.com>
> Author: Michael Forney <mforney@...rney.org>
> Author: Stefan O'Rear <sorear2@...il.com>
> ---
> This is the latest version of our work on porting musl to riscv64. We've
> removed riscv32 from this port for the time being, as the 32-bit ABI has
> not stabilized yet. I've ported Alpine Linux to riscv64 based on this
> work and have been using it for several months without issue.

Excellent!

> Stefan had a concern with barriers for setjmp, but I haven't encountered
> any issues. Will let him to comment on this if the concern remains.

I don't see how barriers have anything to do with setjmp. I think it
was a separate issue but it was never reported what the issue was.

>  arch/riscv64/atomic_arch.h             |  66 ++++++
>  arch/riscv64/bits/alltypes.h.in        |  29 +++
>  arch/riscv64/bits/endian.h             |   5 +
>  arch/riscv64/bits/fcntl.h              |  38 ++++
>  arch/riscv64/bits/fenv.h               |  17 ++
>  arch/riscv64/bits/float.h              |  16 ++
>  arch/riscv64/bits/ipc.h                |  14 ++
>  arch/riscv64/bits/limits.h             |   7 +
>  arch/riscv64/bits/mman.h               |   1 +
>  arch/riscv64/bits/msg.h                |  13 ++
>  arch/riscv64/bits/posix.h              |   2 +
>  arch/riscv64/bits/reg.h                |   8 +
>  arch/riscv64/bits/sem.h                |   9 +
>  arch/riscv64/bits/setjmp.h             |   1 +
>  arch/riscv64/bits/shm.h                |  26 +++
>  arch/riscv64/bits/signal.h             | 113 ++++++++++
>  arch/riscv64/bits/socket.h             |  33 +++
>  arch/riscv64/bits/stat.h               |  18 ++
>  arch/riscv64/bits/stdint.h             |  20 ++
>  arch/riscv64/bits/syscall.h.in         | 278 +++++++++++++++++++++++++
>  arch/riscv64/bits/user.h               |  43 ++++
>  arch/riscv64/crt_arch.h                |  18 ++
>  arch/riscv64/pthread_arch.h            |  12 ++
>  arch/riscv64/reloc.h                   |  27 +++
>  arch/riscv64/syscall_arch.h            |  79 +++++++
>  configure                              |   6 +
>  crt/riscv64/crti.s                     |  11 +
>  crt/riscv64/crtn.s                     |   0
>  include/elf.h                          |  56 +++++
>  src/fenv/riscv64/fenv-sf.c             |   3 +
>  src/fenv/riscv64/fenv.S                |  53 +++++
>  src/internal/riscv64/syscall.s         |  15 ++
>  src/ldso/riscv64/dlsym.s               |   6 +
>  src/linux/cache.c                      |  37 ++++
>  src/math/riscv64/copysign.c            |  15 ++
>  src/math/riscv64/copysignf.c           |  15 ++
>  src/math/riscv64/fabs.c                |  15 ++
>  src/math/riscv64/fabsf.c               |  15 ++
>  src/math/riscv64/fma.c                 |  15 ++
>  src/math/riscv64/fmaf.c                |  15 ++
>  src/math/riscv64/fmax.c                |  15 ++
>  src/math/riscv64/fmaxf.c               |  15 ++
>  src/math/riscv64/fmin.c                |  15 ++
>  src/math/riscv64/fminf.c               |  15 ++
>  src/math/riscv64/sqrt.c                |  15 ++
>  src/math/riscv64/sqrtf.c               |  15 ++
>  src/setjmp/riscv64/longjmp.S           |  42 ++++
>  src/setjmp/riscv64/setjmp.S            |  41 ++++
>  src/signal/riscv64/restore.s           |   8 +
>  src/signal/riscv64/sigsetjmp.s         |  23 ++
>  src/thread/riscv64/__set_thread_area.s |   6 +
>  src/thread/riscv64/__unmapself.s       |   7 +
>  src/thread/riscv64/clone.s             |  34 +++
>  src/thread/riscv64/syscall_cp.s        |  29 +++
>  54 files changed, 1450 insertions(+)
>  create mode 100644 arch/riscv64/atomic_arch.h
>  create mode 100644 arch/riscv64/bits/alltypes.h.in
>  create mode 100644 arch/riscv64/bits/endian.h
>  create mode 100644 arch/riscv64/bits/fcntl.h
>  create mode 100644 arch/riscv64/bits/fenv.h
>  create mode 100644 arch/riscv64/bits/float.h
>  create mode 100644 arch/riscv64/bits/ipc.h
>  create mode 100644 arch/riscv64/bits/limits.h
>  create mode 100644 arch/riscv64/bits/mman.h
>  create mode 100644 arch/riscv64/bits/msg.h
>  create mode 100644 arch/riscv64/bits/posix.h
>  create mode 100644 arch/riscv64/bits/reg.h
>  create mode 100644 arch/riscv64/bits/sem.h
>  create mode 100644 arch/riscv64/bits/setjmp.h
>  create mode 100644 arch/riscv64/bits/shm.h
>  create mode 100644 arch/riscv64/bits/signal.h
>  create mode 100644 arch/riscv64/bits/socket.h
>  create mode 100644 arch/riscv64/bits/stat.h
>  create mode 100644 arch/riscv64/bits/stdint.h
>  create mode 100644 arch/riscv64/bits/syscall.h.in
>  create mode 100644 arch/riscv64/bits/user.h
>  create mode 100644 arch/riscv64/crt_arch.h
>  create mode 100644 arch/riscv64/pthread_arch.h
>  create mode 100644 arch/riscv64/reloc.h
>  create mode 100644 arch/riscv64/syscall_arch.h
>  create mode 100644 crt/riscv64/crti.s
>  create mode 100644 crt/riscv64/crtn.s
>  create mode 100644 src/fenv/riscv64/fenv-sf.c
>  create mode 100644 src/fenv/riscv64/fenv.S
>  create mode 100644 src/internal/riscv64/syscall.s
>  create mode 100644 src/ldso/riscv64/dlsym.s
>  create mode 100644 src/math/riscv64/copysign.c
>  create mode 100644 src/math/riscv64/copysignf.c
>  create mode 100644 src/math/riscv64/fabs.c
>  create mode 100644 src/math/riscv64/fabsf.c
>  create mode 100644 src/math/riscv64/fma.c
>  create mode 100644 src/math/riscv64/fmaf.c
>  create mode 100644 src/math/riscv64/fmax.c
>  create mode 100644 src/math/riscv64/fmaxf.c
>  create mode 100644 src/math/riscv64/fmin.c
>  create mode 100644 src/math/riscv64/fminf.c
>  create mode 100644 src/math/riscv64/sqrt.c
>  create mode 100644 src/math/riscv64/sqrtf.c
>  create mode 100644 src/setjmp/riscv64/longjmp.S
>  create mode 100644 src/setjmp/riscv64/setjmp.S
>  create mode 100644 src/signal/riscv64/restore.s
>  create mode 100644 src/signal/riscv64/sigsetjmp.s
>  create mode 100644 src/thread/riscv64/__set_thread_area.s
>  create mode 100644 src/thread/riscv64/__unmapself.s
>  create mode 100644 src/thread/riscv64/clone.s
>  create mode 100644 src/thread/riscv64/syscall_cp.s
> 
> diff --git a/arch/riscv64/atomic_arch.h b/arch/riscv64/atomic_arch.h
> new file mode 100644
> index 00000000..018c7fd2
> --- /dev/null
> +++ b/arch/riscv64/atomic_arch.h
> @@ -0,0 +1,66 @@
> +#define a_barrier a_barrier
> +static inline void a_barrier()
> +{
> +	__asm__ __volatile__ ("fence rw,rw" : : : "memory");
> +}
> +
> +#define a_ll a_ll
> +static inline int a_ll(volatile int *p)
> +{
> +	int v;
> +	__asm__ __volatile__ ("lr.w %0, %1" : "=&r"(v), "+A"(*p));
> +	return v;
> +}
> +
> +#define a_sc a_sc
> +static inline int a_sc(volatile int *p, int v)
> +{
> +	int r;
> +	__asm__ __volatile__ ("sc.w %0, %2, %1" : "=&r"(r), "+A"(*p) : "r"(v) : "memory");
> +return !r;
> +}
> +
> +#define a_cas a_cas
> +static inline int a_cas(volatile int *p, int t, int s)
> +{
> +	int old, tmp;
> +	__asm__("1:  lr.w    %0, %2      \n"
> +		"    bne     %0, %3, 1f  \n"
> +		"    sc.w    %1, %4, %2  \n"
> +		"    bnez    %1, 1b      \n"
> +		"1:                      \n"
> +		: "=&r"(old), "+r"(tmp), "+A"(*p)
> +		: "r"(t), "r"(s));
> +	return old;
> +}

I don't understand why a_cas is defined here, and it's almost surely
wrong since it has no barriers whatsoever. If a_cas is left undefined,
it would be auto-defined in terms of a_ll and a_sc. But if we want to
follow the ISA docs which place restrictions on the positioning of
ll/sc and what can be done between them, I think we can't define a_ll
and a_sc, but have to define a_cas (with proper barriers) and let
everything else get defined in terms of it...

> diff --git a/arch/riscv64/bits/endian.h b/arch/riscv64/bits/endian.h
> new file mode 100644
> index 00000000..7df0e02a
> --- /dev/null
> +++ b/arch/riscv64/bits/endian.h
> @@ -0,0 +1,5 @@
> +#if __RISCVEB__
> +#define __BYTE_ORDER __BIG_ENDIAN
> +#else
> +#define __BYTE_ORDER __LITTLE_ENDIAN
> +#endif

This is probably gratuitous since there's no big endian ABI defined.
Or if the intent is to have a big endian ABI, that needs to be
reflected in reloc.h.

> diff --git a/arch/riscv64/bits/sem.h b/arch/riscv64/bits/sem.h
> new file mode 100644
> index 00000000..5f93c12d
> --- /dev/null
> +++ b/arch/riscv64/bits/sem.h
> @@ -0,0 +1,9 @@
> +struct semid_ds {
> +	struct ipc_perm sem_perm;
> +	time_t sem_otime;
> +	time_t sem_ctime;
> +	unsigned short sem_nsems;
> +	char __sem_nsems_pad[sizeof(time_t)-sizeof(short)];
> +	time_t __unused3;
> +	time_t __unused4;
> +};

I suspect the above would also be wrong if there's a big endian
variant.

> diff --git a/arch/riscv64/bits/signal.h b/arch/riscv64/bits/signal.h
> new file mode 100644
> index 00000000..8b992cc8
> --- /dev/null
> +++ b/arch/riscv64/bits/signal.h
> @@ -0,0 +1,113 @@
> +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +
> +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +# define MINSIGSTKSZ 2048
> +# define SIGSTKSZ 8192
> +#endif

Is 2k min correct here? It's ABI so we should aim to get it right to
begin with.

> +typedef struct __ucontext
> +{
> +	unsigned long uc_flags;
> +	struct __ucontext *uc_link;
> +	stack_t uc_stack;
> +	sigset_t uc_sigmask;
> +	char __unused[1024 / 8 - sizeof(sigset_t)];
> +	mcontext_t uc_mcontext;
> +} ucontext_t;

I think I mentioned once before in review: the __unused padding is
declared with size 0 and is thereby invalid C. It should just be
removed.

> diff --git a/arch/riscv64/bits/socket.h b/arch/riscv64/bits/socket.h
> new file mode 100644
> index 00000000..c11677e9
> --- /dev/null
> +++ b/arch/riscv64/bits/socket.h
> @@ -0,0 +1,33 @@
> +#include <endian.h>
> +
> +struct msghdr {
> +	void *msg_name;
> +	socklen_t msg_namelen;
> +	struct iovec *msg_iov;
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +	int __pad1, msg_iovlen;
> +#else
> +	int msg_iovlen, __pad1;
> +#endif
> +	void *msg_control;
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +	int __pad2;
> +	socklen_t msg_controllen;
> +#else
> +	socklen_t msg_controllen;
> +	int __pad2;
> +#endif
> +	int msg_flags;
> +};
> +
> +struct cmsghdr {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +	int __pad1;
> +	socklen_t cmsg_len;
> +#else
> +	socklen_t cmsg_len;
> +	int __pad1;
> +#endif
> +	int cmsg_level;
> +	int cmsg_type;
> +};

Again, are both endians actually supported?

> diff --git a/arch/riscv64/crt_arch.h b/arch/riscv64/crt_arch.h
> new file mode 100644
> index 00000000..d0e32a6c
> --- /dev/null
> +++ b/arch/riscv64/crt_arch.h
> @@ -0,0 +1,18 @@
> +__asm__(
> +".text\n"
> +".global " START "\n"
> +".type " START ",%function\n"
> +START ":\n"
> +".weak __global_pointer$\n"
> +".hidden __global_pointer$\n"
> +".option push\n"
> +".option norelax\n\t"
> +"lla gp, __global_pointer$\n"
> +".option pop\n\t"
> +"mv a0, sp\n"
> +".weak _DYNAMIC\n"
> +".hidden _DYNAMIC\n\t"
> +"lla a1, _DYNAMIC\n\t"
> +"andi sp, sp, -16\n\t"
> +"jal " START "_c"
> +);

Is __global_pointer actually used in the hosted ABI?

> diff --git a/arch/riscv64/reloc.h b/arch/riscv64/reloc.h
> new file mode 100644
> index 00000000..8bd90dda
> --- /dev/null
> +++ b/arch/riscv64/reloc.h
> @@ -0,0 +1,27 @@
> +#if defined __riscv_float_abi_soft
> +#define RISCV_FP_SUFFIX "-sf"
> +#elif defined __riscv_float_abi_single
> +#define RISCV_FP_SUFFIX "-sp"
> +#elif defined __riscv_float_abi_double
> +#define RISCV_FP_SUFFIX ""
> +#endif
> +
> +#define RISCV_LDSO_HELPER(x) "riscv" #x
> +#define RISCV_LDSO(x) RISCV_LDSO_HELPER(x)
> +
> +#define LDSO_ARCH RISCV_LDSO(__riscv_xlen) RISCV_FP_SUFFIX

Here's where a big endian variant would be needed if that's to be
supported.

> +#define NO_LEGACY_INITFINI

This should not be defined. It was a hack to fix a historic mistake we
made on arm a long time ago (arm crti/n _init and _fini once contained
asm to call the init/fini arrays, and so if we executed both the
init/fini and the arrays, the arrays would get executed twice).

> +
> +#define TPOFF_K 0
> +
> +#define REL_SYMBOLIC    R_RISCV_64
> +#define REL_PLT         R_RISCV_JUMP_SLOT
> +#define REL_RELATIVE    R_RISCV_RELATIVE
> +#define REL_COPY        R_RISCV_COPY
> +#define REL_DTPMOD      R_RISCV_TLS_DTPMOD64
> +#define REL_DTPOFF      R_RISCV_TLS_DTPREL64
> +#define REL_TPOFF       R_RISCV_TLS_TPREL64
> +
> +#define CRTJMP(pc,sp) __asm__ __volatile__( \
> +	"mv sp, %1 ; jr %0" : : "r"(pc), "r"(sp) : "memory" )
> diff --git a/arch/riscv64/syscall_arch.h b/arch/riscv64/syscall_arch.h
> new file mode 100644
> index 00000000..a5b0f646
> --- /dev/null
> +++ b/arch/riscv64/syscall_arch.h
> @@ -0,0 +1,79 @@
> +#define __SYSCALL_LL_E(x) (x)
> +#define __SYSCALL_LL_O(x) (x)
> +
> +#define __asm_syscall(...) \
> +	__asm__ __volatile__ ("ecall\n\t" \
> +	: "+r"(a0) : __VA_ARGS__ : "memory"); \
> +	return a0; \
> +
> +static inline long __syscall0(long n)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0");
> +	__asm_syscall("r"(a7))
> +}
> +
> +static inline long __syscall1(long n, long a)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	__asm_syscall("r"(a7), "0"(a0))
> +}
> +
> +static inline long __syscall2(long n, long a, long b)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	register long a1 __asm__("a1") = b;
> +	__asm_syscall("r"(a7), "0"(a0), "r"(a1))
> +}
> +
> +static inline long __syscall3(long n, long a, long b, long c)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	register long a1 __asm__("a1") = b;
> +	register long a2 __asm__("a2") = c;
> +	__asm_syscall("r"(a7), "0"(a0), "r"(a1), "r"(a2))
> +}
> +
> +static inline long __syscall4(long n, long a, long b, long c, long d)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	register long a1 __asm__("a1") = b;
> +	register long a2 __asm__("a2") = c;
> +	register long a3 __asm__("a3") = d;
> +	__asm_syscall("r"(a7), "0"(a0), "r"(a1), "r"(a2), "r"(a3))
> +}
> +
> +static inline long __syscall5(long n, long a, long b, long c, long d, long e)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	register long a1 __asm__("a1") = b;
> +	register long a2 __asm__("a2") = c;
> +	register long a3 __asm__("a3") = d;
> +	register long a4 __asm__("a4") = e;
> +	__asm_syscall("r"(a7), "0"(a0), "r"(a1), "r"(a2), "r"(a3), "r"(a4))
> +}
> +
> +static inline long __syscall6(long n, long a, long b, long c, long d, long e, long f)
> +{
> +	register long a7 __asm__("a7") = n;
> +	register long a0 __asm__("a0") = a;
> +	register long a1 __asm__("a1") = b;
> +	register long a2 __asm__("a2") = c;
> +	register long a3 __asm__("a3") = d;
> +	register long a4 __asm__("a4") = e;
> +	register long a5 __asm__("a5") = f;
> +	__asm_syscall("r"(a7), "0"(a0), "r"(a1), "r"(a2), "r"(a3), "r"(a4), "r"(a5))
> +}
> +
> +#define VDSO_USEFUL
> +/* We don't have a clock_gettime function.
> +#define VDSO_CGT_SYM "__vdso_clock_gettime"
> +#define VDSO_CGT_VER "LINUX_2.6" */
> +
> +#define VDSO_FLUSH_ICACHE_SYM "__vdso_flush_icache"
> +#define VDSO_FLUSH_ICACHE_VER "LINUX_4.5"

I don't see why this needs to be defined here; the point in having CGT
and other functions here is to tell generic, arch-independent code
whether the arch potentially has a usable vdso version. But for
flush_icache this is rv-specific already, so the name/version should
just be there.

> diff --git a/configure b/configure
> index 2123ddce..855eab58 100755
> --- a/configure
> +++ b/configure
> @@ -322,6 +322,7 @@ microblaze*) ARCH=microblaze ;;
>  or1k*) ARCH=or1k ;;
>  powerpc64*|ppc64*) ARCH=powerpc64 ;;
>  powerpc*|ppc*) ARCH=powerpc ;;
> +riscv64*) ARCH=riscv64 ;;
>  sh[1-9bel-]*|sh|superh*) ARCH=sh ;;
>  s390x*) ARCH=s390x ;;
>  unknown) fail "$0: unable to detect target arch; try $0 --target=..." ;;
> @@ -654,6 +655,11 @@ trycppif __LITTLE_ENDIAN__ "$t" && SUBARCH=${SUBARCH}le
>  trycppif _SOFT_FLOAT "$t" && fail "$0: error: soft-float not supported on powerpc64"
>  fi
>  
> +if test "$ARCH" = "riscv64" ; then
> +trycppif "RISCVEB || _RISCVEB || __RISCVEB || __RISCVEB__" "$t" && SUBARCH=${SUBARCH}eb

OK, this suggests big endian is intended to be supported, so reloc.h
needs to match.

> +trycppif __riscv_float_abi_soft "$t" && SUBARCH=${SUBARCH}-sf

Is -sp also supposed to be supported?

> diff --git a/src/internal/riscv64/syscall.s b/src/internal/riscv64/syscall.s
> new file mode 100644
> index 00000000..27905f41
> --- /dev/null
> +++ b/src/internal/riscv64/syscall.s
> @@ -0,0 +1,15 @@
> +.global __syscall
> +.hidden __syscall
> +.type   __syscall,%function
> +__syscall:
> +        mv t0, a0
> +        mv a0, a1
> +        mv a1, a2
> +        mv a2, a3
> +        mv a3, a4
> +        mv a4, a5
> +        mv a5, a6
> +        mv a6, a7
> +        mv a7, t0
> +        ecall
> +        ret

This file can be dropped. musl no longer uses syscall.s.

> diff --git a/src/linux/cache.c b/src/linux/cache.c
> index 84a138a4..0c5af2ec 100644
> --- a/src/linux/cache.c
> +++ b/src/linux/cache.c
> @@ -1,4 +1,6 @@
> +#include <errno.h>
>  #include "syscall.h"
> +#include "atomic.h"
>  
>  #ifdef SYS_cacheflush
>  int _flush_cache(void *addr, int len, int op)
> @@ -15,3 +17,38 @@ int __cachectl(void *addr, int len, int op)
>  }
>  weak_alias(__cachectl, cachectl);
>  #endif
> +
> +#ifdef SYS_riscv_flush_icache
> +
> +#ifdef VDSO_FLUSH_ICACHE_SYM

There's no reason for this to be conditional since it can't change; we
define it and thus it's always defined.

OK, that probably looks like a lot, but very little if anything from
it is blocking. I'm fine with taking patches or writing my own to fix
up things after the initial commit, just as long as we make sure the
initial commit doesn't have known ABI issues resolving whether big
endian is supposed to be supported, and whether single-precision fpu
is supposed to be supported, and what MINSIGSTKSZ should be, are the
main ones I see. I also think the atomics are probably broken enough
that it would be nice to get them fixed.

I'm working on trying to get musl-cross-make in shape so that I can
test a build. At least binutils needs to be updated to a version with
riscv support, and in updating it I need to make sure I get a version
without the x86 regression or add a patch for it.. Hopefully I can
start testing in the next day or two.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.