Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200917013236.GV3265@brightrain.aerifal.cx>
Date: Wed, 16 Sep 2020 21:32:36 -0400
From: 'Rich Felker' <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Hexagon DSP support

On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@...eaurora.org wrote:
> 
> 
> > -----Original Message-----
> > From: sidneym@...eaurora.org <sidneym@...eaurora.org>
> > Sent: Friday, July 24, 2020 12:50 PM
> > To: 'Szabolcs Nagy' <nsz@...t70.net>
> > Cc: 'Rich Felker' <dalias@...c.org>; 'musl@...ts.openwall.com'
> > <musl@...ts.openwall.com>
> > Subject: RE: [musl] Hexagon DSP support
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Szabolcs Nagy <nsz@...t70.net>
> > > Sent: Thursday, July 23, 2020 4:56 PM
> > > To: sidneym@...eaurora.org
> > > Cc: 'Rich Felker' <dalias@...c.org>; musl@...ts.openwall.com
> > > Subject: Re: [musl] Hexagon DSP support
> > >
> > > * sidneym@...eaurora.org <sidneym@...eaurora.org> [2020-07-20
> > > 16:26:58 -0500]:
> > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins,
> > > > https://reviews.llvm.org/D82263
> > > > The comparison with musl can be found here:
> > > > https://github.com/quic/musl/compare/hexagon but I've also attached
> > > > the patch.
> > > >
> > > > An assert in clang when building both musl and libc-test for hexagon
> > > > was fixed by, https://reviews.llvm.org/D80952 prior to this change
> > > > -frounding-math had to be used.
> > > >
> > > > The test-results are also attached.  Everything is built with the
> > > > tip-of-tree llvm so sometimes results vary but these are the results
> > > > I got from this morning's clone.  The only notable difference in the
> > > > results would be that both fma and fmal fail and this is because of
> > > > the compiler-rt change.  I didn't add fma to musl because it require
> > > > more complex assembly, along the lines you saw in an earlier version
> > > > with
> > > sqrt.
> > >
> > >
> > > the fma and sqrt failures are still not fully explained, e.g. this looks
> wrong:
> > >
> > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) want
> > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = -0x1p+52 +
> > > 0x0p+0
> > >
> > > the only target specific bit in fma is a_clz_64 so i would check that.
> > >
> > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation happens
> > > during the fma test case above).
> > 
> > Hexagon didn't have a_clz_64 implemented however I added this morning it
> > and noticed no differences.  I will update the patch with that routine
> > included.
> > 
> > I did notice a compiler regression in how it compiled fma and have asked a
> > compiler person to take a look.  An older version of our internally
> maintained
> > compiler does produce the expected results for the values I used but later
> > versions do not.  Unfortunately changing optimization levels will produce
> > different results as well.
> 
> I've attached updated test results and patch.  The patch doesn't change much
> other than adding the above mentioned a_clz_64.  The only other change was
> an update to pthread_arch.h for an api update so hexagon conforms with the
> rest of musl.
> 
> Between updates to llvm and musl both fma and sqrt issues are resolved
> provided I compile the library without optimization enabled.  No new tests
> fail.
> 
> I guess I also need to know what the thoughts are about adding hexagon to
> the mainline base.  There are no issues adding from this end.

I'll post some review with the hope that this can move forward
upstream in musl soon. I might need some help figuring out how to get
a cross build environment to check things, but I'll follow up when I
do.

The review that follows is not 100% thorough but I think it's more
detailed than I've done for hexagon so far. Most of it's open to
discussion if you think anything I say is wrong.

> diff --git a/arch/hexagon/atomic_arch.h b/arch/hexagon/atomic_arch.h
> new file mode 100644
> index 00000000..ede55956
> --- /dev/null
> +++ b/arch/hexagon/atomic_arch.h
> @@ -0,0 +1,194 @@
> +#define a_ctz_32 a_ctz_32
> +static inline int a_ctz_32(unsigned long x)
> +{
> +	__asm__(
> +		"%0 = ct0(%0)\n\t"
> +		: "+r"(x));
> +	return x;
> +}
> +
> +#define a_ctz_64 a_ctz_64
> +static inline int a_ctz_64(uint64_t x)
> +{
> +	int count;
> +	__asm__(
> +		"%0 = ct0(%1)\n\t"
> +		: "=r"(count) : "r"(x));
> +	return count;
> +}
> +#define a_clz_64 a_clz_64
> +static inline int a_clz_64(uint64_t x)
> +{
> +        int count;
> +        __asm__(
> +                "%1 = brev(%1)\n\t"
> +                "%0 = ct0(%1)\n\t"
> +                : "=r"(count) : "r"(x));
> +        return count;
> +}

This should probably do just the brev in asm then return
a_ctz_64(result) so that the compiler has the freedom to schedule the
insns independently, unless there's a reason not to want it to do
that.

> +#define a_cas a_cas
> +static inline int a_cas(volatile int *p, int t, int s)
> +{
> +	int dummy;
> +	__asm__ __volatile__(
> +		"1:	%0 = memw_locked(%1)\n\t"
> +		"	{ p0 = cmp.eq(%0, %2)\n\t"
> +		"	  if (!p0.new) jump:nt 2f }\n\t"
> +		"	memw_locked(%1, p0) = %3\n\t"
> +		"	if (!p0) jump 1b\n\t"
> +		"2:	\n\t"
> +		: "=&r"(dummy)
> +		: "r"(p), "r"(t), "r"(s)
> +		: "p0", "memory" );
> +        return dummy;
> +}

I don't know the hexagon atomic model, but as far as I can tell these
at least "look right" in the sense of having right asm constraints.

> [...]
> +#define a_barrier a_barrier
> +static inline void a_barrier()
> +{
> +	__asm__ __volatile__ ("barrier" ::: "memory");
> +}

Is the barrier implied in memw_locked? If not, there need to be
explicit barriers in all the atomic functions.

> diff --git a/arch/hexagon/bits/alltypes.h.in b/arch/hexagon/bits/alltypes.h.in
> new file mode 100644
> index 00000000..9d770c7e
> --- /dev/null
> +++ b/arch/hexagon/bits/alltypes.h.in
> @@ -0,0 +1,26 @@
> +#define _Addr int
> +#define _Int64 long long
> +#define _Reg int
> +
> +#define __BYTE_ORDER 1234
> +#define __LONG_MAX 0x7fffffffL
> +
> +#ifndef __cplusplus
> +#ifdef __WCHAR_TYPE__
> +TYPEDEF __WCHAR_TYPE__ wchar_t;
> +#else
> +TYPEDEF long wchar_t;
> +#endif
> +#endif
> +
> +TYPEDEF float float_t;
> +TYPEDEF double double_t;
> +
> +#if !defined(__cplusplus)
> +TYPEDEF struct { _Alignas(8) long long __ll; long double __ld; } max_align_t;
> +#elif defined(__GNUC__)
> +TYPEDEF struct { __attribute__((__aligned__(8))) long long __ll; long double __ld; } max_align_t;
> +#else
> +TYPEDEF struct { alignas(8) long long __ll; long double __ld; } max_align_t;
> +#endif

As I understand, the Hexagon ABI has all scalar types aligned to their
size, so I don't think the alignas mess is needed here. It should just
work as unconditional:

TYPEDEF struct { long long __ll; long double __ld; } max_align_t;

> diff --git a/arch/hexagon/bits/setjmp.h b/arch/hexagon/bits/setjmp.h
> new file mode 100644
> index 00000000..5ee5e49f
> --- /dev/null
> +++ b/arch/hexagon/bits/setjmp.h
> @@ -0,0 +1,4 @@
> +
> +typedef struct {
> +	long regs[16];
> +} __jmp_buf[1] __attribute__((aligned (8)));

Use long long so no extension is needed here for alignment, and no
wrapper struct (which is not needed, and the member name regs is a
namespace violation):

typedef long long __jmp_buf[8];

> diff --git a/arch/hexagon/bits/signal.h b/arch/hexagon/bits/signal.h
> new file mode 100644
> index 00000000..7a3b36d0
> --- /dev/null
> +++ b/arch/hexagon/bits/signal.h
> @@ -0,0 +1,105 @@
> +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +
> +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +#define MINSIGSTKSZ 2048
> +#define SIGSTKSZ 8192
> +#endif
> +
> +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +typedef int greg_t, gregset_t[18];
> +typedef struct sigcontext
> +{
> +	unsigned long  r0,  r1,  r2,  r3;
> +	unsigned long  r4,  r5,  r6,  r7;
> +	unsigned long  r8,  r9, r10, r11;
> +	unsigned long r12, r13, r14, r15;
> +	unsigned long r16, r17, r18, r19;
> +	unsigned long r20, r21, r22, r23;
> +	unsigned long r24, r25, r26, r27;
> +	unsigned long r28, r29, r30, r31;
> +	unsigned long sa0;
> +	unsigned long lc0;
> +	unsigned long sa1;
> +	unsigned long lc1;
> +	unsigned long m0;
> +	unsigned long m1;
> +	unsigned long usr;
> +	unsigned long p3_0;
> +	unsigned long gp;
> +	unsigned long ugp;
> +	unsigned long pc;
> +	unsigned long cause;
> +	unsigned long badva;
> +	unsigned long pad1;
> +	unsigned long pad2;
> +	unsigned long pad3;
> +} __attribute__((__aligned__(8))) mcontext_t;
> +#else
> +typedef struct {
> +	unsigned long __regs[48];
> +} __attribute__((__aligned__(8))) mcontext_t;
> +#endif

If the pad members are really padding, can the last two just be
replaced by a long long so there's natural alignment with no
extension?

> [...]
> +typedef struct __ucontext {
> +	unsigned long uc_flags;
> +	struct __ucontext *uc_link;
> +	stack_t uc_stack;
> +	mcontext_t uc_mcontext;
> +	sigset_t uc_sigmask;
> +//	unsigned long long uc_regspace[64];
> +} ucontext_t;

Should the commented uc_regspace be removed?

> diff --git a/arch/hexagon/bits/stat.h b/arch/hexagon/bits/stat.h
> new file mode 100644
> index 00000000..55e81fd9
> --- /dev/null
> +++ b/arch/hexagon/bits/stat.h
> @@ -0,0 +1,20 @@
> +/* copied from kernel definition, but with padding replaced
> + * by the corresponding correctly-sized userspace types. */
> +struct stat {
> +	dev_t st_dev;
> +	ino_t st_ino;
> +	mode_t st_mode;
> +	nlink_t st_nlink;
> +	uid_t st_uid;
> +	gid_t st_gid;
> +	dev_t st_rdev;
> +	unsigned long __pad;
> +	off_t st_size;
> +	blksize_t st_blksize;
> +	int __pad2;
> +	blkcnt_t st_blocks;
> +	struct timespec st_atim;
> +	struct timespec st_mtim;
> +	struct timespec st_ctim;
> +	unsigned __unused[2];
> +};

No objection to doing this as-is, but since this looks like the modern
"standard" layout and we could probably deduplicate it by making
arch/generic provide it. That'd be a later patch to remove the other
existing duplicates (aarch64, riscv64, maybe others).

> diff --git a/arch/hexagon/bits/syscall.h.in b/arch/hexagon/bits/syscall.h.in
> new file mode 100644
> index 00000000..77ca3fa0
> --- /dev/null
> +++ b/arch/hexagon/bits/syscall.h.in
> @@ -0,0 +1,317 @@
> [...]
> +#define __NR_timer_create 107
> +#define __NR_timer_gettime 108
> +#define __NR_timer_getoverrun 109
> +#define __NR_timer_settime 110
> +#define __NR_timer_delete 111
> +#define __NR_clock_settime 112
> +#define __NR_clock_gettime 113

Some of these, e.g. clock_gettime, need to be renamed as in commit
5a105f19b5aae79dd302899e634b6b18b3dcd0d6.

> [...]
> +#define __NR_sched_rr_get_interval_time64 423
> +#define __NR_syscalls (__NR_sched_rr_get_interval_time64+1)
> +#define __NR_fcntl  __NR3264_fcntl
> +#define __NR_fstatfs  __NR3264_fstatfs
> +#define __NR_truncate  __NR3264_truncate
> +#define __NR_ftruncate  __NR3264_ftruncate
> +#define __NR_lseek  __NR3264_lseek
> +#define __NR_sendfile  __NR3264_sendfile
> +#define __NR_newfstatat  __NR3264_fstatat
> +#define __NR_fcntl64  __NR3264_fcntl
> +#define __NR_statfs64  __NR3264_statfs
> +#define __NR_fstatfs64  __NR3264_fstatfs
> +#define __NR_truncate64  __NR3264_truncate
> +#define __NR_ftruncate64  __NR3264_ftruncate
> +#define __NR__llseek  __NR3264_lseek
> +#define __NR_sendfile64  __NR3264_sendfile
> +#define __NR_fstatat64  __NR3264_fstatat
> +#define __NR_fstat64  __NR3264_fstat
> +#define __NR_mmap2  __NR3264_mmap
> +#define __NR_fadvise64_64  __NR3264_fadvise64

Is there a reason for these NR3264 redirections rather than just
defining directly with the public names?

> diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h
> new file mode 100644
> index 00000000..331a797e
> --- /dev/null
> +++ b/arch/hexagon/crt_arch.h
> @@ -0,0 +1,35 @@
> +__asm__(
> +".weak _DYNAMIC \n"
> +".hidden _DYNAMIC \n"
> +".text \n"
> +".global " START " \n"
> +".type " START ", %function \n"
> +START ": \n"
> +"                                       // Find _DYNAMIC\n"
> +"       jump 1f\n"
> +".word  _DYNAMIC - .\n"
> +"1:     r2 = pc\n"
> +"       r2 = add(r2, #-4)\n"
> +"       r1 = memw(r2)\n"
> +"       r1 = add(r2, r1)\n"
> +"	r30 = #0			// Signals the end of backtrace\n"
> +"	r0 = r29			// Pointer to argc/argv\n"
> +"	r29 = and(r29, #-16)		// Align\n"
> +"	memw(r29+#-8) = r29\n"
> +"	r29 = add(r29, #-8)\n"
> +"	call " START "_c \n"
> +".size " START ", .-" START "\n"
> +);
> +
> +__asm__(
> +".section \".note.ABI-tag\", \"a\" \n"
> +".align 4 \n"
> +".long 1f - 0f		/* name length */ \n"
> +".long 3f - 2f		/* data length */ \n"
> +".long  1		/* note type */ \n"
> +"0:	.asciz \"GNU\"	/* vendor name seems like this should be MUSL but lldb doesn't agree.*/ \n"
> +"1:	.align 4		 \n"
> +"2:	.long 0	/* linux */ \n"
> +"	.long 3,0,0 \n"
> +"3:	.align 4 \n"
> +);

Is there a reason this needs to be here at all? Shouldn't the tooling
generate it if it's actually wanted/needed?

> diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h
> new file mode 100644
> index 00000000..14085872
> --- /dev/null
> +++ b/arch/hexagon/reloc.h
> @@ -0,0 +1,25 @@
> +#include <endian.h>
> +
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +#define ENDIAN_SUFFIX "el"
> +#else
> +#define ENDIAN_SUFFIX ""
> +#endif

bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally
little endian). Does Hexagon (hw and abi) actually support both byte
orders? If not, I don't think there should be an endian suffix here at
all.

> diff --git a/include/elf.h b/include/elf.h
> index 549f92c1..54251c24 100644
> --- a/include/elf.h
> +++ b/include/elf.h
> @@ -3284,6 +3284,107 @@ enum
>  #define R_RISCV_SET32           56
>  #define R_RISCV_32_PCREL        57
>  
> +#define R_HEX_NONE               0
> [...]

I'd like to merge this separately first since it's independent of
whether hexagon is a supported host.

> diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S
> new file mode 100644
> index 00000000..07b89764
> --- /dev/null
> +++ b/src/fenv/hexagon/fenv.S
> @@ -0,0 +1,143 @@
> +/*
> +The Hexagon user status register includes five status fields which work as
> +sticky flags for the five IEEE-defined exception conditions: inexact, overflow, underflow,
> +divide by zero, and invalid. A sticky flag is set when the corresponding exception occurs,
> +and remains set until explicitly cleared.

Please format for at most 80 columns. (Some source files have a few
longer lines, but if you're flowing text it should be flowed to 80 or
fewer.)

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.