|
Message-ID: <20200917013236.GV3265@brightrain.aerifal.cx> Date: Wed, 16 Sep 2020 21:32:36 -0400 From: 'Rich Felker' <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: Hexagon DSP support On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@...eaurora.org wrote: > > > > -----Original Message----- > > From: sidneym@...eaurora.org <sidneym@...eaurora.org> > > Sent: Friday, July 24, 2020 12:50 PM > > To: 'Szabolcs Nagy' <nsz@...t70.net> > > Cc: 'Rich Felker' <dalias@...c.org>; 'musl@...ts.openwall.com' > > <musl@...ts.openwall.com> > > Subject: RE: [musl] Hexagon DSP support > > > > > > > > > -----Original Message----- > > > From: Szabolcs Nagy <nsz@...t70.net> > > > Sent: Thursday, July 23, 2020 4:56 PM > > > To: sidneym@...eaurora.org > > > Cc: 'Rich Felker' <dalias@...c.org>; musl@...ts.openwall.com > > > Subject: Re: [musl] Hexagon DSP support > > > > > > * sidneym@...eaurora.org <sidneym@...eaurora.org> [2020-07-20 > > > 16:26:58 -0500]: > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins, > > > > https://reviews.llvm.org/D82263 > > > > The comparison with musl can be found here: > > > > https://github.com/quic/musl/compare/hexagon but I've also attached > > > > the patch. > > > > > > > > An assert in clang when building both musl and libc-test for hexagon > > > > was fixed by, https://reviews.llvm.org/D80952 prior to this change > > > > -frounding-math had to be used. > > > > > > > > The test-results are also attached. Everything is built with the > > > > tip-of-tree llvm so sometimes results vary but these are the results > > > > I got from this morning's clone. The only notable difference in the > > > > results would be that both fma and fmal fail and this is because of > > > > the compiler-rt change. I didn't add fma to musl because it require > > > > more complex assembly, along the lines you saw in an earlier version > > > > with > > > sqrt. > > > > > > > > > the fma and sqrt failures are still not fully explained, e.g. this looks > wrong: > > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) want > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = -0x1p+52 + > > > 0x0p+0 > > > > > > the only target specific bit in fma is a_clz_64 so i would check that. > > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation happens > > > during the fma test case above). > > > > Hexagon didn't have a_clz_64 implemented however I added this morning it > > and noticed no differences. I will update the patch with that routine > > included. > > > > I did notice a compiler regression in how it compiled fma and have asked a > > compiler person to take a look. An older version of our internally > maintained > > compiler does produce the expected results for the values I used but later > > versions do not. Unfortunately changing optimization levels will produce > > different results as well. > > I've attached updated test results and patch. The patch doesn't change much > other than adding the above mentioned a_clz_64. The only other change was > an update to pthread_arch.h for an api update so hexagon conforms with the > rest of musl. > > Between updates to llvm and musl both fma and sqrt issues are resolved > provided I compile the library without optimization enabled. No new tests > fail. > > I guess I also need to know what the thoughts are about adding hexagon to > the mainline base. There are no issues adding from this end. I'll post some review with the hope that this can move forward upstream in musl soon. I might need some help figuring out how to get a cross build environment to check things, but I'll follow up when I do. The review that follows is not 100% thorough but I think it's more detailed than I've done for hexagon so far. Most of it's open to discussion if you think anything I say is wrong. > diff --git a/arch/hexagon/atomic_arch.h b/arch/hexagon/atomic_arch.h > new file mode 100644 > index 00000000..ede55956 > --- /dev/null > +++ b/arch/hexagon/atomic_arch.h > @@ -0,0 +1,194 @@ > +#define a_ctz_32 a_ctz_32 > +static inline int a_ctz_32(unsigned long x) > +{ > + __asm__( > + "%0 = ct0(%0)\n\t" > + : "+r"(x)); > + return x; > +} > + > +#define a_ctz_64 a_ctz_64 > +static inline int a_ctz_64(uint64_t x) > +{ > + int count; > + __asm__( > + "%0 = ct0(%1)\n\t" > + : "=r"(count) : "r"(x)); > + return count; > +} > +#define a_clz_64 a_clz_64 > +static inline int a_clz_64(uint64_t x) > +{ > + int count; > + __asm__( > + "%1 = brev(%1)\n\t" > + "%0 = ct0(%1)\n\t" > + : "=r"(count) : "r"(x)); > + return count; > +} This should probably do just the brev in asm then return a_ctz_64(result) so that the compiler has the freedom to schedule the insns independently, unless there's a reason not to want it to do that. > +#define a_cas a_cas > +static inline int a_cas(volatile int *p, int t, int s) > +{ > + int dummy; > + __asm__ __volatile__( > + "1: %0 = memw_locked(%1)\n\t" > + " { p0 = cmp.eq(%0, %2)\n\t" > + " if (!p0.new) jump:nt 2f }\n\t" > + " memw_locked(%1, p0) = %3\n\t" > + " if (!p0) jump 1b\n\t" > + "2: \n\t" > + : "=&r"(dummy) > + : "r"(p), "r"(t), "r"(s) > + : "p0", "memory" ); > + return dummy; > +} I don't know the hexagon atomic model, but as far as I can tell these at least "look right" in the sense of having right asm constraints. > [...] > +#define a_barrier a_barrier > +static inline void a_barrier() > +{ > + __asm__ __volatile__ ("barrier" ::: "memory"); > +} Is the barrier implied in memw_locked? If not, there need to be explicit barriers in all the atomic functions. > diff --git a/arch/hexagon/bits/alltypes.h.in b/arch/hexagon/bits/alltypes.h.in > new file mode 100644 > index 00000000..9d770c7e > --- /dev/null > +++ b/arch/hexagon/bits/alltypes.h.in > @@ -0,0 +1,26 @@ > +#define _Addr int > +#define _Int64 long long > +#define _Reg int > + > +#define __BYTE_ORDER 1234 > +#define __LONG_MAX 0x7fffffffL > + > +#ifndef __cplusplus > +#ifdef __WCHAR_TYPE__ > +TYPEDEF __WCHAR_TYPE__ wchar_t; > +#else > +TYPEDEF long wchar_t; > +#endif > +#endif > + > +TYPEDEF float float_t; > +TYPEDEF double double_t; > + > +#if !defined(__cplusplus) > +TYPEDEF struct { _Alignas(8) long long __ll; long double __ld; } max_align_t; > +#elif defined(__GNUC__) > +TYPEDEF struct { __attribute__((__aligned__(8))) long long __ll; long double __ld; } max_align_t; > +#else > +TYPEDEF struct { alignas(8) long long __ll; long double __ld; } max_align_t; > +#endif As I understand, the Hexagon ABI has all scalar types aligned to their size, so I don't think the alignas mess is needed here. It should just work as unconditional: TYPEDEF struct { long long __ll; long double __ld; } max_align_t; > diff --git a/arch/hexagon/bits/setjmp.h b/arch/hexagon/bits/setjmp.h > new file mode 100644 > index 00000000..5ee5e49f > --- /dev/null > +++ b/arch/hexagon/bits/setjmp.h > @@ -0,0 +1,4 @@ > + > +typedef struct { > + long regs[16]; > +} __jmp_buf[1] __attribute__((aligned (8))); Use long long so no extension is needed here for alignment, and no wrapper struct (which is not needed, and the member name regs is a namespace violation): typedef long long __jmp_buf[8]; > diff --git a/arch/hexagon/bits/signal.h b/arch/hexagon/bits/signal.h > new file mode 100644 > index 00000000..7a3b36d0 > --- /dev/null > +++ b/arch/hexagon/bits/signal.h > @@ -0,0 +1,105 @@ > +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \ > + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > + > +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > +#define MINSIGSTKSZ 2048 > +#define SIGSTKSZ 8192 > +#endif > + > +#if defined(_GNU_SOURCE) || defined(_BSD_SOURCE) > +typedef int greg_t, gregset_t[18]; > +typedef struct sigcontext > +{ > + unsigned long r0, r1, r2, r3; > + unsigned long r4, r5, r6, r7; > + unsigned long r8, r9, r10, r11; > + unsigned long r12, r13, r14, r15; > + unsigned long r16, r17, r18, r19; > + unsigned long r20, r21, r22, r23; > + unsigned long r24, r25, r26, r27; > + unsigned long r28, r29, r30, r31; > + unsigned long sa0; > + unsigned long lc0; > + unsigned long sa1; > + unsigned long lc1; > + unsigned long m0; > + unsigned long m1; > + unsigned long usr; > + unsigned long p3_0; > + unsigned long gp; > + unsigned long ugp; > + unsigned long pc; > + unsigned long cause; > + unsigned long badva; > + unsigned long pad1; > + unsigned long pad2; > + unsigned long pad3; > +} __attribute__((__aligned__(8))) mcontext_t; > +#else > +typedef struct { > + unsigned long __regs[48]; > +} __attribute__((__aligned__(8))) mcontext_t; > +#endif If the pad members are really padding, can the last two just be replaced by a long long so there's natural alignment with no extension? > [...] > +typedef struct __ucontext { > + unsigned long uc_flags; > + struct __ucontext *uc_link; > + stack_t uc_stack; > + mcontext_t uc_mcontext; > + sigset_t uc_sigmask; > +// unsigned long long uc_regspace[64]; > +} ucontext_t; Should the commented uc_regspace be removed? > diff --git a/arch/hexagon/bits/stat.h b/arch/hexagon/bits/stat.h > new file mode 100644 > index 00000000..55e81fd9 > --- /dev/null > +++ b/arch/hexagon/bits/stat.h > @@ -0,0 +1,20 @@ > +/* copied from kernel definition, but with padding replaced > + * by the corresponding correctly-sized userspace types. */ > +struct stat { > + dev_t st_dev; > + ino_t st_ino; > + mode_t st_mode; > + nlink_t st_nlink; > + uid_t st_uid; > + gid_t st_gid; > + dev_t st_rdev; > + unsigned long __pad; > + off_t st_size; > + blksize_t st_blksize; > + int __pad2; > + blkcnt_t st_blocks; > + struct timespec st_atim; > + struct timespec st_mtim; > + struct timespec st_ctim; > + unsigned __unused[2]; > +}; No objection to doing this as-is, but since this looks like the modern "standard" layout and we could probably deduplicate it by making arch/generic provide it. That'd be a later patch to remove the other existing duplicates (aarch64, riscv64, maybe others). > diff --git a/arch/hexagon/bits/syscall.h.in b/arch/hexagon/bits/syscall.h.in > new file mode 100644 > index 00000000..77ca3fa0 > --- /dev/null > +++ b/arch/hexagon/bits/syscall.h.in > @@ -0,0 +1,317 @@ > [...] > +#define __NR_timer_create 107 > +#define __NR_timer_gettime 108 > +#define __NR_timer_getoverrun 109 > +#define __NR_timer_settime 110 > +#define __NR_timer_delete 111 > +#define __NR_clock_settime 112 > +#define __NR_clock_gettime 113 Some of these, e.g. clock_gettime, need to be renamed as in commit 5a105f19b5aae79dd302899e634b6b18b3dcd0d6. > [...] > +#define __NR_sched_rr_get_interval_time64 423 > +#define __NR_syscalls (__NR_sched_rr_get_interval_time64+1) > +#define __NR_fcntl __NR3264_fcntl > +#define __NR_fstatfs __NR3264_fstatfs > +#define __NR_truncate __NR3264_truncate > +#define __NR_ftruncate __NR3264_ftruncate > +#define __NR_lseek __NR3264_lseek > +#define __NR_sendfile __NR3264_sendfile > +#define __NR_newfstatat __NR3264_fstatat > +#define __NR_fcntl64 __NR3264_fcntl > +#define __NR_statfs64 __NR3264_statfs > +#define __NR_fstatfs64 __NR3264_fstatfs > +#define __NR_truncate64 __NR3264_truncate > +#define __NR_ftruncate64 __NR3264_ftruncate > +#define __NR__llseek __NR3264_lseek > +#define __NR_sendfile64 __NR3264_sendfile > +#define __NR_fstatat64 __NR3264_fstatat > +#define __NR_fstat64 __NR3264_fstat > +#define __NR_mmap2 __NR3264_mmap > +#define __NR_fadvise64_64 __NR3264_fadvise64 Is there a reason for these NR3264 redirections rather than just defining directly with the public names? > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h > new file mode 100644 > index 00000000..331a797e > --- /dev/null > +++ b/arch/hexagon/crt_arch.h > @@ -0,0 +1,35 @@ > +__asm__( > +".weak _DYNAMIC \n" > +".hidden _DYNAMIC \n" > +".text \n" > +".global " START " \n" > +".type " START ", %function \n" > +START ": \n" > +" // Find _DYNAMIC\n" > +" jump 1f\n" > +".word _DYNAMIC - .\n" > +"1: r2 = pc\n" > +" r2 = add(r2, #-4)\n" > +" r1 = memw(r2)\n" > +" r1 = add(r2, r1)\n" > +" r30 = #0 // Signals the end of backtrace\n" > +" r0 = r29 // Pointer to argc/argv\n" > +" r29 = and(r29, #-16) // Align\n" > +" memw(r29+#-8) = r29\n" > +" r29 = add(r29, #-8)\n" > +" call " START "_c \n" > +".size " START ", .-" START "\n" > +); > + > +__asm__( > +".section \".note.ABI-tag\", \"a\" \n" > +".align 4 \n" > +".long 1f - 0f /* name length */ \n" > +".long 3f - 2f /* data length */ \n" > +".long 1 /* note type */ \n" > +"0: .asciz \"GNU\" /* vendor name seems like this should be MUSL but lldb doesn't agree.*/ \n" > +"1: .align 4 \n" > +"2: .long 0 /* linux */ \n" > +" .long 3,0,0 \n" > +"3: .align 4 \n" > +); Is there a reason this needs to be here at all? Shouldn't the tooling generate it if it's actually wanted/needed? > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h > new file mode 100644 > index 00000000..14085872 > --- /dev/null > +++ b/arch/hexagon/reloc.h > @@ -0,0 +1,25 @@ > +#include <endian.h> > + > +#if __BYTE_ORDER == __LITTLE_ENDIAN > +#define ENDIAN_SUFFIX "el" > +#else > +#define ENDIAN_SUFFIX "" > +#endif bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally little endian). Does Hexagon (hw and abi) actually support both byte orders? If not, I don't think there should be an endian suffix here at all. > diff --git a/include/elf.h b/include/elf.h > index 549f92c1..54251c24 100644 > --- a/include/elf.h > +++ b/include/elf.h > @@ -3284,6 +3284,107 @@ enum > #define R_RISCV_SET32 56 > #define R_RISCV_32_PCREL 57 > > +#define R_HEX_NONE 0 > [...] I'd like to merge this separately first since it's independent of whether hexagon is a supported host. > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S > new file mode 100644 > index 00000000..07b89764 > --- /dev/null > +++ b/src/fenv/hexagon/fenv.S > @@ -0,0 +1,143 @@ > +/* > +The Hexagon user status register includes five status fields which work as > +sticky flags for the five IEEE-defined exception conditions: inexact, overflow, underflow, > +divide by zero, and invalid. A sticky flag is set when the corresponding exception occurs, > +and remains set until explicitly cleared. Please format for at most 80 columns. (Some source files have a few longer lines, but if you're flowing text it should be flowed to 80 or fewer.) Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.