|
Message-ID: <025a01d68f4f$bd211aa0$37634fe0$@codeaurora.org> Date: Sun, 20 Sep 2020 08:12:47 -0500 From: <sidneym@...eaurora.org> To: "'Rich Felker'" <dalias@...c.org> Cc: <musl@...ts.openwall.com> Subject: RE: Hexagon DSP support > -----Original Message----- > From: Rich Felker <dalias@...c.org> > Sent: Thursday, September 17, 2020 8:08 PM > To: sidneym@...eaurora.org > Cc: musl@...ts.openwall.com > Subject: Re: [musl] Hexagon DSP support > > On Thu, Sep 17, 2020 at 05:31:29PM -0500, sidneym@...eaurora.org wrote: > > > > > > > -----Original Message----- > > > From: 'Rich Felker' <dalias@...c.org> > > > Sent: Wednesday, September 16, 2020 8:33 PM > > > To: musl@...ts.openwall.com > > > Subject: Re: [musl] Hexagon DSP support > > > > > > On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@...eaurora.org > wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: sidneym@...eaurora.org <sidneym@...eaurora.org> > > > > > Sent: Friday, July 24, 2020 12:50 PM > > > > > To: 'Szabolcs Nagy' <nsz@...t70.net> > > > > > Cc: 'Rich Felker' <dalias@...c.org>; 'musl@...ts.openwall.com' > > > > > <musl@...ts.openwall.com> > > > > > Subject: RE: [musl] Hexagon DSP support > > > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Szabolcs Nagy <nsz@...t70.net> > > > > > > Sent: Thursday, July 23, 2020 4:56 PM > > > > > > To: sidneym@...eaurora.org > > > > > > Cc: 'Rich Felker' <dalias@...c.org>; musl@...ts.openwall.com > > > > > > Subject: Re: [musl] Hexagon DSP support > > > > > > > > > > > > * sidneym@...eaurora.org <sidneym@...eaurora.org> [2020-07- > 20 > > > > > > 16:26:58 -0500]: > > > > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins, > > > > > > > https://reviews.llvm.org/D82263 The comparison with musl can > > > > > > > be found here: > > > > > > > https://github.com/quic/musl/compare/hexagon but I've also > > > > > > > attached the patch. > > > > > > > > > > > > > > An assert in clang when building both musl and libc-test for > > > > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior > > > > > > > to this change -frounding-math had to be used. > > > > > > > > > > > > > > The test-results are also attached. Everything is built > > > > > > > with the tip-of-tree llvm so sometimes results vary but > > > > > > > these are the results I got from this morning's clone. The > > > > > > > only notable difference in the results would be that both > > > > > > > fma and fmal fail and this is because of the compiler-rt > > > > > > > change. I didn't add fma to musl because it require more > > > > > > > complex assembly, along the lines you saw in an earlier > > > > > > > version with > > > > > > sqrt. > > > > > > > > > > > > > > > > > > the fma and sqrt failures are still not fully explained, e.g. > > > > > > this looks > > > > wrong: > > > > > > > > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) > > > > > > want > > > > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = > > > > > > 0x1p+-0x1p+52 > > > > > > 0x1p++ > > > > > > 0x0p+0 > > > > > > > > > > > > the only target specific bit in fma is a_clz_64 so i would > > > > > > check > > that. > > > > > > > > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation > > > > > > happens during the fma test case above). > > > > > > > > > > Hexagon didn't have a_clz_64 implemented however I added this > > > > > morning it and noticed no differences. I will update the patch > > > > > with that routine included. > > > > > > > > > > I did notice a compiler regression in how it compiled fma and > > > > > have asked a compiler person to take a look. An older version > > > > > of our internally > > > > maintained > > > > > compiler does produce the expected results for the values I used > > > > > but later versions do not. Unfortunately changing optimization > > > > > levels will produce different results as well. > > > > > > > > I've attached updated test results and patch. The patch doesn't > > > > change much other than adding the above mentioned a_clz_64. The > > > > only other change was an update to pthread_arch.h for an api > > > > update so hexagon conforms with the rest of musl. > > > > > > > > Between updates to llvm and musl both fma and sqrt issues are > > > > resolved provided I compile the library without optimization > > > > enabled. No new tests fail. > > > > > > > > I guess I also need to know what the thoughts are about adding > > > > hexagon to the mainline base. There are no issues adding from this > end. > > > > > > I'll post some review with the hope that this can move forward > > > upstream in musl soon. I might need some help figuring out how to > > > get a cross build environment to check things, but I'll follow up when I do. > > > > > > The review that follows is not 100% thorough but I think it's more > > detailed > > > than I've done for hexagon so far. Most of it's open to discussion > > > if you > > think > > > anything I say is wrong. > > > > > > > diff --git a/arch/hexagon/atomic_arch.h > > > > b/arch/hexagon/atomic_arch.h new file mode 100644 index > > > > 00000000..ede55956 > > > > --- /dev/null > > > > +++ b/arch/hexagon/atomic_arch.h > > > > @@ -0,0 +1,194 @@ > > > > +#define a_ctz_32 a_ctz_32 > > > > +static inline int a_ctz_32(unsigned long x) { > > > > + __asm__( > > > > + "%0 = ct0(%0)\n\t" > > > > + : "+r"(x)); > > > > + return x; > > > > +} > > > > + > > > > +#define a_ctz_64 a_ctz_64 > > > > +static inline int a_ctz_64(uint64_t x) { > > > > + int count; > > > > + __asm__( > > > > + "%0 = ct0(%1)\n\t" > > > > + : "=r"(count) : "r"(x)); > > > > + return count; > > > > +} > > > > +#define a_clz_64 a_clz_64 > > > > +static inline int a_clz_64(uint64_t x) { > > > > + int count; > > > > + __asm__( > > > > + "%1 = brev(%1)\n\t" > > > > + "%0 = ct0(%1)\n\t" > > > > + : "=r"(count) : "r"(x)); > > > > + return count; > > > > +} > > > > > > This should probably do just the brev in asm then return > > > a_ctz_64(result) so that the compiler has the freedom to schedule > > > the > > insns > > > independently, unless there's a reason not to want it to do that. > > > > I used AARCH64's a_ctz_64 as a reference for this, but there are > > builtins for these operations and I will use those instead, > > __builtin_clzll(x); > > No, builtins of that sort aren't used in musl for various reasons. > This one might be technically okay (as opposed to others that can produce > circular definitions and such), but the style preference is still to write it as an > asm statement. OK, will restore the inlines. > > > > > +#define a_cas a_cas > > > > +static inline int a_cas(volatile int *p, int t, int s) { > > > > + int dummy; > > > > + __asm__ __volatile__( > > > > + "1: %0 = memw_locked(%1)\n\t" > > > > + " { p0 = cmp.eq(%0, %2)\n\t" > > > > + " if (!p0.new) jump:nt 2f }\n\t" > > > > + " memw_locked(%1, p0) = %3\n\t" > > > > + " if (!p0) jump 1b\n\t" > > > > + "2: \n\t" > > > > + : "=&r"(dummy) > > > > + : "r"(p), "r"(t), "r"(s) > > > > + : "p0", "memory" ); > > > > + return dummy; > > > > +} > > > > > > I don't know the hexagon atomic model, but as far as I can tell > > > these at > > least > > > "look right" in the sense of having right asm constraints. > > > > > > > [...] > > > > +#define a_barrier a_barrier > > > > +static inline void a_barrier() > > > > +{ > > > > + __asm__ __volatile__ ("barrier" ::: "memory"); } > > > > > > Is the barrier implied in memw_locked? If not, there need to be > > > explicit barriers in all the atomic functions. > > > > Yes, if there is any memory access on the reserved address the > > reservation is lost and the predicate is false. > > That's not what a barrier means. The question is whether it orders all access > to *other* memory, not the address with the reservation on it. > In other words, musl's a_*() atomics need to be full seq_cst model > operations, not relaxed atomics. Per our spec: "Threads in the Hexagon processor follow a sequentially consistent memory model at a packet granularity. Threads interleave their memory operations with one another in an arbitrary but fair manner. This results in a consistent program order that is globally observable by all threads in the same order." > > > > > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new > > > > file mode 100644 index 00000000..331a797e > > > > --- /dev/null > > > > +++ b/arch/hexagon/crt_arch.h > > > > @@ -0,0 +1,35 @@ > > > > +__asm__( > > > > +".weak _DYNAMIC \n" > > > > +".hidden _DYNAMIC \n" > > > > +".text \n" > > > > +".global " START " \n" > > > > +".type " START ", %function \n" > > > > +START ": \n" > > > > +" // Find _DYNAMIC\n" > > > > +" jump 1f\n" > > > > +".word _DYNAMIC - .\n" > > > > +"1: r2 = pc\n" > > > > +" r2 = add(r2, #-4)\n" > > > > +" r1 = memw(r2)\n" > > > > +" r1 = add(r2, r1)\n" > > > > +" r30 = #0 // Signals the end of > backtrace\n" > > > > +" r0 = r29 // Pointer to argc/argv\n" > > > > +" r29 = and(r29, #-16) // Align\n" > > > > +" memw(r29+#-8) = r29\n" > > > > +" r29 = add(r29, #-8)\n" > > > > +" call " START "_c \n" > > > > +".size " START ", .-" START "\n" > > > > +); > > > > + > > > > +__asm__( > > > > +".section \".note.ABI-tag\", \"a\" \n" > > > > +".align 4 \n" > > > > +".long 1f - 0f /* name length */ \n" > > > > +".long 3f - 2f /* data length */ \n" > > > > +".long 1 /* note type */ \n" > > > > +"0: .asciz \"GNU\" /* vendor name seems like this should be > > MUSL but > > > lldb doesn't agree.*/ \n" > > > > +"1: .align 4 \n" > > > > +"2: .long 0 /* linux */ \n" > > > > +" .long 3,0,0 \n" > > > > +"3: .align 4 \n" > > > > +); > > > > > > Is there a reason this needs to be here at all? Shouldn't the > > > tooling > > generate > > > it if it's actually wanted/needed? > > > > OK, this is here so lldb can select the right target allowing the same > > version of lldb to work in multiple runtime environments. I need to > > take a look at what the options are if this isn't a good place for this. > > OK, please follow up with what you find. I'm pretty sure this belongs in the > tooling though (as something the compiler emits or the assembler emits > based on how it's invoked and what its target is) not in a file in libc. > Szabolcs noted that gnu crt adds a similar note. I know that lldb's RefineModuleDetailsFromNote uses this to select the OS but will remove it for now while other options are looked at. > > > > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file > > > > mode > > > > 100644 index 00000000..14085872 > > > > --- /dev/null > > > > +++ b/arch/hexagon/reloc.h > > > > @@ -0,0 +1,25 @@ > > > > +#include <endian.h> > > > > + > > > > +#if __BYTE_ORDER == __LITTLE_ENDIAN #define ENDIAN_SUFFIX "el" > > > > +#else > > > > +#define ENDIAN_SUFFIX "" > > > > +#endif > > > > > > bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally > > > little endian). Does Hexagon (hw and abi) actually support both byte > > > orders? If not, I don't think there should be an endian suffix here at all. > > > > No it does not I will remove it. > > OK. Make sure the compiler target support has the right ldso name too (I > think that's /lib/ld-musl-hexagon.so.1, right?). Yes, that is the name. > > > > > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new > > > > file mode 100644 index 00000000..07b89764 > > > > --- /dev/null > > > > +++ b/src/fenv/hexagon/fenv.S > > > > @@ -0,0 +1,143 @@ > > > > +/* > > > > +The Hexagon user status register includes five status fields > > > > +which work as sticky flags for the five IEEE-defined exception > conditions: > > > > +inexact, overflow, underflow, divide by zero, and invalid. A > > > > +sticky flag is set when the corresponding exception occurs, and > > > > +remains set > > until > > > explicitly cleared. > > > > > > Please format for at most 80 columns. (Some source files have a few > > > longer lines, but if you're flowing text it should be flowed to 80 > > > or > > > fewer.) > > > > Will fix those lines and a couple of others I noticed. > > > > The changes I made can be seen here: > > > https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382 > d > > 3084cb > > > > I've also attached an updated patch. > > OK. I haven't looked at it yet but I wanted to go ahead and get you replies to > the content in the email body. If there's anything else in the patch that needs > commenting on I'll follow up soon, probably tomorrow. > > Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.