|
Message-ID: <20200918010806.GX3265@brightrain.aerifal.cx> Date: Thu, 17 Sep 2020 21:08:07 -0400 From: Rich Felker <dalias@...c.org> To: sidneym@...eaurora.org Cc: musl@...ts.openwall.com Subject: Re: Hexagon DSP support On Thu, Sep 17, 2020 at 05:31:29PM -0500, sidneym@...eaurora.org wrote: > > > > -----Original Message----- > > From: 'Rich Felker' <dalias@...c.org> > > Sent: Wednesday, September 16, 2020 8:33 PM > > To: musl@...ts.openwall.com > > Subject: Re: [musl] Hexagon DSP support > > > > On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@...eaurora.org wrote: > > > > > > > > > > -----Original Message----- > > > > From: sidneym@...eaurora.org <sidneym@...eaurora.org> > > > > Sent: Friday, July 24, 2020 12:50 PM > > > > To: 'Szabolcs Nagy' <nsz@...t70.net> > > > > Cc: 'Rich Felker' <dalias@...c.org>; 'musl@...ts.openwall.com' > > > > <musl@...ts.openwall.com> > > > > Subject: RE: [musl] Hexagon DSP support > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Szabolcs Nagy <nsz@...t70.net> > > > > > Sent: Thursday, July 23, 2020 4:56 PM > > > > > To: sidneym@...eaurora.org > > > > > Cc: 'Rich Felker' <dalias@...c.org>; musl@...ts.openwall.com > > > > > Subject: Re: [musl] Hexagon DSP support > > > > > > > > > > * sidneym@...eaurora.org <sidneym@...eaurora.org> [2020-07-20 > > > > > 16:26:58 -0500]: > > > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins, > > > > > > https://reviews.llvm.org/D82263 > > > > > > The comparison with musl can be found here: > > > > > > https://github.com/quic/musl/compare/hexagon but I've also > > > > > > attached the patch. > > > > > > > > > > > > An assert in clang when building both musl and libc-test for > > > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior to > > > > > > this change -frounding-math had to be used. > > > > > > > > > > > > The test-results are also attached. Everything is built with > > > > > > the tip-of-tree llvm so sometimes results vary but these are the > > > > > > results I got from this morning's clone. The only notable > > > > > > difference in the results would be that both fma and fmal fail > > > > > > and this is because of the compiler-rt change. I didn't add fma > > > > > > to musl because it require more complex assembly, along the > > > > > > lines you saw in an earlier version with > > > > > sqrt. > > > > > > > > > > > > > > > the fma and sqrt failures are still not fully explained, e.g. this > > > > > looks > > > wrong: > > > > > > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074) want > > > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 = -0x1p+52 > > > > > 0x1p++ > > > > > 0x0p+0 > > > > > > > > > > the only target specific bit in fma is a_clz_64 so i would check > that. > > > > > > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation happens > > > > > during the fma test case above). > > > > > > > > Hexagon didn't have a_clz_64 implemented however I added this > > > > morning it and noticed no differences. I will update the patch with > > > > that routine included. > > > > > > > > I did notice a compiler regression in how it compiled fma and have > > > > asked a compiler person to take a look. An older version of our > > > > internally > > > maintained > > > > compiler does produce the expected results for the values I used but > > > > later versions do not. Unfortunately changing optimization levels > > > > will produce different results as well. > > > > > > I've attached updated test results and patch. The patch doesn't > > > change much other than adding the above mentioned a_clz_64. The only > > > other change was an update to pthread_arch.h for an api update so > > > hexagon conforms with the rest of musl. > > > > > > Between updates to llvm and musl both fma and sqrt issues are resolved > > > provided I compile the library without optimization enabled. No new > > > tests fail. > > > > > > I guess I also need to know what the thoughts are about adding hexagon > > > to the mainline base. There are no issues adding from this end. > > > > I'll post some review with the hope that this can move forward upstream in > > musl soon. I might need some help figuring out how to get a cross build > > environment to check things, but I'll follow up when I do. > > > > The review that follows is not 100% thorough but I think it's more > detailed > > than I've done for hexagon so far. Most of it's open to discussion if you > think > > anything I say is wrong. > > > > > diff --git a/arch/hexagon/atomic_arch.h b/arch/hexagon/atomic_arch.h > > > new file mode 100644 index 00000000..ede55956 > > > --- /dev/null > > > +++ b/arch/hexagon/atomic_arch.h > > > @@ -0,0 +1,194 @@ > > > +#define a_ctz_32 a_ctz_32 > > > +static inline int a_ctz_32(unsigned long x) { > > > + __asm__( > > > + "%0 = ct0(%0)\n\t" > > > + : "+r"(x)); > > > + return x; > > > +} > > > + > > > +#define a_ctz_64 a_ctz_64 > > > +static inline int a_ctz_64(uint64_t x) { > > > + int count; > > > + __asm__( > > > + "%0 = ct0(%1)\n\t" > > > + : "=r"(count) : "r"(x)); > > > + return count; > > > +} > > > +#define a_clz_64 a_clz_64 > > > +static inline int a_clz_64(uint64_t x) { > > > + int count; > > > + __asm__( > > > + "%1 = brev(%1)\n\t" > > > + "%0 = ct0(%1)\n\t" > > > + : "=r"(count) : "r"(x)); > > > + return count; > > > +} > > > > This should probably do just the brev in asm then return > > a_ctz_64(result) so that the compiler has the freedom to schedule the > insns > > independently, unless there's a reason not to want it to do that. > > I used AARCH64's a_ctz_64 as a reference for this, but there are builtins > for these operations and I will use those instead, __builtin_clzll(x); No, builtins of that sort aren't used in musl for various reasons. This one might be technically okay (as opposed to others that can produce circular definitions and such), but the style preference is still to write it as an asm statement. > > > +#define a_cas a_cas > > > +static inline int a_cas(volatile int *p, int t, int s) { > > > + int dummy; > > > + __asm__ __volatile__( > > > + "1: %0 = memw_locked(%1)\n\t" > > > + " { p0 = cmp.eq(%0, %2)\n\t" > > > + " if (!p0.new) jump:nt 2f }\n\t" > > > + " memw_locked(%1, p0) = %3\n\t" > > > + " if (!p0) jump 1b\n\t" > > > + "2: \n\t" > > > + : "=&r"(dummy) > > > + : "r"(p), "r"(t), "r"(s) > > > + : "p0", "memory" ); > > > + return dummy; > > > +} > > > > I don't know the hexagon atomic model, but as far as I can tell these at > least > > "look right" in the sense of having right asm constraints. > > > > > [...] > > > +#define a_barrier a_barrier > > > +static inline void a_barrier() > > > +{ > > > + __asm__ __volatile__ ("barrier" ::: "memory"); } > > > > Is the barrier implied in memw_locked? If not, there need to be explicit > > barriers in all the atomic functions. > > Yes, if there is any memory access on the reserved address the reservation > is lost and the predicate is false. That's not what a barrier means. The question is whether it orders all access to *other* memory, not the address with the reservation on it. In other words, musl's a_*() atomics need to be full seq_cst model operations, not relaxed atomics. > > > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new > > > file mode 100644 index 00000000..331a797e > > > --- /dev/null > > > +++ b/arch/hexagon/crt_arch.h > > > @@ -0,0 +1,35 @@ > > > +__asm__( > > > +".weak _DYNAMIC \n" > > > +".hidden _DYNAMIC \n" > > > +".text \n" > > > +".global " START " \n" > > > +".type " START ", %function \n" > > > +START ": \n" > > > +" // Find _DYNAMIC\n" > > > +" jump 1f\n" > > > +".word _DYNAMIC - .\n" > > > +"1: r2 = pc\n" > > > +" r2 = add(r2, #-4)\n" > > > +" r1 = memw(r2)\n" > > > +" r1 = add(r2, r1)\n" > > > +" r30 = #0 // Signals the end of backtrace\n" > > > +" r0 = r29 // Pointer to argc/argv\n" > > > +" r29 = and(r29, #-16) // Align\n" > > > +" memw(r29+#-8) = r29\n" > > > +" r29 = add(r29, #-8)\n" > > > +" call " START "_c \n" > > > +".size " START ", .-" START "\n" > > > +); > > > + > > > +__asm__( > > > +".section \".note.ABI-tag\", \"a\" \n" > > > +".align 4 \n" > > > +".long 1f - 0f /* name length */ \n" > > > +".long 3f - 2f /* data length */ \n" > > > +".long 1 /* note type */ \n" > > > +"0: .asciz \"GNU\" /* vendor name seems like this should be > MUSL but > > lldb doesn't agree.*/ \n" > > > +"1: .align 4 \n" > > > +"2: .long 0 /* linux */ \n" > > > +" .long 3,0,0 \n" > > > +"3: .align 4 \n" > > > +); > > > > Is there a reason this needs to be here at all? Shouldn't the tooling > generate > > it if it's actually wanted/needed? > > OK, this is here so lldb can select the right target allowing the same > version of lldb to work in multiple runtime environments. I need to take a > look at what the options are if this isn't a good place for this. OK, please follow up with what you find. I'm pretty sure this belongs in the tooling though (as something the compiler emits or the assembler emits based on how it's invoked and what its target is) not in a file in libc. > > > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file mode > > > 100644 index 00000000..14085872 > > > --- /dev/null > > > +++ b/arch/hexagon/reloc.h > > > @@ -0,0 +1,25 @@ > > > +#include <endian.h> > > > + > > > +#if __BYTE_ORDER == __LITTLE_ENDIAN > > > +#define ENDIAN_SUFFIX "el" > > > +#else > > > +#define ENDIAN_SUFFIX "" > > > +#endif > > > > bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally little > > endian). Does Hexagon (hw and abi) actually support both byte orders? If > > not, I don't think there should be an endian suffix here at all. > > No it does not I will remove it. OK. Make sure the compiler target support has the right ldso name too (I think that's /lib/ld-musl-hexagon.so.1, right?). > > > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new > > > file mode 100644 index 00000000..07b89764 > > > --- /dev/null > > > +++ b/src/fenv/hexagon/fenv.S > > > @@ -0,0 +1,143 @@ > > > +/* > > > +The Hexagon user status register includes five status fields which > > > +work as sticky flags for the five IEEE-defined exception conditions: > > > +inexact, overflow, underflow, divide by zero, and invalid. A sticky > > > +flag is set when the corresponding exception occurs, and remains set > until > > explicitly cleared. > > > > Please format for at most 80 columns. (Some source files have a few longer > > lines, but if you're flowing text it should be flowed to 80 or > > fewer.) > > Will fix those lines and a couple of others I noticed. > > The changes I made can be seen here: > https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382d3084cb > > I've also attached an updated patch. OK. I haven't looked at it yet but I wanted to go ahead and get you replies to the content in the email body. If there's anything else in the patch that needs commenting on I'll follow up soon, probably tomorrow. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.