Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <025a01d68f4f$bd211aa0$37634fe0$@codeaurora.org>
Date: Sun, 20 Sep 2020 08:12:47 -0500
From: <sidneym@...eaurora.org>
To: "'Rich Felker'" <dalias@...c.org>
Cc: <musl@...ts.openwall.com>
Subject: RE: Hexagon DSP support



> -----Original Message-----
> From: Rich Felker <dalias@...c.org>
> Sent: Thursday, September 17, 2020 8:08 PM
> To: sidneym@...eaurora.org
> Cc: musl@...ts.openwall.com
> Subject: Re: [musl] Hexagon DSP support
> 
> On Thu, Sep 17, 2020 at 05:31:29PM -0500, sidneym@...eaurora.org wrote:
> >
> >
> > > -----Original Message-----
> > > From: 'Rich Felker' <dalias@...c.org>
> > > Sent: Wednesday, September 16, 2020 8:33 PM
> > > To: musl@...ts.openwall.com
> > > Subject: Re: [musl] Hexagon DSP support
> > >
> > > On Wed, Sep 16, 2020 at 03:49:28PM -0500, sidneym@...eaurora.org
> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: sidneym@...eaurora.org <sidneym@...eaurora.org>
> > > > > Sent: Friday, July 24, 2020 12:50 PM
> > > > > To: 'Szabolcs Nagy' <nsz@...t70.net>
> > > > > Cc: 'Rich Felker' <dalias@...c.org>; 'musl@...ts.openwall.com'
> > > > > <musl@...ts.openwall.com>
> > > > > Subject: RE: [musl] Hexagon DSP support
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Szabolcs Nagy <nsz@...t70.net>
> > > > > > Sent: Thursday, July 23, 2020 4:56 PM
> > > > > > To: sidneym@...eaurora.org
> > > > > > Cc: 'Rich Felker' <dalias@...c.org>; musl@...ts.openwall.com
> > > > > > Subject: Re: [musl] Hexagon DSP support
> > > > > >
> > > > > > * sidneym@...eaurora.org <sidneym@...eaurora.org> [2020-07-
> 20
> > > > > > 16:26:58 -0500]:
> > > > > > > I removed fma/fmal/fmax/fmin/fabs from compiler-rt-builtins,
> > > > > > > https://reviews.llvm.org/D82263 The comparison with musl can
> > > > > > > be found here:
> > > > > > > https://github.com/quic/musl/compare/hexagon but I've also
> > > > > > > attached the patch.
> > > > > > >
> > > > > > > An assert in clang when building both musl and libc-test for
> > > > > > > hexagon was fixed by, https://reviews.llvm.org/D80952 prior
> > > > > > > to this change -frounding-math had to be used.
> > > > > > >
> > > > > > > The test-results are also attached.  Everything is built
> > > > > > > with the tip-of-tree llvm so sometimes results vary but
> > > > > > > these are the results I got from this morning's clone.  The
> > > > > > > only notable difference in the results would be that both
> > > > > > > fma and fmal fail and this is because of the compiler-rt
> > > > > > > change.  I didn't add fma to musl because it require more
> > > > > > > complex assembly, along the lines you saw in an earlier
> > > > > > > version with
> > > > > > sqrt.
> > > > > >
> > > > > >
> > > > > > the fma and sqrt failures are still not fully explained, e.g.
> > > > > > this looks
> > > > wrong:
> > > > > >
> > > > > > src/math/special/fma.h:42: RN fma(0x1p+0,0x1p+0,-0x1p-1074)
> > > > > > want
> > > > > > 0x1p+0 got -0x1.fffffp-43 ulperr -4503599627370496.000 =
> > > > > > 0x1p+-0x1p+52
> > > > > > 0x1p++
> > > > > > 0x0p+0
> > > > > >
> > > > > > the only target specific bit in fma is a_clz_64 so i would
> > > > > > check
> > that.
> > > > > >
> > > > > > e.g. a_clz_64(1ULL << 42) should give 21 (this computation
> > > > > > happens during the fma test case above).
> > > > >
> > > > > Hexagon didn't have a_clz_64 implemented however I added this
> > > > > morning it and noticed no differences.  I will update the patch
> > > > > with that routine included.
> > > > >
> > > > > I did notice a compiler regression in how it compiled fma and
> > > > > have asked a compiler person to take a look.  An older version
> > > > > of our internally
> > > > maintained
> > > > > compiler does produce the expected results for the values I used
> > > > > but later versions do not.  Unfortunately changing optimization
> > > > > levels will produce different results as well.
> > > >
> > > > I've attached updated test results and patch.  The patch doesn't
> > > > change much other than adding the above mentioned a_clz_64.  The
> > > > only other change was an update to pthread_arch.h for an api
> > > > update so hexagon conforms with the rest of musl.
> > > >
> > > > Between updates to llvm and musl both fma and sqrt issues are
> > > > resolved provided I compile the library without optimization
> > > > enabled.  No new tests fail.
> > > >
> > > > I guess I also need to know what the thoughts are about adding
> > > > hexagon to the mainline base.  There are no issues adding from this
> end.
> > >
> > > I'll post some review with the hope that this can move forward
> > > upstream in musl soon. I might need some help figuring out how to
> > > get a cross build environment to check things, but I'll follow up when
I do.
> > >
> > > The review that follows is not 100% thorough but I think it's more
> > detailed
> > > than I've done for hexagon so far. Most of it's open to discussion
> > > if you
> > think
> > > anything I say is wrong.
> > >
> > > > diff --git a/arch/hexagon/atomic_arch.h
> > > > b/arch/hexagon/atomic_arch.h new file mode 100644 index
> > > > 00000000..ede55956
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/atomic_arch.h
> > > > @@ -0,0 +1,194 @@
> > > > +#define a_ctz_32 a_ctz_32
> > > > +static inline int a_ctz_32(unsigned long x) {
> > > > +	__asm__(
> > > > +		"%0 = ct0(%0)\n\t"
> > > > +		: "+r"(x));
> > > > +	return x;
> > > > +}
> > > > +
> > > > +#define a_ctz_64 a_ctz_64
> > > > +static inline int a_ctz_64(uint64_t x) {
> > > > +	int count;
> > > > +	__asm__(
> > > > +		"%0 = ct0(%1)\n\t"
> > > > +		: "=r"(count) : "r"(x));
> > > > +	return count;
> > > > +}
> > > > +#define a_clz_64 a_clz_64
> > > > +static inline int a_clz_64(uint64_t x) {
> > > > +        int count;
> > > > +        __asm__(
> > > > +                "%1 = brev(%1)\n\t"
> > > > +                "%0 = ct0(%1)\n\t"
> > > > +                : "=r"(count) : "r"(x));
> > > > +        return count;
> > > > +}
> > >
> > > This should probably do just the brev in asm then return
> > > a_ctz_64(result) so that the compiler has the freedom to schedule
> > > the
> > insns
> > > independently, unless there's a reason not to want it to do that.
> >
> > I used AARCH64's a_ctz_64 as a reference for this, but there are
> > builtins for these operations and I will use those instead,
> > __builtin_clzll(x);
> 
> No, builtins of that sort aren't used in musl for various reasons.
> This one might be technically okay (as opposed to others that can produce
> circular definitions and such), but the style preference is still to write
it as an
> asm statement.

OK, will restore the inlines.

> 
> > > > +#define a_cas a_cas
> > > > +static inline int a_cas(volatile int *p, int t, int s) {
> > > > +	int dummy;
> > > > +	__asm__ __volatile__(
> > > > +		"1:	%0 = memw_locked(%1)\n\t"
> > > > +		"	{ p0 = cmp.eq(%0, %2)\n\t"
> > > > +		"	  if (!p0.new) jump:nt 2f }\n\t"
> > > > +		"	memw_locked(%1, p0) = %3\n\t"
> > > > +		"	if (!p0) jump 1b\n\t"
> > > > +		"2:	\n\t"
> > > > +		: "=&r"(dummy)
> > > > +		: "r"(p), "r"(t), "r"(s)
> > > > +		: "p0", "memory" );
> > > > +        return dummy;
> > > > +}
> > >
> > > I don't know the hexagon atomic model, but as far as I can tell
> > > these at
> > least
> > > "look right" in the sense of having right asm constraints.
> > >
> > > > [...]
> > > > +#define a_barrier a_barrier
> > > > +static inline void a_barrier()
> > > > +{
> > > > +	__asm__ __volatile__ ("barrier" ::: "memory"); }
> > >
> > > Is the barrier implied in memw_locked? If not, there need to be
> > > explicit barriers in all the atomic functions.
> >
> > Yes, if there is any memory access on the reserved address the
> > reservation is lost and the predicate is false.
> 
> That's not what a barrier means. The question is whether it orders all
access
> to *other* memory, not the address with the reservation on it.
> In other words, musl's a_*() atomics need to be full seq_cst model
> operations, not relaxed atomics.

Per our spec:
"Threads in the Hexagon processor follow a sequentially consistent memory
model at a packet
granularity. Threads interleave their memory operations with one another in
an arbitrary but
fair manner. This results in a consistent program order that is globally
observable by all
threads in the same order."  


> 
> > > > diff --git a/arch/hexagon/crt_arch.h b/arch/hexagon/crt_arch.h new
> > > > file mode 100644 index 00000000..331a797e
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/crt_arch.h
> > > > @@ -0,0 +1,35 @@
> > > > +__asm__(
> > > > +".weak _DYNAMIC \n"
> > > > +".hidden _DYNAMIC \n"
> > > > +".text \n"
> > > > +".global " START " \n"
> > > > +".type " START ", %function \n"
> > > > +START ": \n"
> > > > +"                                       // Find _DYNAMIC\n"
> > > > +"       jump 1f\n"
> > > > +".word  _DYNAMIC - .\n"
> > > > +"1:     r2 = pc\n"
> > > > +"       r2 = add(r2, #-4)\n"
> > > > +"       r1 = memw(r2)\n"
> > > > +"       r1 = add(r2, r1)\n"
> > > > +"	r30 = #0			// Signals the end of
> backtrace\n"
> > > > +"	r0 = r29			// Pointer to argc/argv\n"
> > > > +"	r29 = and(r29, #-16)		// Align\n"
> > > > +"	memw(r29+#-8) = r29\n"
> > > > +"	r29 = add(r29, #-8)\n"
> > > > +"	call " START "_c \n"
> > > > +".size " START ", .-" START "\n"
> > > > +);
> > > > +
> > > > +__asm__(
> > > > +".section \".note.ABI-tag\", \"a\" \n"
> > > > +".align 4 \n"
> > > > +".long 1f - 0f		/* name length */ \n"
> > > > +".long 3f - 2f		/* data length */ \n"
> > > > +".long  1		/* note type */ \n"
> > > > +"0:	.asciz \"GNU\"	/* vendor name seems like this should be
> > MUSL but
> > > lldb doesn't agree.*/ \n"
> > > > +"1:	.align 4		 \n"
> > > > +"2:	.long 0	/* linux */ \n"
> > > > +"	.long 3,0,0 \n"
> > > > +"3:	.align 4 \n"
> > > > +);
> > >
> > > Is there a reason this needs to be here at all? Shouldn't the
> > > tooling
> > generate
> > > it if it's actually wanted/needed?
> >
> > OK, this is here so lldb can select the right target allowing the same
> > version of lldb to work in multiple runtime environments. I need to
> > take a look at what the options are if this isn't a good place for this.
> 
> OK, please follow up with what you find. I'm pretty sure this belongs in
the
> tooling though (as something the compiler emits or the assembler emits
> based on how it's invoked and what its target is) not in a file in libc.
> 

Szabolcs noted that gnu crt adds a similar note.  I know that lldb's
RefineModuleDetailsFromNote uses this to select the OS but will remove it
for now while other options are looked at.


> > > > diff --git a/arch/hexagon/reloc.h b/arch/hexagon/reloc.h new file
> > > > mode
> > > > 100644 index 00000000..14085872
> > > > --- /dev/null
> > > > +++ b/arch/hexagon/reloc.h
> > > > @@ -0,0 +1,25 @@
> > > > +#include <endian.h>
> > > > +
> > > > +#if __BYTE_ORDER == __LITTLE_ENDIAN #define ENDIAN_SUFFIX "el"
> > > > +#else
> > > > +#define ENDIAN_SUFFIX ""
> > > > +#endif
> > >
> > > bits/alltypes.h.in defined __BYTE_ORDER as 1234 (unconditionally
> > > little endian). Does Hexagon (hw and abi) actually support both byte
> > > orders? If not, I don't think there should be an endian suffix here at
all.
> >
> > No it does not I will remove it.
> 
> OK. Make sure the compiler target support has the right ldso name too (I
> think that's /lib/ld-musl-hexagon.so.1, right?).

Yes, that is the name.


> 
> > > > diff --git a/src/fenv/hexagon/fenv.S b/src/fenv/hexagon/fenv.S new
> > > > file mode 100644 index 00000000..07b89764
> > > > --- /dev/null
> > > > +++ b/src/fenv/hexagon/fenv.S
> > > > @@ -0,0 +1,143 @@
> > > > +/*
> > > > +The Hexagon user status register includes five status fields
> > > > +which work as sticky flags for the five IEEE-defined exception
> conditions:
> > > > +inexact, overflow, underflow, divide by zero, and invalid. A
> > > > +sticky flag is set when the corresponding exception occurs, and
> > > > +remains set
> > until
> > > explicitly cleared.
> > >
> > > Please format for at most 80 columns. (Some source files have a few
> > > longer lines, but if you're flowing text it should be flowed to 80
> > > or
> > > fewer.)
> >
> > Will fix those lines and a couple of others I noticed.
> >
> > The changes I made can be seen here:
> >
> https://github.com/quic/musl/commit/4d714f2defcd926b4f8c0425af363b382
> d
> > 3084cb
> >
> > I've also attached an updated patch.
> 
> OK. I haven't looked at it yet but I wanted to go ahead and get you
replies to
> the content in the email body. If there's anything else in the patch that
needs
> commenting on I'll follow up soon, probably tomorrow.
> 
> Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.