Message-ID: <05b901d69020$d1f44c50$75dce4f0$@codeaurora.org>
Date: Mon, 21 Sep 2020 09:09:27 -0500
From: <sidneym@...eaurora.org>
To: "'Rich Felker'" <dalias@...c.org>
Cc: <musl@...ts.openwall.com>
Subject: RE: Hexagon DSP support

> -----Original Message-----
> From: 'Rich Felker' <dalias@...c.org>
> Sent: Sunday, September 20, 2020 12:17 PM
> To: sidneym@...eaurora.org
> Cc: musl@...ts.openwall.com
> Subject: Re: [musl] Hexagon DSP support
>
> On Sun, Sep 20, 2020 at 08:12:47AM -0500, sidneym@...eaurora.org wrote:
> > > > > > [...]
> > > > > > +#define a_barrier a_barrier
> > > > > > +static inline void a_barrier() {
> > > > > > +	__asm__ __volatile__ ("barrier" ::: "memory"); }
> > > > >
> > > > > Is the barrier implied in memw_locked? If not, there need to be
> > > > > explicit barriers in all the atomic functions.
> > > >
> > > > Yes, if there is any memory access on the reserved address the
> > > > reservation is lost and the predicate is false.
> > >
> > > That's not what a barrier means. The question is whether it orders
> > > all access to *other* memory, not the address with the reservation
> > > on it. In other words, musl's a_*() atomics need to be full seq_cst
> > > model operations, not relaxed atomics.
> >
> > Per our spec:
> > "Threads in the Hexagon processor follow a sequentially consistent
> > memory model at a packet granularity. Threads interleave their memory
> > operations with one another in an arbitrary but fair manner. This
> > results in a consistent program order that is globally observable by
> > all threads in the same order."
>
> Can you clarify or provide a reference for what 'packet granularity'
> means? If there's actually a full builtin seq_cst order I don't see what
> the barrier instruction exists for to begin with.
>

Packet granularity is like instruction granularity, every operation within
the packet happens in parallel. There is an exception since packets can
have dual stores. They happen in a prescribed order. A packet has 4 slots
but dual stores must be in slots 0 & 1. Stores in slot 1 happen before
stores in slot 0. Slot 0 is the highest address in the packet so the store
order would appear as it would if you disassembled the code.

barrier is used for thread-to-external-memory. All observers in the
"global shared domain" would see the store after the barrier finished.

"For devices external to the Hexagon processor, the processor follows a
weakly-ordered memory model. Explicit synchronization is required to
ensure order between memory accesses."

> Rich
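
For readers following the thread, here is a minimal sketch of the requirement Rich describes: an a_cas()-style operation that is a full seq_cst operation, plus a separate full barrier. It is written in portable C11 rather than Hexagon assembly, so it says nothing about whether memw_locked needs an explicit barrier. The names sketch_cas, sketch_barrier and sketch_demo are hypothetical and are not taken from the patch or from musl; the only assumption borrowed from musl is that a compare-and-swap returns the old value.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a full barrier: orders all earlier memory
 * accesses before all later ones, not only accesses to one address. */
static inline void sketch_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical stand-in for a compare-and-swap with seq_cst ordering.
 * Stores `desired` only if *p == expected; returns the old value. */
static inline int sketch_cas(_Atomic int *p, int expected, int desired)
{
	/* On failure, `expected` is overwritten with the current value of *p;
	 * on success it still holds the old value. Either way it is the value
	 * that was observed in *p. */
	atomic_compare_exchange_strong_explicit(p, &expected, desired,
		memory_order_seq_cst, memory_order_seq_cst);
	return expected;
}

/* Small single-threaded usage example; returns the final stored value. */
static inline int sketch_demo(void)
{
	_Atomic int v = 5;
	assert(sketch_cas(&v, 5, 7) == 5);  /* matches: 5 is replaced by 7 */
	assert(sketch_cas(&v, 5, 9) == 7);  /* no match: value stays 7 */
	sketch_barrier();
	return atomic_load(&v);
}
```

A relaxed variant would pass memory_order_relaxed instead; it would still be atomic on the one address but would not order accesses to other memory, which is the distinction raised in the quoted question.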