Message-ID: <20150518023402.GS17573@brightrain.aerifal.cx>
Date: Sun, 17 May 2015 22:34:02 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] inline llsc atomics when compiling for sh4a

On Sun, May 17, 2015 at 01:55:16PM -0500, Bobby Bingham wrote:
> If we're building for sh4a, the compiler is already free to use
> instructions only available on sh4a, so we can do the same and inline the
> llsc atomics. If we're building for an older processor, we still do the
> same runtime atomics selection as before.

Thanks! I think it's ok for commit as-is, but based on re-reading this
code I have some ideas for improving it that are orthogonal to this
change. See comments inline:

> ---
>  arch/sh/atomic.h     |  83 +++++++++++++++++++++++++++++++
>  arch/sh/src/atomic.c | 135 +++++++++++++++++----------------------------------
>  2 files changed, 128 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/sh/atomic.h b/arch/sh/atomic.h
> index a1d22e4..f2e6dac 100644
> --- a/arch/sh/atomic.h
> +++ b/arch/sh/atomic.h
> @@ -22,6 +22,88 @@ static inline int a_ctz_64(uint64_t x)
>  	return a_ctz_l(y);
>  }
> 
> +#define LLSC_CLOBBERS "r0", "t", "memory"
> +#define LLSC_START(mem) "synco\n" \
> +	"0:	movli.l @" mem ", r0\n"
> +#define LLSC_END(mem) \
> +	"1:	movco.l r0, @" mem "\n" \
> +	"	bf 0b\n" \
> +	"	synco\n"
> +
> +static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> +{
> +	int old;
> +	__asm__ __volatile__(
> +		LLSC_START("%1")
> +		"	mov r0, %0\n"
> +		"	cmp/eq %0, %2\n"
> +		"	bf 1f\n"
> +		"	mov %3, r0\n"
> +		LLSC_END("%1")
> +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> +	return old;
> +}

The mov from r0 to %0 seems unnecessary here. Presumably it's because
you didn't have a constraint to force old to come out via r0. Could you
do the following?

static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	register int old __asm__("r0");
	__asm__ __volatile__(
		LLSC_START("%1")
		"	cmp/eq r0, %2\n"
		"	bf 1f\n"
		"	mov %3, r0\n"
		LLSC_END("%1")
		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
	return old;
}

Note that I had to drop LLSC_CLOBBERS because "r0" was in it and you
can't clobber an output register.

I've actually always wondered about the value of having the LLSC_*
macros. I usually prefer for the whole asm to be written out explicitly
and readable unless there's a compelling reason to wrap it up in
macros. Then it would look like:

static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	register int old __asm__("r0");
	__asm__ __volatile__(
		"	synco\n"
		"0:	movli.l @%1, r0\n"
		"	cmp/eq r0, %2\n"
		"	bf 1f\n"
		"	mov %3, r0\n"
		"1:	movco.l r0, @%1\n"
		"	bf 0b\n"
		"	synco\n"
		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
	return old;
}

and similar for other functions. Part of the motivation of not hiding
the outer logic in macros is that it might make it possible to
fold/simplify some special cases like I did above for CAS.

Another idea is letting the compiler simplify, with something like the
following, which could actually be used cross-platform for all
llsc-type archs:

static inline int __sh_cas_llsc(volatile int *p, int t, int s)
{
	int old;
	do old = llsc_start(p);
	while (old == t && !llsc_end(p, s));
	return old;
}

Here llsc_start and llsc_end would be inline functions using asm with
appropriate constraints. Unfortunately I don't see a way to model using
the value of the truth flag "t" as the output of the asm for llsc_end,
though.
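
To make that concrete, a hypothetical pair of helpers for sh could look
something like this (the names, constraints, and placement of synco are
illustrative assumptions, not code from the tree); the movt in llsc_end
is the extra flag-to-integer conversion:

/* Hypothetical helpers, for illustration only.  llsc_start opens the
 * load-locked reservation; llsc_end attempts the conditional store and
 * reports whether it succeeded.  The movt is the extra instruction
 * needed to hand the T flag back to the compiler as a value. */
static inline int llsc_start(volatile int *p)
{
	register int old __asm__("r0");
	__asm__ __volatile__(
		"	synco\n"
		"	movli.l @%1, r0\n"
		: "=&r"(old) : "r"(p) : "memory");
	return old;
}

static inline int llsc_end(volatile int *p, int v)
{
	int ok;
	__asm__ __volatile__(
		"	mov %2, r0\n"
		"	movco.l r0, @%1\n"
		"	movt %0\n"
		"	synco\n"
		: "=&r"(ok) : "r"(p), "r"(v) : "r0", "t", "memory");
	return ok;
}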

I suspect this would be a problem on a number of other archs too; the
asm would have to waste an instruction (or several) converting the flag
to an integer. Unless there's a solution to that problem, it makes an
approach like this less appealing.

>  int __sh_cas(volatile int *p, int t, int s)
>  {
> +	if (__hwcap & CPU_HAS_LLSC) return __sh_cas_llsc(p, t, s);
> +
>  	int old;
> -	if (__hwcap & CPU_HAS_LLSC) {
> -		__asm__ __volatile__(
> -			LLSC_START("%1")
> -			"	mov r0, %0\n"
> -			"	cmp/eq %0, %2\n"
> -			"	bf 1f\n"
> -			"	mov %3, r0\n"
> -			LLSC_END("%1")
> -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> -	} else {
> -		__asm__ __volatile__(
> -			GUSA_START_EVEN("%1", "%0")
> -			"	cmp/eq %0, %2\n"
> -			"	bf 1f\n"
> -			GUSA_END("%1", "%3")
> -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
> -	}
> +	__asm__ __volatile__(
> +		GUSA_START_EVEN("%1", "%0")
> +		"	cmp/eq %0, %2\n"
> +		"	bf 1f\n"
> +		GUSA_END("%1", "%3")
> +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
>  	return old;
>  }

For the GUSA stuff, do you really need the ODD/EVEN macros? I think you
could just add appropriate .align inside that would cause the assembler
to insert a nop if necessary.
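
For reference, the expanded gUSA sequence for cas looks roughly like
the following (hand-written for illustration; the register usage and
the placement of the padding nop are assumptions, not the literal macro
expansion):

/* Illustration only: approximately what the GUSA-based CAS expands to. */
static inline int __sh_cas_gusa(volatile int *p, int t, int s)
{
	int old;
	__asm__ __volatile__(
		"	mova 1f, r0\n"       /* r0 = end of the restartable sequence */
		"	nop\n"               /* padding so the 1: label is 4-byte aligned */
		"	mov r15, r1\n"       /* save the real stack pointer */
		"	mov #(0f-1f), r15\n" /* r15 = negative length of the sequence */
		"0:	mov.l @%1, %0\n"
		"	cmp/eq %0, %2\n"
		"	bf 1f\n"
		"	mov.l %3, @%1\n"
		"1:	mov r1, r15\n"       /* restore sp, ending the sequence */
		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "r0", "r1", "t", "memory");
	return old;
}

The alignment constraint is that the target of the mova (the 1: label
here) land on a 4-byte boundary; whether that takes a nop depends on
where the sequence starts and on how many 2-byte instructions sit
between the mova and the label, which is what the ODD/EVEN variants
currently track by hand.

Rich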