Message-ID: <20190425085353.GI26605@port70.net>
Date: Thu, 25 Apr 2019 10:53:54 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] x86: optimize fp_arch.h

* Rich Felker <dalias@...c.org> [2019-04-24 22:01:08 -0400]:
> On Thu, Apr 25, 2019 at 01:51:06AM +0200, Szabolcs Nagy wrote:
> > tested on x86_64 and i386
> >
> > >From 5f97370ff3e94bea812ec123a31d7482965a3b1b Mon Sep 17 00:00:00 2001
> > From: Szabolcs Nagy <nsz@...t70.net>
> > Date: Wed, 24 Apr 2019 23:29:05 +0000
> > Subject: [PATCH] x86: optimize fp_arch.h
> >
> > Use fp register constraint instead of volatile store when sse2 math is
> > available, and use memory constraint when only x87 fpu is available.
> > ---
> >  arch/i386/fp_arch.h   | 31 +++++++++++++++++++++++++++++++
> >  arch/x32/fp_arch.h    | 25 +++++++++++++++++++++++++
> >  arch/x86_64/fp_arch.h | 25 +++++++++++++++++++++++++
> >  3 files changed, 81 insertions(+)
> >  create mode 100644 arch/i386/fp_arch.h
> >  create mode 100644 arch/x32/fp_arch.h
> >  create mode 100644 arch/x86_64/fp_arch.h
> >
> > diff --git a/arch/i386/fp_arch.h b/arch/i386/fp_arch.h
> > new file mode 100644
> > index 00000000..b4019de2
> > --- /dev/null
> > +++ b/arch/i386/fp_arch.h
> > @@ -0,0 +1,31 @@
> > +#ifdef __SSE2_MATH__
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+x"(x))
> > +#else
> > +#define FP_BARRIER(x) __asm__ __volatile__ ("" : "+m"(x))
> > +#endif
>
> I guess for float and double you need the "m" constraint to ensure
> that a broken compiler doesn't skip dropping of precision (although I
> still wish we didn't bother with complexity to support that, and just
> relied on cast working correctly), but at least for long double
> couldn't we use an x87 register constraint to avoid the spill to
> memory?

i think fp_barrier does not have to drop excess precision:
it is supposed to be an identity op that is hidden from the
compiler e.g. to prevent const folding or hoisting, but
fp_force_eval is used to force side-effects that may only
happen if the excess precision is dropped.

i think modern gcc drops excess precision at arg passing
in standard mode, so "+m" is not needed, but makes the code
behave the same in non-standard mode too.

and yes the long double version could use "+t", maybe i
should add that (the patch saves about 400byte .text because
of volatile load/store overhead).
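
for illustration, the constraint choices discussed above can be sketched
roughly as below. this is a minimal sketch, not the code from the patch:
the helper names (barrier_double, barrier_long_double, force_eval_double)
are hypothetical, it assumes gcc or clang style inline asm, and it falls
back to a volatile round trip on targets without sse2 math or x87.

```c
/* hypothetical helpers for illustration only, not the actual musl code */

/* identity op on a double that the optimizer cannot see through */
static inline double barrier_double(double x)
{
#if defined(__SSE2_MATH__)
	/* "+x": keep the value in an sse register, no load/store needed */
	__asm__ __volatile__ ("" : "+x"(x));
	return x;
#else
	/* fallback: round trip through a volatile object in memory */
	volatile double y = x;
	return y;
#endif
}

/* same idea for long double, using the x87 top-of-stack register */
static inline long double barrier_long_double(long double x)
{
#if defined(__i386__) || defined(__x86_64__)
	/* "+t": st(0), avoids the spill to memory that "+m" would force */
	__asm__ __volatile__ ("" : "+t"(x));
	return x;
#else
	volatile long double y = x;
	return y;
#endif
}

/* make the compiler materialize x so the computation that produced it
   (and any fp exception it raises) is not optimized away */
static inline void force_eval_double(double x)
{
#if defined(__SSE2_MATH__)
	__asm__ __volatile__ ("" : "+x"(x));
#else
	/* the store to a volatile double also drops excess precision */
	volatile double y = x;
	(void)y;
#endif
}
```

a caller would write something like force_eval_double(x*x) where only
the side-effect of the multiplication is wanted, and barrier_double(x)
where the value is needed but must not be const folded or hoisted.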