|
Message-ID: <19C2F5D4C9574B8D9ADFCBE84CA0BC97@H270> Date: Wed, 11 Dec 2019 23:25:37 +0100 From: "Stefan Kanthak" <stefan.kanthak@...go.de> To: "Rich Felker" <dalias@...c.org> Cc: "Szabolcs Nagy" <nsz@...t70.net>, <musl@...ts.openwall.com> Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified "Rich Felker" <dalias@...c.org> wrote: > On Wed, Dec 11, 2019 at 10:17:09PM +0100, Stefan Kanthak wrote: [...] >> PS: the following is just a "Gedankenspiel", extending the idea to >> avoid transfers from/to SSE registers. >> On x86-64, functions like isunordered(), copysign() etc. may be >> implemented using SSE intrinsics _mm_*() as follows: >> >> #include <immintrin.h> >> >> int signbit(double argument) >> { >> return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument)); >> } > > This is just a missed optimization the compiler should be able to do > without intrinsics, on any arch where floating point types are kept in > vector registers that can also do integer/bitmask operations. The catch here is but that the MOVMSKPD instruction generated from _mm_movemask_pd() intrinsic yields its result in an integer register, so there's no need to do integer/bitmask operations on vector registers (and transfer them to an integer register afterwards). >> uint32_t lrint(double argument) >> { >> return _mm_cvtsd_si32(_mm_set_sd(argument)); >> } > > This is already done (on x86_64 where it's valid). It's in an asm > source file This is exactly the cheating I address below: the prototype of the assembler function matches the ABI, but not the C declaration. > but should be converted to a C source file with __asm__ > and proper constraint, not intrinsics, because __asm__ is a compiler > feature we require support for and intrinsics aren't (and also they > have some really weird semantics with respect to how they interface > with C aliasing rules). That's why I introduced this only as a "Gedankenspiel"! >> double copysign(double magnitude, double sign) >> { >> return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)), >> _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude)))); >> } > > I don't think we have one like this for x86_64, but ideally the C > would compile to something like it. (See above about missed > optimization.) Compilers typically emit superfluous PXOR/XORPD instructions here to clear the upper lane(s) of the vector registers, although _mm_*_sd() and _mm_*_ss() don't touch the upper lanes (so invalid values can't raise exceptions), and the bitmask operations _mm_*_pd() don't raise exceptions on SNANs, subnormals etc. >> Although the arguments and results are all held in SSE registers, >> there's no way to use them directly; it's but necessary to >> transfer them using _mm_set_sd() and _mm_cvtsd_f64(), which may >> result in superfluous instructions emitted by the compiler. > > I don't see why you say that. Just insert "in plain C" after "there's no way to use them directly" > They should be used in-place if possible just by virtue of how the > compiler's IR works. See above: most often XORPD or another instruction to clear/set the upper lane(s) is emitted. > Certainly for the __asm__ form they will be used in-place. Right. But that's the inline form of cheating.-) >> If you but cheat and "hide" these functions from the compiler >> by placing them in a library, you can implement them as follows: >> >> __m128d fmin(__m128d x, __m128d y) >> { >> __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q); >> >> return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)), >> _mm_andnot_pd(mask, y)); >> } > > Yes, this kind of thing (hacks with declaring functions with wrong > type to achieve an ABI result) is not something we really do in musl. > But it shouldn't be needed here. Remember that this is just a "Gedankenspiel". Stefan
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.