musl - Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(), fminl() simplified

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191211213039.GP1666@brightrain.aerifal.cx>
Date: Wed, 11 Dec 2019 16:30:39 -0500
From: Rich Felker <dalias@...c.org>
To: Stefan Kanthak <stefan.kanthak@...go.de>
Cc: Szabolcs Nagy <nsz@...t70.net>, musl@...ts.openwall.com
Subject: Re: [PATCH] fmax(), fmaxf(), fmaxl(), fmin(), fminf(),
 fminl() simplified

On Wed, Dec 11, 2019 at 10:17:09PM +0100, Stefan Kanthak wrote:
> "Szabolcs Nagy" <nsz@...t70.net> wrote:
> 
> 
> >* Stefan Kanthak <stefan.kanthak@...go.de> [2019-12-11 13:33:44 +0100]:
> >> "Szabolcs Nagy" <nsz@...t70.net> wrote:
> >> >* Stefan Kanthak <stefan.kanthak@...go.de> [2019-12-11 10:55:29 +0100]:
> >> > these two are not equivalent for snan input, but we dont care
> >> > about snan, nor the compiler by default, so the compiler can
> >> > optimize one to the other (although musl uses explicit int
> >> > arithmetics instead of __builtin_isnan so it's a bit harder).
> >> 
> >> The latter behaviour was my reason to use (x != x) here: I attempt
> >> to replace as many function calls as possible with "normal" code,
> >> and also try to avoid transfers to/from FPU/SSE registers to/from
> >> integer registers if that does not result in faster/shorter code.
> > 
> > why not just change the definition of isnan then?
> 
> Because I did not want to introduce such a global change; until now my
> patches are just local (peephole) optimisations.
> 
> > #if __GNUC__ > xxx
> > #define isnan(x) sizeof(x)==sizeof(float) ? __builtin_isnanf(x) : ...
> 
> This is better than my proposed change, as it also avoids the side-
> effect of (x != x) which can raise exceptions, and gets rid of the
> explicit transfer to integer registers, which can hurt performance.
> 
> The macros isinf(), isnormal(), isfinite(), signbit() should of
> course be implemented in a similar way too, and the (internal only?)
> functions __FLOAT_BITS() and __DOUBLE_BITS() removed completely!

Not removed because the public headers support non-GNUC (or older GCC?
I forget when these were introduced) compilers that may not provide
these. Having the portable definitions present as the fallback case is
still desirable.

> PS: the following is just a "Gedankenspiel", extending the idea to
>     avoid transfers from/to SSE registers.
>     On x86-64, functions like isunordered(), copysign() etc. may be
>     implemented using SSE intrinsics _mm_*() as follows:
> 
> #include <immintrin.h>
> 
> int signbit(double argument)
> {
>     return /* 1 & */ _mm_movemask_pd(_mm_set_sd(argument));
> }

This is just a missed optimization the compiler should be able to do
without intrinsics, on any arch where floating point types are kept in
vector registers that can also do integer/bitmask operations.

> uint32_t lrint(double argument)
> {
>     return _mm_cvtsd_si32(_mm_set_sd(argument));
> }

This is already done (on x86_64 where it's valid). It's in an asm
source file but should be converted to a C source file with __asm__
and proper constraint, not intrinsics, because __asm__ is a compiler
feature we require support for and intrinsics aren't (and also they
have some really weird semantics with respect to how they interface
with C aliasing rules).

> double copysign(double magnitude, double sign)
> {
>     return _mm_cvtsd_f64(_mm_or_pd(_mm_and_pd(_mm_set_sd(-0.0), _mm_set_sd(sign)),
>                                    _mm_andnot_pd(_mm_set_sd(-0.0), _mm_set_sd(magnitude))));
> }

I don't think we have one like this for x86_64, but ideally the C
would compile to something like it. (See above about missed
optimization.)

>     Although the arguments and results are all held in SSE registers,
>     there's no way to use them directly; it's but necessary to
>     transfer them using _mm_set_sd() and _mm_cvtsd_f64(), which may
>     result in superfluous instructions emitted by the compiler.

I don't see why you say that. They should be used in-place if possible
just by virtue of how the compiler's IR works. Certainly for the
__asm__ form they will be used in-place.

>     If you but cheat and "hide" these functions from the compiler
>     by placing them in a library, you can implement them as follows:
> 
> __m128d fmin(__m128d x, __m128d y)
> {
>     __m128d mask = _mm_cmp_sd(x, x, _CMP_ORD_Q);
> 
>     return _mm_or_pd(_mm_and_pd(mask, _mm_min_sd(y, x)),
>                      _mm_andnot_pd(mask, y));
> }

Yes, this kind of thing (hacks with declaring functions with wrong
type to achieve an ABI result) is not something we really do in musl.
But it shouldn't be needed here.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.