musl - Re: [PATCH] Properly simplified nextafter()

Message-ID: <A626BC84BF3C4C3B88E7F696383397A7@H270>
Date: Wed, 11 Aug 2021 18:50:28 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Rich Felker" <dalias@...c.org>
Cc: "Szabolcs Nagy" <nsz@...t70.net>,
	<musl@...ts.openwall.com>
Subject: Re: [PATCH] Properly simplified nextafter()

Rich Felker <dalias@...c.org> wrote:

[...]

> static __inline unsigned __FLOAT_BITS(float __f)
> {
> union {float __f; unsigned __i;} __u;
> __u.__f = __f;
> return __u.__i;
> }
>
> #define isnan(x) ( \
> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) & 0x7fffffff) > 0x7f800000 : \
> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) & -1ULL>>1) > 0x7ffULL<<52 : \
> __fpclassifyl(x) == FP_NAN)
>
> So, nope.

GCC typically uses its __builtin_isnan() for isnan(), which doesn't
use integer instructions or reloads:

$ cat isnan.c
int foo(double x) {
    return isnan(x);
}
int bar(double x) {
    return __builtin_isnan(x);
}
$ gcc -S -O3 -o- isnan.c
...
        xorl    %eax, %eax
        ucomisd %xmm0, %xmm0
        setp    %al
        ret
...

> Unless it's doing some extremely high level rewriting of
> this inspection of the representation.

It performs the high-level substitution of isnan() with __builtin_isnan().
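
For comparison, here is a minimal sketch of the three ways to test a double
for NaN discussed above: the bit-pattern inspection from the double branch
of the quoted macro, a plain self-comparison, and GCC's builtin. The names
isnan_bits(), isnan_cmp(), isnan_builtin() and all_agree() are hypothetical
and made up for this illustration; memcpy() stands in for musl's
__DOUBLE_BITS() helper, which is not shown in this thread. It assumes
IEEE 754 doubles, GCC or a compatible compiler, and no -ffast-math.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit-pattern test, mirroring the double branch of the quoted macro:
   clear the sign bit, then compare against the bit pattern of +Inf. */
static int isnan_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* stand-in for __DOUBLE_BITS() */
    return (u & (-1ULL >> 1)) > (0x7ffULL << 52);
}

/* Self-comparison: only a NaN compares unequal to itself. */
static int isnan_cmp(double x)
{
    return x != x;
}

/* GCC's builtin, as used in bar() above. */
static int isnan_builtin(double x)
{
    return __builtin_isnan(x) != 0;
}

/* Hypothetical helper: asserts that all three tests agree on x
   and returns their common result. */
static int all_agree(double x)
{
    int r = isnan_bits(x);
    assert(r == isnan_cmp(x));
    assert(r == isnan_builtin(x));
    return r;
}
```

All three should return the same result for every input; they differ only
in the instructions the compiler emits for them, which can be inspected
with gcc -S -O3 as in the transcript above.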

[...]

>> GCC generates here at least 12 instructions more, also longer ones,
>> including 2 movabs to load 0x8000000000000000 and 0x7FFFFFFFFFFFFFFF,
>> so the code is more than 50% fatter, mixes integer SSE and FP SSE
>> instructions which incur 2 cycles penalty on many Intel CPUs, with
>> WAY TOO MANY not so predictable (un)conditional branches.
>
> We don't use asm to optimize out 2 cycles.

This is just ONE of the many deficiencies of the code GCC generates.

> If the compiler is choosing a bad way to perform these loads the compiler
> should be fixed. But I don't think it matters in any measurable way in real usage.

On several families of Intel Core-i processors this 2-cycle penalty occurs
EVERY time an SSE register is accessed by an FP instruction AFTER an integer
instruction, and vice versa!

BAD:
        pxor     xmm1, xmm1
        cmpsd    xmm0, xmm1

good:
        xorpd    xmm1, xmm1
        cmpsd    xmm0, xmm1

Stefan
