Message-ID: <A626BC84BF3C4C3B88E7F696383397A7@H270>
Date: Wed, 11 Aug 2021 18:50:28 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Rich Felker" <dalias@...c.org>
Cc: "Szabolcs Nagy" <nsz@...t70.net>,
	<musl@...ts.openwall.com>
Subject: Re: [PATCH] Properly simplified nextafter()

Rich Felker <dalias@...c.org> wrote:

[...]

> static __inline unsigned __FLOAT_BITS(float __f)
> {
> union {float __f; unsigned __i;} __u;
> __u.__f = __f;
> return __u.__i;
> }
>
> #define isnan(x) ( \
> sizeof(x) == sizeof(float) ? (__FLOAT_BITS(x) & 0x7fffffff) > 0x7f800000 : \
> sizeof(x) == sizeof(double) ? (__DOUBLE_BITS(x) & -1ULL>>1) > 0x7ffULL<<52 : \
> __fpclassifyl(x) == FP_NAN)
>
> So, nope.

GCC typically uses its __builtin_isnan() for isnan(), which doesn't
use integer instructions or reloads:

$ cat isnan.c
int foo(double x) {
    return isnan(x);
}
int bar(double x) {
    return __builtin_isnan(x);
}
$ gcc -S -O3 -o- isnan.c
...
        xorl    %eax, %eax
        ucomisd %xmm0, %xmm0
        setp    %al
        ret
...
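
For comparison, here is a minimal sketch of the two approaches at the C level. The helper names isnan_bits() and isnan_fp() are made up for illustration and are neither musl's nor GCC's code: the first inspects the representation with integer operations, as the quoted macro does, and the second relies only on a floating-point self-comparison, which is presumably what the ucomisd/setp pair above corresponds to.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Integer-based test, in the spirit of the quoted macro: a double is a NaN
   when its bit pattern, with the sign bit cleared, is greater than the bit
   pattern of +infinity (0x7ff << 52). */
static int isnan_bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);   /* well-defined way to read the bits */
    return (u & (UINT64_MAX >> 1)) > ((uint64_t)0x7ff << 52);
}

/* Floating-point test: a NaN is the only value that compares unequal to
   itself, so no integer inspection of the representation is needed. */
static int isnan_fp(double x)
{
    return x != x;
}
```

Both helpers should agree for every input under IEEE 754 semantics. Options such as -ffast-math allow the compiler to assume that NaNs never occur, in which case the self-comparison may be folded away.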

> Unless it's doing some extremely high level rewriting of
> this inspection of the representation.

It performs the high-level substitution of isnan() with __builtin_isnan().

[...]

>> GCC generates here at least 12 instructions more, also longer ones,
>> including 2 movabs to load 0x8000000000000000 and 0x7FFFFFFFFFFFFFFF,
>> so the code is more than 50% fatter, mixes integer SSE and FP SSE
>> instructions which incur 2 cycles penalty on many Intel CPUs, with
>> WAY TOO MANY not so predictable (un)conditional branches.
>
> We don't use asm to optimize out 2 cycles.

This is just ONE of the many deficiencies of the code GCC generates.

> If the compiler is choosing a bad way to perform these loads the compiler
> should be fixed. But I don't think it matters in any measurable way in real usage.

On several families of Intel Core-i processors this penalty occurs
EVERY time an SSE register is accessed by an FP instruction AFTER an integer
instruction and vice versa!

BAD:
        pxor     xmm1, xmm1
        cmpsd    xmm0, xmm1

good:
        xorpd    xmm1, xmm1
        cmpsd    xmm0, xmm1

Stefan
