Message-ID: <5C60D05C95724A36B3DB9942D06CFE5F@H270>
Date: Wed, 11 Aug 2021 00:53:37 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Szabolcs Nagy" <nsz@...t70.net>
Cc: <musl@...ts.openwall.com>
Subject: Re: [PATCH] Properly simplified nextafter()

Szabolcs Nagy <nsz@...t70.net> wrote:

>* Stefan Kanthak <stefan.kanthak@...go.de> [2021-08-10 08:23:46 +0200]:
>> <https://git.musl-libc.org/cgit/musl/plain/src/math/nextafter.c>
>> has quite some superfluous statements:
>> 
>> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
>> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
>>    (ax == 0) && (ay == 0): the latter 2 tests can be removed;
> 
> you replaced 4 int cmps with 4 float cmps (among other things).

and hinted that the result of the second pair of comparisons is
already known from the first pair.
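
To illustrate point 2 of the quoted list, here is a minimal C sketch. The
helper names both_zero_bits() and zero_case_covered() are made up for this
example and are not part of musl; it also assumes that double is a 64-bit
IEEE-754 binary64. It shows that whenever both magnitudes are zero, the
floating-point comparison (x == y) is already true, so the two integer
tests add nothing once (x == y) has been checked.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the integer test (ax == 0) && (ay == 0) from the
   unpatched code, applied to the bit patterns with the sign bit masked. */
static int both_zero_bits(double x, double y)
{
    uint64_t ix, iy;
    memcpy(&ix, &x, sizeof ix);   /* assumes 64-bit IEEE-754 double */
    memcpy(&iy, &y, sizeof iy);
    return (ix & -1ULL/2) == 0 && (iy & -1ULL/2) == 0;
}

/* Hypothetical check: if both magnitudes are zero, x == y holds too,
   because IEEE-754 defines -0.0 == +0.0. */
static int zero_case_covered(double x, double y)
{
    return !both_zero_bits(x, y) || x == y;
}
```

Calling zero_case_covered(0.0, -0.0) or zero_case_covered(-0.0, 0.0)
exercises the one case where the bit patterns differ but the values
compare equal.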

> it's target dependent if float compares are fast or not.

It's also target dependent whether the floating-point registers
can be accessed by integer instructions, or need to be copied:
some win, some lose!
Just let the compiler/optimizer do its job!

> (the i386 machine where i originally tested this preferred int
> cmp and float cmp was very slow in the subnormal range and
> iirc it also raises the non-standard input denormal exception,
> which is fine i guess.

This exception is explicitly raised, or rather its (sticky) flag is
explicitly set, in the part following the patch.

> of course soft float abis much prefer int cmp so your code is
> likely much slower and bigger there).

0. Doesn't musl provide target-specific routines for targets with
   soft FP?

1. If not: the compiler knows the target ABI and SHOULD generate
   the proper integer comparisons there.
 
> but i'm not against the change, it is likely better on modern
> machines. did you try to benchmark it? or check the code size?

I STILL don't run a system supported by musl.
The code is of course smaller ... but not as small and fast as a
proper i386 or AMD64 assembly implementation ... which I can
post upon request.

regards
Stefan

>> 3. there's absolutely no need to compare the signs of x and y
>>    with the sign of the direction: it's sufficient to test that
>>    direction and sign of x match;
>> 4. a proper compiler/optimizer should be able to reuse the results
>>    of the comparison (x == y) for (x < y) or (x > y) and
>>    (x == 0.0) for (x < 0.0) or (x > 0.0).
>> 
>>    JFTR: if ((x < 0.0) == (x < y)) is equivalent to
>>          if ((x > 0.0) == (x > y))
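
The JFTR equivalence above relies on what the earlier branches have
already excluded: no NaNs, x != y and x != 0.0. A small C sketch of the
two forms follows; the function names shrink_lt() and shrink_gt() are
invented for this illustration. Both return nonzero exactly when the step
from x towards y reduces the magnitude of x, i.e. when ux.i has to be
decremented.

```c
/* Precondition, established by the earlier branches of nextafter():
   neither argument is NaN, x != y and x != 0.0. */

/* Form used in the patch. */
static int shrink_lt(double x, double y)
{
    return (x < 0.0) == (x < y);
}

/* Mirrored form: under the precondition, (x > 0.0) is the negation of
   (x < 0.0) and (x > y) is the negation of (x < y), so negating both
   sides of the equality leaves the result unchanged. */
static int shrink_gt(double x, double y)
{
    return (x > 0.0) == (x > y);
}
```

Without the precondition the two forms can differ: for x == 0.0 and
y > 0.0, the first yields 0 and the second yields 1.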
>> 
>> --- -/src/math/nextafter.c
>> +++ +/src/math/nextafter.c
>> @@ -3,20 +3,15 @@
>>  double nextafter(double x, double y)
>>  {
>>         union {double f; uint64_t i;} ux={x}, uy={y};
>> -       uint64_t ax, ay;
>>         int e;
>> 
>>         if (isnan(x) || isnan(y))
>>                 return x + y;
>> -       if (ux.i == uy.i)
>> +       if (x == y)
>>                 return y;
>> -       ax = ux.i & -1ULL/2;
>> -       ay = uy.i & -1ULL/2;
>> -       if (ax == 0) {
>> -               if (ay == 0)
>> -                       return y;
>> +       if (x == 0.0)
>>                 ux.i = (uy.i & 1ULL<<63) | 1;
>> -       } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
>> +       else if ((x < 0.0) == (x < y))
>>                 ux.i--;
>>         else
>>                 ux.i++;
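
For anyone who wants to check the patch without a musl system, the
following sketch puts the branch logic before and after the diff side by
side. The names nextafter_old(), nextafter_new() and same_bits() are
invented for this comparison. The exception-raising tail that follows the
patched hunk is left out here, so only the returned bit patterns are
compared. It assumes a 64-bit IEEE-754 double.

```c
#include <math.h>
#include <stdint.h>

/* Branch logic before the patch (tail of the function omitted). */
static double nextafter_old(double x, double y)
{
    union {double f; uint64_t i;} ux={x}, uy={y};
    uint64_t ax, ay;

    if (isnan(x) || isnan(y))
        return x + y;
    if (ux.i == uy.i)
        return y;
    ax = ux.i & -1ULL/2;            /* |x| as bit pattern */
    ay = uy.i & -1ULL/2;            /* |y| as bit pattern */
    if (ax == 0) {
        if (ay == 0)
            return y;
        ux.i = (uy.i & 1ULL<<63) | 1;
    } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
        ux.i--;
    else
        ux.i++;
    return ux.f;
}

/* Branch logic after the patch (tail of the function omitted). */
static double nextafter_new(double x, double y)
{
    union {double f; uint64_t i;} ux={x}, uy={y};

    if (isnan(x) || isnan(y))
        return x + y;
    if (x == y)                     /* also covers +0.0 vs. -0.0 */
        return y;
    if (x == 0.0)                   /* smallest subnormal, sign of y */
        ux.i = (uy.i & 1ULL<<63) | 1;
    else if ((x < 0.0) == (x < y))  /* magnitude of x shrinks */
        ux.i--;
    else
        ux.i++;
    return ux.f;
}

/* Compare bit patterns, so that -0.0 and +0.0 are told apart. */
static int same_bits(double a, double b)
{
    union {double f; uint64_t i;} ua={a}, ub={b};
    return ua.i == ub.i;
}
```

Feeding both functions all pairs from a set of edge cases (both zeros,
the smallest subnormals, the largest finite values, both infinities) and
comparing the results with same_bits() is a cheap sanity check. It is not
a benchmark and says nothing about code size on a given target.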
