Message-ID: <E2423D1F1F3848848AEA933048174858@H270>
Date: Fri, 13 Aug 2021 14:04:51 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Szabolcs Nagy" <nsz@...t70.net>
Cc: <musl@...ts.openwall.com>
Subject: Re: [PATCH #2] Properly simplified nextafter()
Szabolcs Nagy <nsz@...t70.net> wrote on 2021-08-10 at 23:34:
>* Stefan Kanthak <stefan.kanthak@...go.de> [2021-08-10 08:23:46 +0200]:
>> <https://git.musl-libc.org/cgit/musl/plain/src/math/nextafter.c>
>> has quite some superfluous statements:
>>
>> 1. there's absolutely no need for 2 uint64_t holding |x| and |y|;
>> 2. IEEE-754 specifies -0.0 == +0.0, so (x == y) is equivalent to
>> (ax == 0) && (ay == 0): the latter 2 tests can be removed;
>
> you replaced 4 int cmps with 4 float cmps (among other things).
>
> it's target dependent if float compares are fast or not.
It's also target dependent whether the FP additions and multiplies
used to raise overflow/underflow are SLOOOWWW: how can you justify
them, especially for targets using soft-float?
| /* raise overflow if ux.f is infinite and x is finite */
| if (e == 0x7ff)
|         FORCE_EVAL(x+x);
| /* raise underflow if ux.f is subnormal or zero */
| if (e == 0)
|         FORCE_EVAL(x*x + ux.f*ux.f);
> (the i386 machine where i originally tested this preferred int
> cmp and float cmp was very slow in the subnormal range
This still holds for i386 FPU fadd/fmul as well as for SSE
addsd/addss/mulsd/mulss additions and multiplies!
Second version:
--- -/src/math/nextafter.c
+++ +/src/math/nextafter.c
@@ -10,13 +10,13 @@
                 return x + y;
         if (ux.i == uy.i)
                 return y;
-        ax = ux.i & -1ULL/2;
-        ay = uy.i & -1ULL/2;
+        ax = ux.i << 2;
+        ay = uy.i << 2;
         if (ax == 0) {
                 if (ay == 0)
                         return y;
                 ux.i = (uy.i & 1ULL<<63) | 1;
-        } else if (ax > ay || ((ux.i ^ uy.i) & 1ULL<<63))
+        } else if ((ax < ay) == ((int64_t) ux.i < 0))
                 ux.i--;
         else
                 ux.i++;
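For reference, the unpatched branch logic (the "-" side of the hunk above) can be written as a standalone function and checked against a few known neighbours. This is only a sketch under stated assumptions: nextafter_sketch and nextafter_sketch_selfcheck are hypothetical names, the NaN test is written as x != x instead of isnan(), and the overflow/underflow FORCE_EVAL steps are deliberately left out. The same kind of check can be pointed at any rewritten branch condition before comparing code size.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical standalone copy of the integer stepping logic only;
   it does not raise the overflow/underflow exceptions. */
double nextafter_sketch(double x, double y)
{
    union { double f; uint64_t i; } ux = { x }, uy = { y };
    uint64_t ax, ay;

    if (x != x || y != y)               /* either argument is NaN */
        return x + y;
    if (ux.i == uy.i)                   /* identical bit patterns */
        return y;
    ax = ux.i & -1ULL / 2;              /* |x| as an integer */
    ay = uy.i & -1ULL / 2;              /* |y| as an integer */
    if (ax == 0) {
        if (ay == 0)                    /* +0.0 vs -0.0: return y */
            return y;
        /* step from zero to the smallest subnormal with y's sign */
        ux.i = (uy.i & (1ULL << 63)) | 1;
    } else if (ax > ay || ((ux.i ^ uy.i) & (1ULL << 63)))
        ux.i--;                         /* move towards zero */
    else
        ux.i++;                         /* move away from zero */
    return ux.f;
}

/* Returns 1 if a handful of known neighbours come out as expected. */
int nextafter_sketch_selfcheck(void)
{
    double tiny = nextafter_sketch(0.0, 1.0);
    double negzero = nextafter_sketch(0.0, -0.0);

    return nextafter_sketch(1.0, 2.0) == 1.0 + DBL_EPSILON
        && nextafter_sketch(1.0, -2.0) == 1.0 - DBL_EPSILON / 2
        && nextafter_sketch(-1.0, -2.0) == -(1.0 + DBL_EPSILON)
        && tiny > 0.0 && tiny < DBL_MIN
        && negzero == 0.0 && 1.0 / negzero < 0.0;
}
```

The check covers both signs of x, a target on the other side of zero, and the two zero cases, since those are the inputs where a change to the comparison is most likely to alter the result.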
For AMD64, GCC generates the following ABSOLUTELY HORRIBLE CRAP
(the original code compiles even worse):
0000000000000000 <nextafter>:
0: 48 83 ec 38 sub $0x38,%rsp
4: 0f 29 74 24 20 movaps %xmm6,0x20(%rsp)
9: 49 b8 ff ff ff ff ff movabs $0x7fffffffffffffff,%r8
10: ff ff 7f
13: 49 b9 00 00 00 00 00 movabs $0x7ff0000000000000,%r9
1a: 00 f0 7f
1d: 66 49 0f 7e c2 movq %xmm0,%r10
22: 66 48 0f 7e c2 movq %xmm0,%rdx
27: 66 48 0f 7e c8 movq %xmm1,%rax
2c: 4d 21 c2 and %r8,%r10
2f: 66 48 0f 7e c1 movq %xmm0,%rcx
34: 4d 39 ca cmp %r9,%r10
37: 0f 87 83 00 00 00 ja bb <nextafter+0xbb>
3d: 49 21 c0 and %rax,%r8
40: 66 49 0f 7e ca movq %xmm1,%r10
45: 4d 39 c8 cmp %r9,%r8
48: 77 76 ja bb <nextafter+0xbb>
4a: 66 0f 28 f1 movapd %xmm1,%xmm6
4e: 48 39 d0 cmp %rdx,%rax
51: 74 7b je c9 <nextafter+0xc9>
53: 66 49 0f 7e c0 movq %xmm0,%r8
58: 48 8d 04 85 00 00 00 lea 0x0(,%rax,4),%rax
5f: 00
60: 49 c1 e0 02 shl $0x2,%r8
64: 74 7a je db <nextafter+0xd7>
66: 49 39 c0 cmp %rax,%r8
69: 66 49 0f 7e c0 movq %xmm0,%r8
6e: 48 8d 42 ff lea -0x1(%rdx),%rax
72: 41 0f 93 c1 setae %r9b
76: 49 c1 e8 3f shr $0x3f,%r8
7a: 48 83 c1 01 add $0x1,%rcx
7e: 45 38 c1 cmp %r8b,%r9b
81: 48 0f 44 c1 cmove %rcx,%rax
85: 48 89 c1 mov %rax,%rcx
88: 66 48 0f 6e f0 movq %rax,%xmm6
8d: 48 c1 e9 34 shr $0x34,%rcx
91: 81 e1 ff 07 00 00 and $0x7ff,%ecx
97: 81 f9 ff 07 00 00 cmp $0x7ff,%ecx
9d: 74 61 je ef <nextafter+0xef>
9f: 85 c9 test %ecx,%ecx
a1: 75 2b jne c9 <nextafter+0xc9>
a3: 66 48 0f 6e c2 movq %rdx,%xmm0
a8: 66 48 0f 6e c8 movq %rax,%xmm1
ad: f2 0f 59 ce mulsd %xmm6,%xmm1
b1: f2 0f 59 c0 mulsd %xmm0,%xmm0
b5: f2 0f 58 c1 addsd %xmm1,%xmm0
b9: eb 0e jmp c9 <nextafter+0xc9>
bb: 66 48 0f 6e f2 movq %rdx,%xmm6
c0: 66 48 0f 6e d0 movq %rax,%xmm2
c5: f2 0f 58 f2 addsd %xmm2,%xmm6
c9: 66 0f 28 c6 movapd %xmm6,%xmm0
cd: 0f 28 74 24 20 movaps 0x20(%rsp),%xmm6
d2: 48 83 c4 38 add $0x38,%rsp
d6: c3 retq
d7: 48 85 c0 test %rax,%rax
da: 74 e9 je c9 <nextafter+0xc9>
dc: 48 b8 00 00 00 00 00 movabs $0x8000000000000000,%rax
e3: 00 00 80
e6: 4c 21 d0 and %r10,%rax
ea: 48 83 c8 01 or $0x1,%rax
ed: eb 8d jmp 85 <nextafter+0x85>
ef: 66 48 0f 6e c2 movq %rdx,%xmm0
f4: f2 0f 58 c0 addsd %xmm0,%xmm0
f8: eb be jmp c9 <nextafter+0xc9>
How do you compare these 60 instructions/252 bytes to the code I posted
(23 instructions/72 bytes)?
Not amused by such HORRIBLE machine code!
Stefan
Download attachment "nextafter.patch" of type "application/octet-stream" (416 bytes)