musl - Re: [PATCH] math: move i386 sqrt to C

Message-ID: <alpine.LNX.2.20.13.2003212323150.2534@monopod.intra.ispras.ru>
Date: Sat, 21 Mar 2020 23:30:12 +0300 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] math: move i386 sqrt to C

On Sat, 21 Mar 2020, Rich Felker wrote:

> On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote:
> > On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> > > ---
> > > Since union ldshape does not have a dedicated field for 32 least significant
> > > bits of the x87 long double mantissa, keeping the original approach with
> > > 
> > >     ux.i.m -= (fpsr & 0x200) - 0x100;
> > > 
> > > would lead to a 64-bit subtraction that is not trivial for the compiler to
> > > optimize to 32-bit subtraction as done in the original assembly. Therefore
> > > I have elected to change the approach and use
> > > 
> > >     ux.i.m ^= (fpsr & 0x200) + 0x200;
> > > 
> > > which is easier to optimize to a 32-bit rather than 64-bit xor.
> > > 
> > > Thoughts?
> > 
> > I'm getting test failures with sqrt and this seems to be the culprit
> > -- I don't think it's equivalent. The original version could offset
> > the value by +0x100 or -0x100 before rounding, and offsets in the
> > opposite direction of the rounding that already occurred. Your version
> > can only offset it by +0x200 or -0x400.
> > 
> > The (well, one) particular failing case is:
> > 
> > src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512
> > got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2
> > 
> > Here the mantissa is
> > 
> > fffffffffffffc00
> > 
> > and offset by -0x400 yields:
> > 
> > fffffffffffff800
> > 
> > which has exactly 53 bits and therefore does not round up like it
> > should.
> > 
> > I still like your approach better if there's a way to salvage it. Do
> > you see one?
> 
> And, I think I do. Changing it to:
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x300;
> 
> yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks
> like it should work theoretically, and indeed it passes libc-test.

Indeed, I was considering only the default (to-nearest) rounding mode
and did not notice the problem in the upward rounding mode.

I think your change solves this nicely.

Thanks
Alexander
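
The offsets discussed above can be checked with a small standalone sketch.
It is not the musl implementation: the helper names (adjust_patch,
adjust_fixed, low11, check_offsets) are hypothetical, and it assumes the
adjustment is applied only when the low 11 bits of the 64-bit mantissa are
0x400, as in the failing case quoted above. Bit 0x200 of the x87 status
word is the C1 flag.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 9 of the x87 status word (C1), the bit masked by (fpsr & 0x200). */
#define FPSR_C1 0x200u

/* XOR adjustment from the original patch: ^0x200 or ^0x400. */
static uint64_t adjust_patch(uint64_t m, unsigned fpsr)
{
	return m ^ ((fpsr & FPSR_C1) + 0x200u);
}

/* XOR adjustment with the 0x300 constant: ^0x300 or ^0x500. */
static uint64_t adjust_fixed(uint64_t m, unsigned fpsr)
{
	return m ^ ((fpsr & FPSR_C1) + 0x300u);
}

/* The 11 low bits that are dropped when rounding 64 bits down to 53. */
static unsigned low11(uint64_t m)
{
	return (unsigned)(m & 0x7ff);
}

/* Hypothetical self-check using the mantissa from the failing test. */
static void check_offsets(void)
{
	uint64_t m = 0xfffffffffffffc00u; /* low 11 bits are 0x400 */

	/* Original patch: offsets are +0x200 and -0x400. */
	assert(adjust_patch(m, 0) == m + 0x200);
	assert(adjust_patch(m, FPSR_C1) == m - 0x400);
	/* -0x400 clears all dropped bits, so the value looks exact. */
	assert(low11(adjust_patch(m, FPSR_C1)) == 0);

	/* With 0x300: offsets are +0x300 and -0x300. */
	assert(adjust_fixed(m, 0) == m + 0x300);
	assert(adjust_fixed(m, FPSR_C1) == m - 0x300);
	/* Dropped bits stay nonzero, so the conversion is still inexact. */
	assert(low11(adjust_fixed(m, FPSR_C1)) == 0x100);
}
```

With the original constant, the C1-set case leaves fffffffffffff800, which
fits in 53 bits exactly, so rounding upward has nothing to round. With
0x300 the same case leaves fffffffffffff900, whose dropped bits are
nonzero, so the conversion to double is still inexact and can round up.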
