Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LNX.2.20.13.2003212323150.2534@monopod.intra.ispras.ru>
Date: Sat, 21 Mar 2020 23:30:12 +0300 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] math: move i386 sqrt to C

On Sat, 21 Mar 2020, Rich Felker wrote:

> On Sat, Mar 21, 2020 at 01:53:51PM -0400, Rich Felker wrote:
> > On Tue, Jan 07, 2020 at 04:06:05PM +0300, Alexander Monakov wrote:
> > > ---
> > > Since union ldshape does not have a dedicated field for 32 least significant
> > > bits of the x87 long double mantissa, keeping the original approach with
> > > 
> > >     ux.i.m -= (fpsr & 0x200) - 0x100;
> > > 
> > > would lead to a 64-bit subtraction that is not trivial for the compiler to
> > > optimize to 32-bit subtraction as done in the original assembly. Therefore
> > > I have elected to change the approach and use
> > > 
> > >     ux.i.m ^= (fpsr & 0x200) + 0x200;
> > > 
> > > which is easier to optimize to a 32-bit rather than 64-bit xor.
> > > 
> > > Thoughts?
> > 
> > I'm getting test failures with sqrt and this seems to be the culprit
> > -- I don't think it's equivalent. The original version could offset
> > the value by +0x100 or -0x100 before rounding, and offsets in the
> > opposite direction of the rounding that already occurred. Your version
> > can only offset it by +0x200 or -0x400.
> > 
> > The (well, one) particular failing case is:
> > 
> > src/math/ucb/sqrt.h:49: RU sqrt(0x1.fffffffffffffp+1023) want 0x1p+512
> > got 0x1.fffffffffffffp+511 ulperr -0.250 = -0x1p-1 + 0x1p-2
> > 
> > Here the mantissa is
> > 
> > fffffffffffffc00
> > 
> > and offset by -0x400 yields:
> > 
> > fffffffffffff800
> > 
> > which has exactly 53 bits and therefore does not round up like it
> > should.
> > 
> > I still like your approach better if there's a way to salvage it. Do
> > you see one?
> 
> And, I think I do. Changing it to:
> 
>     ux.i.m ^= (fpsr & 0x200) + 0x300;
> 
> yields an offset of +0x300 (^0x300) or -0x300 (^0x500). This looks
> like it should work theoretically, and indeed it passes libc-test.

Indeed, I was considering only the default (to-nearest) rounding mode
and did not notice the problem for upwards rounding mode.

I think your change solves this nicely.

Thanks
Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.