musl - Re: [Patch] src/math/i386/remquo.s: remove conditional branch, shorter bit twiddling

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210807005558.GX13220@brightrain.aerifal.cx>
Date: Fri, 6 Aug 2021 20:55:58 -0400
From: Rich Felker <dalias@...c.org>
To: Stefan Kanthak <stefan.kanthak@...go.de>
Cc: Alexander Monakov <amonakov@...ras.ru>, Szabolcs Nagy <nsz@...t70.net>,
	musl@...ts.openwall.com
Subject: Re: [Patch] src/math/i386/remquo.s: remove conditional
 branch, shorter bit twiddling

On Fri, Aug 06, 2021 at 07:23:19PM +0200, Stefan Kanthak wrote:
> Rich Felker <dalias@...c.org> wrote:
> 
> If you don't want patches for assembly modules, please state so VERY
> CLEAR in your FAQ.

I'm sorry we don't have better material published on this. A lot of
this knowledge lives collectively in mailing list and people's
memories and I can see how that's not friendly to new contributors. 

My reason for not wanting to take small changes to code like this is
partly a matter of budgeting my own time as maintainer, but more
importantly it's a matter of respecting the time of folks who rely on
musl in environments where trust and correctness matter, where each
change introduces a burden to assure oneself that it hasn't introduced
bugs. That comes with changing to the C-with-inline-asm too, but at
least the new form is more accessible to read, and carries with it
other potential advantages like ease of reviewing that there are not
x87 stack balancing bugs like the ones fixed in
f3ed8bfe8a82af1870ddc8696ed4cc1d5aa6b441 or excess precision bugs like
the ones fixed in 1c9afd69051a64cf085c6fb3674a444ff9a43857,
ab9e20905dc1cb74d955cc94b5b76e6e1892b8fe,
a662220df547e5c2446518e74440a7d834f9ebe6, and possibly others.

In any case, I should really see to it that the FAQ/wiki get updated
with information relevant saving new contributors to frustration of
finding out about things like this late.

> > On Fri, Aug 06, 2021 at 12:17:12PM +0200, Stefan Kanthak wrote:
> >> Alexander Monakov <amonakov@...ras.ru> wrote:
> >> 
> >> > On Wed, 4 Aug 2021, Stefan Kanthak wrote:
> >> >> The change just follows by removing 6 LOC/instructions.-)
> >> > 
> >> > Have you considered collecting the three bits in one go via a multiplication?
> >> 
> >> No. My mind is not that twisted;-)
> >> 
> >> > You can first isolate the necessary bits with 'and $0x4300, %eax', then do
> >> > 'imul $0x910000, %eax, %eax' to put the required bits in EAX[31:29] in the
> >> > right order, then shift right by 29. Three instructions, 14 bytes.
> >> 
> >> Thanks, VERY NICE! How did you come up to it?
> >> 
> >> Revised patch with shorter bit twiddling attached.
> > 
> > The path forward for all the math asm is moving it to inline asm in C
> > files, with no flow control or bit/register shuffling in the asm, only
> > using asm for the single instructions. See how Alexander Monakov did
> > x86_64 remquol in commit 19f870c3a68a959c7c6ef1de12086ac908920e5e.
> 
> This commit is for i386 fmod/fmodf/fmodl. The bit twiddling used in
> <https://git.musl-libc.org/cgit/musl/plain/src/math/x86_64/remquol.c>
> (which I hadn't noticed yet) and the code GCC generates for it is but
> (almost) as bad as the original assembly code:
> 
> |        shrl    $8, %eax
> |        movl    %eax, %ecx
> |        movabsq $8463725162920157216, %rax
> |        rolb    $4, %cl
> |        andl    $60, %ecx
> |        sarq    %cl, %rax
> |        andl    $7, %eax
> 
> vs.
> 
> |        mov %ah,%dl
> |        shr %dl
> |        and $1,%dl
> |        mov %ah,%al
> |        shr $5,%al
> |        and $2,%al
> |        or %al,%dl
> |        mov %ah,%al
> |        shl $2,%al
> |        and $4,%al
> |        or %al,%dl
> 
> > I haven't read the mul trick here in detail but I believe it should be
> > duplicable with plain C * operator.
> 
> It is.

Great! Does it improve the code generation?

> > I really do not want to review/merge asm changes that keep this kind
> > of complex logic in asm when there's no strong motivation for it (like
> > fixing an actual bug, vs just reducing size or improving speed). The
> > risk to reward ratio is just not reasonable.
> 
> Final patch attached!

Thanks. I don't mind making any final tweaks needed, if necessary.
Normally some care is required when treating outputs of asm as "float"
or "double" type, but I think in this case they're okay, simply
because the fprem1 instruction is exact and always produces a result
which fits in the same or fewer number of mantissa bits as the input
(so that 'dropping excess precision' is a no-op). I'll check that
before merging and follow up on this thread if I see any other
possible problems while reviewing.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.