|
Message-ID: <35E4CF2B08294E46B0451043F46D2C7C@H270>
Date: Fri, 6 Aug 2021 19:23:19 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Rich Felker" <dalias@...c.org>
Cc: "Alexander Monakov" <amonakov@...ras.ru>,
"Szabolcs Nagy" <nsz@...t70.net>,
<musl@...ts.openwall.com>
Subject: Re: [Patch] src/math/i386/remquo.s: remove conditional branch, shorter bit twiddling
Rich Felker <dalias@...c.org> wrote:
If you don't want patches for assembly modules, please state so VERY
CLEAR in your FAQ.
> On Fri, Aug 06, 2021 at 12:17:12PM +0200, Stefan Kanthak wrote:
>> Alexander Monakov <amonakov@...ras.ru> wrote:
>>
>> > On Wed, 4 Aug 2021, Stefan Kanthak wrote:
>> >> The change just follows by removing 6 LOC/instructions.-)
>> >
>> > Have you considered collecting the three bits in one go via a multiplication?
>>
>> No. My mind is not that twisted;-)
>>
>> > You can first isolate the necessary bits with 'and $0x4300, %eax', then do
>> > 'imul $0x910000, %eax, %eax' to put the required bits in EAX[31:29] in the
>> > right order, then shift right by 29. Three instructions, 14 bytes.
>>
>> Thanks, VERY NICE! How did you come up to it?
>>
>> Revised patch with shorter bit twiddling attached.
>
> The path forward for all the math asm is moving it to inline asm in C
> files, with no flow control or bit/register shuffling in the asm, only
> using asm for the single instructions. See how Alexander Monakov did
> x86_64 remquol in commit 19f870c3a68a959c7c6ef1de12086ac908920e5e.
This commit is for i386 fmod/fmodf/fmodl. The bit twiddling used in
<https://git.musl-libc.org/cgit/musl/plain/src/math/x86_64/remquol.c>
(which I hadn't noticed yet) and the code GCC generates for it is but
(almost) as bad as the original assembly code:
| shrl $8, %eax
| movl %eax, %ecx
| movabsq $8463725162920157216, %rax
| rolb $4, %cl
| andl $60, %ecx
| sarq %cl, %rax
| andl $7, %eax
vs.
| mov %ah,%dl
| shr %dl
| and $1,%dl
| mov %ah,%al
| shr $5,%al
| and $2,%al
| or %al,%dl
| mov %ah,%al
| shl $2,%al
| and $4,%al
| or %al,%dl
> I haven't read the mul trick here in detail but I believe it should be
> duplicable with plain C * operator.
It is.
> I really do not want to review/merge asm changes that keep this kind
> of complex logic in asm when there's no strong motivation for it (like
> fixing an actual bug, vs just reducing size or improving speed). The
> risk to reward ratio is just not reasonable.
Final patch attached!
Stefan
Download attachment "remquo.patch" of type "application/octet-stream" (3126 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.