Message-ID: <19361e3c-8d13-af73-7896-bc4665e9788f@esi.com.au>
Date: Tue, 25 Jun 2024 13:41:45 +1000 (AEST)
From: Damian McGuckin <damianm@....com.au>
To: MUSL <musl@...ts.openwall.com>
Subject: Re: roundf() (and round(), and ...)

On Sun, 23 Jun 2024, Rich Felker wrote:

> I think this is not a good tradeoff. It seems unlikely anyone would be
> using round in a performance-critical context with numbers that are in
> a range where an inline compare would be far better, and the costs are
> increased code size, decreased performance in useful cases, and
> exacerbating timing dependency on data (which is generally a negative
> thing).

I was actually playing devil's advocate. I do not like the tradeoff
either, but I was looking for some words of wisdom (as per yours above)
to justify it.

Anyway ...

I am not convinced of the benefits of changing the existing rounding
routines in MUSL to reflect any modification I might have made to the
internal algorithms. And before any changes, I think there are other
questions which need to be answered. So I might park it for now.

Here is a summary of the work for those who are interested. The
revisions reduce the number of lines of GCC-11 generated assembler code
by 33%. I am sure that others have thought of (and used) the same
incremental changes long ago but never published them. Only the C
'float' routines are in good shape. The project itself was actually done
in another language.

My modifications assume that a call like 'fabsf' is inlined to
assembler, which I hope we can assume is the rule these days.

This is the code size, in lines of GCC-11 assembler, for the C 'float'
routines:

    rintf.... 32/16 - metric was 5% faster
    truncf... 19/21 - metric was 10% faster
    roundf... 47/30 - no speed difference
    ceilf.... 38/29 - no speed difference
    floorf... 38/29 - no speed difference

This line count does not include the startproc and endproc lines, but
does include any line with just a label.
The first number is that of the MUSL 2.5 routine, and the second is that
of the revised routine. Overall, 174 lines down to 115 lines or a >33%
reduction.

The performance metric was the computation and comparison of

    GLIBC routine + MUSL routine
or
    GLIBC routine + Revised routine

The GLIBC routine is there for comparison. Where this showed the revised
algorithm was faster by 5% or more, I have noted it. Note that no
revised algorithm is slower. The CPU was a Xeon E5-2650v4.

A 'roundeven' routine exists and it is 44 lines of assembler long.
CLANG's line count is slightly longer.

I can recode the revised roundf() to be >5% faster than MUSL's roundf(),
but then I run the risk that I have a timing dependency on data, in
this case, the set of all the 32-bit floating point numbers.

- Damian