Date: Fri, 15 Mar 2024 18:53:59 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: x86 fma with run-time switch?

Hi all,

In commit e9016138, Szabolcs wrote in the commit message that we really
should be using the single-instruction versions where possible, and that
we should be switching at run time. I have an idea for how to do that
without losing all of the history of the generic fma.c:

- Rename src/math/fma.c to src/math/fma-soft.h. Rename the fma function
  inside to fma_soft and make it static (inline?).
- Create a new src/math/fma.c that includes fma-soft.h and just calls
  fma_soft().
- In src/math/x86_64/fma.c: Unconditionally define fma_fma() and
  fma_fma4() (which are the current assembler versions) and include
  fma-soft.h. Create a dispatcher to figure out which version to call,
  and call that from fma().
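For illustration, here is a minimal sketch of the first two steps, squeezed into one file so it compiles on its own. The `fma_soft` body is only a placeholder (a plain, unfused `x * y + z`) standing in for the existing generic code that would move into fma-soft.h, and `my_fma` is a hypothetical name used instead of `fma` so it does not clash with the C library.

```c
/* ---- would live in src/math/fma-soft.h ---- */
/* The generic implementation, renamed and given internal linkage, so each
 * file that includes the header gets its own private copy under a name
 * that does not collide with the public fma(). */
static inline double fma_soft(double x, double y, double z)
{
	/* PLACEHOLDER: the real header would hold the existing fma.c code.
	 * This expression is NOT fused; it rounds twice. */
	return x * y + z;
}

/* ---- would be the new src/math/fma.c ---- */
/* Generic targets simply forward to the soft version.
 * my_fma is a hypothetical stand-in for fma. */
double my_fma(double x, double y, double z)
{
	return fma_soft(x, y, z);
}
```

The arch-specific file would include the same header, which is what lets it reach the generic code under a different name and with internal linkage.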

Yeah, I know, a header file containing definitions that take up memory
is not exactly great, but I can't think of another way to define the
generic version such that it is accessible to the arch-specific versions
under a different name and linkage. The file must not be a .c file, or
else the build system will pick it up and compile it as a translation
unit of its own.

The first question I have right out of the gate is whether this would
be interesting to the group. The second is whether it is better to run
cpuid every time fma() is called, or to use a function pointer. I am
partial to the dispatcher pattern myself: the function pointer is
initialized at load time to point to the dispatcher, which selects the
best implementation and updates the function pointer. fma() itself then
just calls through the function pointer unconditionally.
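A minimal sketch of that dispatcher pattern for x86_64, assuming GCC/Clang inline asm. Assumptions: `my_fma`, `fma_impl`, `fma_dispatch` and `have_fma3` are hypothetical names; `fma_soft` is an unfused placeholder for the generic code; only the FMA3 path is shown, and its asm follows the shape of the current x86_64 asm version as I understand it. Besides the CPUID.1:ECX bit 12 check, it also tests OSXSAVE (ECX bit 27) and XCR0 bits 1-2, which Intel documents as prerequisites for VEX-encoded instructions such as the FMA3 ones.

```c
#include <stdint.h>

typedef double fma_fn(double, double, double);

/* PLACEHOLDER for the generic code from fma-soft.h: NOT fused. */
static double fma_soft(double x, double y, double z)
{
	return x * y + z;
}

#if defined(__x86_64__)
/* FMA3: x = x*y + z in a single rounding. */
static double fma_fma(double x, double y, double z)
{
	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
	return x;
}

static void cpuid(uint32_t leaf, uint32_t r[4])
{
	__asm__ ("cpuid"
	         : "=a" (r[0]), "=b" (r[1]), "=c" (r[2]), "=d" (r[3])
	         : "a" (leaf), "c" (0));
}

/* FMA3 bit plus OS support for the SSE/AVX register state. */
static int have_fma3(void)
{
	uint32_t r[4], lo, hi;
	const uint32_t want = (1u << 12) | (1u << 27); /* FMA | OSXSAVE */

	cpuid(0, r);
	if (r[0] < 1)
		return 0;
	cpuid(1, r);
	if ((r[2] & want) != want)
		return 0;
	__asm__ ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
	return (lo & 6) == 6; /* XCR0: SSE and AVX state enabled */
}
#endif

static double fma_dispatch(double, double, double);

/* Starts out pointing at the dispatcher; replaced on the first call. */
static fma_fn *fma_impl = fma_dispatch;

static double fma_dispatch(double x, double y, double z)
{
	fma_fn *f = fma_soft;
#if defined(__x86_64__)
	if (have_fma3())
		f = fma_fma;
#endif
	fma_impl = f; /* later calls skip the CPUID work entirely */
	return f(x, y, z);
}

/* Hypothetical stand-in for fma(): always calls through the pointer. */
double my_fma(double x, double y, double z)
{
	return fma_impl(x, y, z);
}
```

The trade-off versus running CPUID on every call: CPUID is a serializing instruction and, inside a VM, typically causes an exit to the hypervisor, whereas the pointer costs one indirect call. Note that the store to `fma_impl` here is a plain write; concurrent first calls all store the same value, but a real libc version would probably want an atomic store to stay within the C memory model.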

With a bit of preprocessor magic, I can also ensure that if __FMA__ or
__FMA4__ is defined, the dispatcher is not included and only the
corresponding function is called. That may trigger a warning about an
unused static function, though. I suppose that problem can be fixed with
more preprocessor magic.
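One way the compile-time shortcut could look, as a sketch only: each implementation is defined solely in the branch that uses it, so nothing is left over to trigger an unused-static warning. `fma_impl` and `my_fma` are hypothetical names, the asm mirrors the existing FMA3/FMA4 versions as I understand them, and the `#else` body is an unfused placeholder marking where the run-time dispatcher and the generic code would go.

```c
#if defined(__FMA__)
/* Target guarantees FMA3: no dispatcher, no generic code compiled in. */
static double fma_impl(double x, double y, double z)
{
	__asm__ ("vfmadd132sd %1, %2, %0" : "+x" (x) : "x" (y), "x" (z));
	return x;
}
#elif defined(__FMA4__)
/* Target guarantees FMA4. */
static double fma_impl(double x, double y, double z)
{
	__asm__ ("vfmaddsd %3, %2, %1, %0" : "=x" (x) : "x" (x), "x" (y), "x" (z));
	return x;
}
#else
/* No compile-time guarantee: the CPUID dispatcher and the generic code
 * would go here. PLACEHOLDER body, NOT fused. */
static double fma_impl(double x, double y, double z)
{
	return x * y + z;
}
#endif

/* Hypothetical stand-in for fma(). */
double my_fma(double x, double y, double z)
{
	return fma_impl(x, y, z);
}
```

An alternative to nesting everything in preprocessor branches would be marking the helpers with `__attribute__((unused))`, at the cost of compiling code that is never called.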

From my preliminary research, the fma3 and fma4 ISA extensions require
no kernel support, so this will be the first time a CPUID call is
needed. fma3 support is signalled with bit 12 of ECX in CPUID function
1. fma4 support is signalled with bit 16 of ECX in CPUID function
0x80000001, on AMD CPUs. Intel has the bit reserved, so to be extra
safe, the CPU vendor ought to be checked, too.
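The checks in the previous paragraph could be sketched as below for x86_64, assuming GCC/Clang inline asm. `detect_fma` and `enum fma_kind` are hypothetical names. The vendor string comes from leaf 0 (in EBX, EDX, ECX order), the extended leaf 0x80000001 is only read after leaf 0x80000000 reports it, and FMA4 is only trusted on "AuthenticAMD". As an added precaution beyond the text, it also checks OSXSAVE and XCR0, since both FMA3 and FMA4 instructions are VEX-encoded and Intel documents that check for VEX instructions.

```c
#include <stdint.h>
#include <string.h>

enum fma_kind { FMA_NONE, FMA_FMA3, FMA_FMA4 };

#if defined(__x86_64__)
static void cpuid(uint32_t leaf, uint32_t r[4])
{
	__asm__ ("cpuid"
	         : "=a" (r[0]), "=b" (r[1]), "=c" (r[2]), "=d" (r[3])
	         : "a" (leaf), "c" (0));
}

/* OSXSAVE set and XCR0 bits 1-2 (SSE and AVX state) enabled by the OS. */
static int os_saves_avx_state(uint32_t ecx1)
{
	uint32_t lo, hi;
	if (!(ecx1 & (1u << 27)))
		return 0;
	__asm__ ("xgetbv" : "=a" (lo), "=d" (hi) : "c" (0));
	return (lo & 6) == 6;
}

enum fma_kind detect_fma(void)
{
	uint32_t r[4], ecx1;
	char vendor[13];

	cpuid(0, r);
	if (r[0] < 1)
		return FMA_NONE;
	memcpy(vendor, &r[1], 4);     /* EBX */
	memcpy(vendor + 4, &r[3], 4); /* EDX */
	memcpy(vendor + 8, &r[2], 4); /* ECX */
	vendor[12] = 0;

	cpuid(1, r);
	ecx1 = r[2];
	if (!os_saves_avx_state(ecx1))
		return FMA_NONE;
	if (ecx1 & (1u << 12))         /* FMA3: leaf 1, ECX bit 12 */
		return FMA_FMA3;

	if (strcmp(vendor, "AuthenticAMD") != 0)
		return FMA_NONE;       /* bit is reserved on Intel */
	cpuid(0x80000000, r);
	if (r[0] < 0x80000001)
		return FMA_NONE;
	cpuid(0x80000001, r);
	if (r[2] & (1u << 16))         /* FMA4: leaf 0x80000001, ECX bit 16 */
		return FMA_FMA4;
	return FMA_NONE;
}
#else
enum fma_kind detect_fma(void) { return FMA_NONE; }
#endif
```

FMA3 is preferred when both are present, since it is the extension that newer AMD parts kept.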

Doing the same for i386 additionally requires verifying kernel SSE
support in hwcap (which implies CPUID support in the CPU: the baseline
80486 does not necessarily have CPUID, but every CPU with SSE does), and
checking for extended CPUID support in the case of fma4.
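The extra i386 preconditions could look roughly like this, assuming Linux, where on x86 `AT_HWCAP` mirrors EDX of CPUID leaf 1 (bit 25 being SSE), and using `getauxval` from `<sys/auxv.h>`; inside musl the code would presumably read the hwcap value libc already stores instead. `can_query_fma4_leaf` is a hypothetical name. The range check on leaf 0x80000000 guards against CPUs that return unrelated data for unsupported extended leaves.

```c
#include <stdint.h>

#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#include <sys/auxv.h>

static void cpuid(uint32_t leaf, uint32_t r[4])
{
	__asm__ ("cpuid"
	         : "=a" (r[0]), "=b" (r[1]), "=c" (r[2]), "=d" (r[3])
	         : "a" (leaf), "c" (0));
}

/* Returns 1 if leaf 0x80000001 (home of the FMA4 bit) may be queried. */
int can_query_fma4_leaf(void)
{
	uint32_t r[4];

	/* SSE in hwcap implies the CPU has CPUID at all (not a bare 80486). */
	if (!(getauxval(AT_HWCAP) & (1ul << 25)))
		return 0;
	/* Extended CPUID: max extended leaf must be in range and cover it. */
	cpuid(0x80000000, r);
	return r[0] >= 0x80000001 && r[0] <= 0x8000ffff;
}
#else
int can_query_fma4_leaf(void) { return 0; }
#endif
```

Only after this returns 1 would the i386 code go on to the vendor and FMA4 bit checks.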

Since the CPUID checks would be shared between fma and fmaf, I would
like to put them into a new header file in src/include (maybe create
src/include/x86_64? Or should it go into arch/x86_64?)

So what are your thoughts on this?

Ciao,
Markus
