Message-ID: <ZfSLN8KuN3AwhxV7@voyager>
Date: Fri, 15 Mar 2024 18:53:59 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: x86 fma with run-time switch?

Hi all,

in commit e9016138, Szabolcs wrote into the message that we really
should be using the single-instruction versions if possible, and we
should be switching at run time. I have an idea for how to do that
without losing all of the history of the generic fma.c:

- Rename src/math/fma.c to src/math/fma-soft.h. Rename the fma function
  inside to fma_soft and make it static (inline?).
- Create a new src/math/fma.c that includes fma-soft.h and just calls
  fma_soft().
- In src/math/x86_64/fma.c: Unconditionally define fma_fma() and
  fma_fma4() (which are the current assembler versions) and include
  fma-soft.h. Create a dispatcher to figure out which version to call,
  and call that from fma().

Yeah, I know, the header file with stuff in it that takes memory is not
exactly great, but I can't think of another way to define the generic
version such that it is accessible to the arch-specific versions under a
different name and linkage. The file must not be a .c file, or else it
will confuse the build system.

Question I have right out the gate is whether this would be interesting
to the group. Second question is whether it is better to be running
cpuid every time fma() is called, or to use a function pointer? I am
partial to the dispatcher pattern myself. In that case, the function
pointer is initialized at load time to point to the dispatcher, which
then selects the best implementation and updates the function pointer.
The main function only unconditionally calls the function pointer.

With a bit of preprocessor magic, I can also ensure that if __FMA__ or
__FMA4__ are set, the dispatcher is not included, and only the given
function is called. Although that may incur a warning of an unused
static function. I suppose that is a problem that can be fixed with more
preprocessor magic.

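A minimal sketch of the dispatcher pattern described above, not the actual musl code. The names sketch_fma, fma_impl, fma_dispatch, fma_hw and cpu_has_fma_stub are hypothetical, and both implementations are placeholders that are not correctly rounded; they exist only so the sketch compiles on its own. The dispatch_calls counter is there purely to show that the dispatcher runs once.

```c
typedef double fma_fn(double, double, double);

/* Placeholder for the generic fma_soft() from the proposed fma-soft.h.
 * NOT a correctly rounded fma; it only stands in for the real code. */
static double fma_soft(double x, double y, double z)
{
    return x * y + z;
}

/* Placeholder for the single-instruction version (fma_fma() or
 * fma_fma4() in the proposal). Also not the real thing. */
static double fma_hw(double x, double y, double z)
{
    return x * y + z;
}

/* Hypothetical probe. A real one would query CPUID; this stub always
 * reports "no hardware fma" so the sketch behaves the same everywhere. */
static int cpu_has_fma_stub(void)
{
    return 0;
}

static double fma_dispatch(double x, double y, double z);

/* Initialized at load time to point at the dispatcher. */
static fma_fn *fma_impl = fma_dispatch;

/* Illustration only: counts how often the dispatcher ran. */
static int dispatch_calls;

/* First call lands here: pick an implementation, update the pointer,
 * then forward the call. The plain store is an assumption of this
 * sketch; concurrent first calls would all store the same value, but a
 * real implementation would have to decide how to treat that race. */
static double fma_dispatch(double x, double y, double z)
{
    fma_fn *f = cpu_has_fma_stub() ? fma_hw : fma_soft;
    dispatch_calls++;
    fma_impl = f;
    return f(x, y, z);
}

/* The public entry point only calls through the pointer. */
double sketch_fma(double x, double y, double z)
{
    return fma_impl(x, y, z);
}
```

After the first call to sketch_fma(), every later call goes straight to the selected implementation through one indirect call, without touching CPUID again.
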
From my preliminary research, the fma3 and fma4 ISA extensions require
no kernel support, so this will be the first time a CPUID call is
needed. fma3 support is signalled with bit 12 of ECX in CPUID function
1. fma4 support is signalled with bit 16 of ECX in CPUID function
0x80000001 - on AMD CPUs. Intel has the bit reserved, so to be extra
safe, the CPU vendor ought to be checked, too.

Doing the same for i386 requires also verifying kernel SSE support in
hwcap (that implies CPUID support in the CPU, since the baseline 80486
does not necessarily have that, but all CPUs with SSE have it) and also
support for extended CPUID in case of fma4.

Since the CPUID challenges would be shared between fma and fmaf, I would
like to put them into a new header file in src/include (maybe create
src/include/x86_64? Or should it be added to arch/x86_64?)

So what are your thoughts on this?

Ciao,
Markus
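
A minimal sketch of the CPUID checks named in the message, assuming GCC or Clang and their <cpuid.h> helper __get_cpuid(). The function names reg_bit, read_xcr0, os_enabled_avx_state, cpu_is_amd, cpu_has_fma3 and cpu_has_fma4 are hypothetical. Beyond the bits listed above, the sketch also tests OSXSAVE (bit 27 of ECX in function 1) and XCR0, following Intel's documented FMA detection sequence, because the VEX-encoded fma instructions fault unless the OS has enabled AVX state. The i386 hwcap check mentioned above is not covered.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>            /* GCC/Clang: __get_cpuid() */
#define SKETCH_X86 1
#else
#define SKETCH_X86 0          /* other targets: report "not supported" */
#endif

/* Return bit n of reg as 0 or 1. */
static int reg_bit(unsigned reg, int n)
{
    return (reg >> n) & 1;
}

#if SKETCH_X86
/* Read the low half of XCR0. Only valid if OSXSAVE is set. */
static unsigned read_xcr0(void)
{
    unsigned lo, hi;
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return lo;
}

/* OSXSAVE set and XCR0 bits 1 (SSE) and 2 (AVX) both enabled. */
static int os_enabled_avx_state(unsigned ecx1)
{
    return reg_bit(ecx1, 27) && (read_xcr0() & 6) == 6;
}

/* Vendor string is EBX, EDX, ECX of CPUID function 0. */
static int cpu_is_amd(void)
{
    unsigned a, b, c, d;
    char name[12];
    if (!__get_cpuid(0, &a, &b, &c, &d))
        return 0;
    memcpy(name, &b, 4);
    memcpy(name + 4, &d, 4);
    memcpy(name + 8, &c, 4);
    return memcmp(name, "AuthenticAMD", 12) == 0;
}
#endif

/* fma3: bit 12 of ECX in CPUID function 1. */
static int cpu_has_fma3(void)
{
#if SKETCH_X86
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;
    return reg_bit(c, 12) && os_enabled_avx_state(c);
#else
    return 0;
#endif
}

/* fma4: bit 16 of ECX in CPUID function 0x80000001, AMD only.
 * __get_cpuid() returns 0 if the extended function is unavailable. */
static int cpu_has_fma4(void)
{
#if SKETCH_X86
    unsigned a, b, c, d, c1;
    if (!cpu_is_amd())
        return 0;
    if (!__get_cpuid(1, &a, &b, &c1, &d))
        return 0;
    if (!__get_cpuid(0x80000001u, &a, &b, &c, &d))
        return 0;
    return reg_bit(c, 16) && os_enabled_avx_state(c1);
#else
    return 0;
#endif
}
```

Both probes return 0 or 1, so a dispatcher for fma() and fmaf() could share them from a common header.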