Message-ID: <CAH9TF6M7Xey=COOVGDvQovTyyqum_8k783RJjVUXqUKYsSxa=Q@mail.gmail.com>
Date: Thu, 29 Aug 2024 15:36:51 +0200
From: Alex Rønne Petersen <alex@...xrp.com>
To: Alexander Monakov <amonakov@...ras.ru>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)

On Wed, Aug 28, 2024 at 9:56 PM Alexander Monakov <amonakov@...ras.ru> wrote:
>
>
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
>
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
>
> FWIW I can't seem to reproduce this issue. For optionally-fused multiply-add
> LLVM IR uses @llvm.fmuladd.f64, which under -mfloat-abi=soft is expanded via
> __aeabi_dmul + __aeabi_dadd. I'm quite unsure how you got LLVM to generate a
> call to fma in your circumstances.

Ok, I had to do some digging to figure out what was going on here. The
TL;DR is that the issue is *mostly* specific to Zig due to the way we
model CPU features and pass them to Clang, and because of what's likely
an Arm backend bug. You *can* technically reproduce it with vanilla
Clang too, but you have to go far enough out of your way that I don't
think it happens in practice.

In `zig cc`, we pass the full set of all possible CPU features to Clang
via `-Xclang -target-feature -Xclang +/-<name>` - basically bypassing
the frontend driver. This means that when we target the default
`armv8-a` CPU, a bunch of floating point features are enabled which the
Clang driver normally explicitly disables when it sees
`-mfloat-abi=soft`.

When we get to the Arm backend,
`ARMTargetLowering::isFMAFasterThanFMulAndFAdd()` does *not* check the
`use-soft-float` function attribute when deciding whether lowering to a
real FMA instruction is worthwhile, so
`SelectionDAGBuilder::visitIntrinsicCall()` decides to emit an
`ISD::FMA` node.
Later, due to `use-soft-float` being set,
`DAGTypeLegalizer::SoftenFloatRes_FMA()` converts the `ISD::FMA` to a
libcall. As was done for PowerPC, Arm's `isFMAFasterThanFMulAndFAdd()`
should probably just be changed to check for soft float.

That aside, while the motivating issue doesn't (easily) reproduce with
vanilla Clang, it's nonetheless still the case that Clang folds
multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
While this might work out in some cases, we've still basically lost at
the LLVM IR level; we're at the mercy of the target backend as to
whether it gets lowered to an actual FMA instruction or split back to
the ~original FMUL + FADD. And this isn't even considering what other
nonsense the optimizer pipeline might get up to before that.
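
For illustration only (this is not code from the patch or from musl): a
minimal sketch of the kind of `a * b + c` expression at stake, and of
what "not contracted" means numerically. The names `mul_add_unfused`
and `check_mul_add_unfused` are hypothetical. The sketch assumes
contraction is really disabled, either by the standard
`#pragma STDC FP_CONTRACT OFF` (which Clang honors, while GCC to my
knowledge ignores it) or by building with `-ffp-contract=off`.

```c
#include <assert.h>

/* Standard C pragma asking the compiler not to contract a * b + c into
   a fused multiply-add. Assumption: the compiler honors it; otherwise
   build with -ffp-contract=off instead. */
#pragma STDC FP_CONTRACT OFF

/* Hypothetical helper: multiply, round, then add, round again. */
double mul_add_unfused(double a, double b, double c)
{
    return a * b + c;
}

/* Hypothetical self-check. With a = 1 + 2^-27 and b = 1 - 2^-27 the
   exact product is 1 - 2^-54, which rounds to 1.0 in double precision,
   so the unfused sum with c = -1.0 is 0.0. A fused operation would
   skip the intermediate rounding and yield -2^-54 instead, so the
   assertion only holds when contraction is actually off. */
void check_mul_add_unfused(void)
{
    double a = 1.0 + 0x1p-27;
    double b = 1.0 - 0x1p-27;
    assert(mul_add_unfused(a, b, -1.0) == 0.0);
}
```

A result of exactly 0.0 here indicates two separately rounded
operations; a tiny negative result indicates that the expression was
fused somewhere between the frontend and the backend.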