musl - Re: [PATCH] configure: prevent compilers from turning a * b + c into fma(a, b, c)



Message-ID: <CAH9TF6M7Xey=COOVGDvQovTyyqum_8k783RJjVUXqUKYsSxa=Q@mail.gmail.com>
Date: Thu, 29 Aug 2024 15:36:51 +0200
From: Alex Rønne Petersen <alex@...xrp.com>
To: Alexander Monakov <amonakov@...ras.ru>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] configure: prevent compilers from turning a * b +
 c into fma(a, b, c)

On Wed, Aug 28, 2024 at 9:56 PM Alexander Monakov <amonakov@...ras.ru> wrote:
>
>
> On Wed, 28 Aug 2024, Alex Rønne Petersen wrote:
>
> > I've seen Clang do this for expressions in the fma() implementation itself,
> > which of course led to infinite recursion. This happened when targeting
> > arm-linux-musleabi with full soft float mode and -march=armv8-a. I imagine
>
> FWIW I can't seem to reproduce this issue. For optionally-fused multiply-add
> LLVM IR uses @llvm.fmuladd.f64, which under -mfloat-abi=soft is expanded via
> __aeabi_dmul + __aeabi_dadd. I'm quite unsure how you got LLVM to generate a
> call to fma in your circumstances.

Ok, I had to do some digging to figure out what was going on here. The
TL;DR is that the issue is *mostly* specific to Zig due to the way we
model CPU features and pass them to Clang, and because of what's
likely an Arm backend bug. You *can* technically reproduce it with
vanilla Clang too, but you have to go far enough out of your way that
I don't think it happens in practice.

In `zig cc`, we pass the full set of all possible CPU features to
Clang via `-Xclang -target-feature -Xclang +/-<name>` - basically
bypassing the frontend driver. This means that when we target the
default `armv8-a` CPU, a bunch of floating point features are enabled
which the Clang driver normally explicitly disables when it sees
`-mfloat-abi=soft`. When we get to the Arm backend,
`ARMTargetLowering::isFMAFasterThanFMulAndFAdd()` does *not* check the
`use-soft-float` function attribute when deciding whether lowering to
a real FMA instruction is worthwhile, so
`SelectionDAGBuilder::visitIntrinsicCall()` decides to emit an
`ISD::FMA` node. Later, because `use-soft-float` is set,
`DAGTypeLegalizer::SoftenFloatRes_FMA()` converts the `ISD::FMA` into a
libcall to `fma()` itself.

As was done for PowerPC, Arm's `isFMAFasterThanFMulAndFAdd()` should
probably just be changed to check for soft float.

That aside, while the motivating issue doesn't (easily) reproduce with
vanilla Clang, it's nonetheless still the case that Clang folds
multiple expressions in `fma()` into `llvm.fmuladd.*` intrinsic calls.
While this might work out in some cases, we've still basically lost at
the LLVM IR level: we're at the mercy of the target backend as to
whether it gets lowered to an actual FMA instruction or split back
into (roughly) the original FMUL + FADD. And that's not even
considering what other nonsense the optimizer pipeline might get up to
before that point.
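To make the "at the mercy of the backend" point concrete, here is a
minimal C sketch of why the choice matters numerically. The helper
names are hypothetical and are not musl code. With a = 1 + 0x1p-30,
b = 1 - 0x1p-30 and c = -1, the exact product 1 - 0x1p-60 rounds to
1.0, so evaluating with two roundings (FMUL then FADD) gives 0.0, while
a single rounding, as fma(a, b, c) does, gives -0x1p-60. The volatile
temporary is one source-level way to force the unfused evaluation. The
usual compiler-level knobs are `-ffp-contract=off` (GCC and Clang) and
C99's `#pragma STDC FP_CONTRACT OFF` (which GCC documents as not
implemented). The configure patch itself isn't quoted here, so which
flag it adds is not shown.

```c
/*
 * Evaluate a * b + c with two roundings, the way a separate FMUL
 * followed by FADD (or __aeabi_dmul + __aeabi_dadd) would. Storing the
 * product in a volatile object forces it to be rounded to double and
 * reloaded, so the compiler cannot contract the two operations into a
 * single fused multiply-add.
 */
static double mul_then_add(double a, double b, double c)
{
    volatile double p = a * b; /* first rounding */
    return p + c;              /* second rounding */
}

/*
 * The same expression written plainly. Whether this ends up fused
 * depends on the compiler, the target (including its FLT_EVAL_METHOD)
 * and flags such as -ffp-contract, so the source alone does not pin
 * down the result.
 */
static double mul_add_maybe_fused(double a, double b, double c)
{
    return a * b + c;
}
```

For the inputs above, `mul_then_add()` returns 0.0. `mul_add_maybe_fused()`
may return either 0.0 or -0x1p-60. For example, an x86-64 build with
FMA enabled and contraction allowed can differ from one without FMA.
This is the same ambiguity that `llvm.fmuladd` encodes.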
