musl - Re: aarch64 SME support issues

Message-ID: <20250713021240.GE1827@brightrain.aerifal.cx>
Date: Sat, 12 Jul 2025 22:12:41 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: aarch64 SME support issues

On Thu, Jul 10, 2025 at 12:47:32AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@...c.org> [2025-07-09 15:02:35 -0400]:
> > On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > > On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > > > > > Do you have a recommendation/preference between masking it off or
> > > > > dropping the __getauxval exposure for now?
> > > > > 
> > > > > I think I'd rather mask it off, since in the (unusual but plausible)
> > > > > > case where a static-only toolchain is built, I think the libgcc
> > > > > configure test will see the hidden __getauxval and be able to use it
> > > > > already.
> > > > > 
> > > > > And if we do masking, I think it makes sense to mask off all unknown
> > > > > bits so this doesn't happen again in the future with the next new
> > > > > thing, but I'm not sure. Does this sound reasonable? Are there any
> > > > > cases where *hiding* a hwcap bit could result in malfunction?
> > > > 
> > > > > ok i hadn't considered the __getauxval change, i think that
> > > > is useful to go in: it will take time to safely update libgcc
> > > > so better to add it sooner and potentially more widely useful
> > > > than just for SME.
> > > > 
> > > > i think hiding a hwcap bit may lead to inconsistencies due
> > > > to kernel behaving differently than what libc pretends,
> > > > but i don't have a strong case, it likely can only affect
> > > > hacky code. so likely no abi break for normal code.
> > > 
> > > Yes that's what I'd expect.
> > > 
> > > > e.g. kernel enables BTI on vdso (or static exe) and user code
> > > > trying to indirect jump into the middle of a function after
> > > > checking via the libc hwcap that bti is off.
> > > > 
> > > > or creating MTE tagged objects via mprotect + instructions
> > > > based on cpuid and then passing them to a function that is
> > > > only MTE safe when HWCAP_MTE is set.
> > > 
> > > Note that we don't need to mask off any caps we already know the
> > > semantics for, only SME and possibly as-yet-unassigned ones we don't
> > > know will be safe without libc support.
> 
> these were meant to be examples of how masking
> a future unknown hwcap bit may go wrong based
> on existing hwcaps where libc hwcap vs kernel/isa
> difference may be visible.
> 
> > > 
> > > > or different part of atomics code trying to detect 128bit
> > > > lse atomics support differently (hwcap vs cpuid).
> > > > 
> > > > note that HWCAP2 is all used up, and now the top 32 bits
> > > > of HWCAP are getting allocated (used to be reserved when
> > > > we thought ilp32 was a thing, now only the top 2 bits are
> > > > kept for libc to use), musl does not have AT_HWCAP3 but
> > > > user code may query that anyway as AT_* values are abi.
> > > > not sure if you plan to deal with AT_HWCAP3 too.
> > > > 
> > > > i think masking HWCAP_SME* and top bits of AT_HWCAP
> > > > above 1<<41 should be fine for now. presumably this
> > > > can be undone if sme support is added.
> > > 
> > > Sounds good. Should we add and mask hwcap3 too?
> > 
> > Hmm, it looks like there are hwcap2 sme bits:
> > 
> > #define HWCAP2_SME		(1 << 23)
> > #define HWCAP2_SME_I16I64	(1 << 24)
> > #define HWCAP2_SME_F64F64	(1 << 25)
> > #define HWCAP2_SME_I8I32	(1 << 26)
> > #define HWCAP2_SME_F16F32	(1 << 27)
> > #define HWCAP2_SME_B16F32	(1 << 28)
> > #define HWCAP2_SME_F32F32	(1 << 29)
> > #define HWCAP2_SME_FA64		(1 << 30)
> > ...
> > #define HWCAP2_SME2		(1UL << 37)
> > #define HWCAP2_SME2P1		(1UL << 38)
> > #define HWCAP2_SME_I16I32	(1UL << 39)
> > #define HWCAP2_SME_BI32I32	(1UL << 40)
> > #define HWCAP2_SME_B16B16	(1UL << 41)
> > #define HWCAP2_SME_F16F16	(1UL << 42)
> > ...
> > #define HWCAP2_SME_LUTV2	(1UL << 57)
> > #define HWCAP2_SME_F8F16	(1UL << 58)
> > #define HWCAP2_SME_F8F32	(1UL << 59)
> > #define HWCAP2_SME_SF8FMA	(1UL << 60)
> > #define HWCAP2_SME_SF8DP4	(1UL << 61)
> > #define HWCAP2_SME_SF8DP2	(1UL << 62)
> > 
> > Not clear if any others are SME-related.
> > 
> > In plain hwcap I see:
> > 
> > #define HWCAP_SME2P2		(1UL << 42)
> > #define HWCAP_SME_SBITPERM	(1UL << 43)
> > #define HWCAP_SME_AES		(1UL << 44)
> > #define HWCAP_SME_SFEXPA	(1UL << 45)
> > #define HWCAP_SME_STMOP		(1UL << 46)
> > #define HWCAP_SME_SMOP4		(1UL << 47)
> > 
> > And no hwcap3 bits defined yet.
> > 
> > Should the above all be masked? Any I missed?
> 
> yeah i'd mask them all even if in principle
> HWCAP2_SME should be enough. i don't think
> any of the non-SME hwcaps imply HWCAP2_SME.
> 
> if we mask future bits then i think HWCAP3 should
> be masked too. there are no bits defined yet, so
> no existing kernel would pass it in auxv yet, but
> once it is passed musl should return 0 for it.
> 
> i just fear that if ppl figure out that musl is
> masking bits they will try to work it around by
> using whacky cpu feature detection. so ideally
> we don't keep masking forever (i can look into
> adding sme support, but not right now).

Proposed code attached.

Rich

View attachment "__set_thread_area.c" of type "text/plain" (716 bytes)
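
The attachment is not reproduced here. As a rough illustration of the masking discussed in the thread (not the attached patch), the sketch below filters an auxv value by type: it hides the SME bits listed above in AT_HWCAP2, hides everything above bit 41 in AT_HWCAP, and reports AT_HWCAP3 as 0. The names filter_hwcap, BITS, HWCAP_HIDE and HWCAP2_SME_HIDE are hypothetical, and hiding AT_HWCAP bits 62-63 (which the thread says are kept for libc use) is a simplifying assumption.

```c
#include <stdint.h>

/* auxv type numbers from the Linux ABI */
#define AT_HWCAP   16
#define AT_HWCAP2  26
#define AT_HWCAP3  29

/* Hypothetical helper: all bits from lo to hi inclusive (0 <= lo <= hi <= 63). */
#define BITS(lo, hi) ((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

/* Assumption: hide every AT_HWCAP bit above 1<<41, which covers
 * HWCAP_SME2P2 (bit 42) through HWCAP_SME_SMOP4 (bit 47) and all
 * not-yet-assigned bits above them. */
#define HWCAP_HIDE       BITS(42, 63)

/* The HWCAP2 SME ranges quoted in the thread:
 * bits 23-30, 37-42 and 57-62. */
#define HWCAP2_SME_HIDE  (BITS(23, 30) | BITS(37, 42) | BITS(57, 62))

/* Hypothetical filter applied to an auxv value before it is exposed. */
static uint64_t filter_hwcap(unsigned long type, uint64_t val)
{
	switch (type) {
	case AT_HWCAP:  return val & ~HWCAP_HIDE;
	case AT_HWCAP2: return val & ~HWCAP2_SME_HIDE;
	case AT_HWCAP3: return 0;   /* no bits are known to be safe yet */
	default:        return val; /* other auxv entries pass through */
	}
}
```

With a filter like this, user code that tests getauxval(AT_HWCAP2) against HWCAP2_SME would see SME as absent, while non-SME bits such as those below bit 23 are reported unchanged. Undoing the masking later would only require shrinking the two masks.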