|
|
Message-ID: <97b55a5d-9a47-481a-bdf0-1a93221bdb35@gmail.com>
Date: Sun, 14 Dec 2025 14:11:49 -0500
From: Demi Marie Obenour <demiobenour@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com, Bill Roberts <bill.roberts@....com>
Subject: Re: [RFC 00/14] aarch64: Convert to inline asm

On 12/14/25 10:18, Rich Felker wrote:
> On Sat, Dec 13, 2025 at 09:22:43PM -0500, Demi Marie Obenour wrote:
>> On 12/8/25 14:10, Rich Felker wrote:
>>> On Mon, Dec 08, 2025 at 11:44:43AM -0600, Bill Roberts wrote:
>>>> Based on previous discussions on enabling PAC and BTI for AArch64
>>>> targets, rather than annotating the existing assembly, use inline
>>>> assembly and a mix of C. This has the benefits of:
>>>> 1. Handling PAC, BTI and GCS:
>>>>    a. prologue and epilogue insertion as needed;
>>>>    b. adding GNU notes as needed.
>>>> 2. Adding in the CFI statements as needed.
>>>>
>>>> I'd love to get feedback, thanks!
>>>>
>>>> Bill Roberts (14):
>>>>   aarch64: drop crt(i|n).s since NO_LEGACY_INITFINI
>>>>   aarch64: rewrite fenv routines in C using inline asm
>>>>   aarch64: rewrite vfork routine in C using inline asm
>>>>   aarch64: rewrite clone routine in C using inline asm
>>>>   aarch64: rewrite __syscall_cp_asm in C using inline asm
>>>>   aarch64: rewrite __unmapself in C using inline asm
>>>>   aarch64: rewrite tlsdesc routines in C using inline asm
>>>>   aarch64: rewrite __restore_rt routines in C using inline asm
>>>>   aarch64: rewrite longjmp routines in C using inline asm
>>>>   aarch64: rewrite setjmp routines in C using inline asm
>>>>   aarch64: rewrite sigsetjmp routines in C using inline asm
>>>>   aarch64: rewrite dlsym routine in C using inline asm
>>>>   aarch64: rewrite memcpy routine in C using inline asm
>>>>   aarch64: rewrite memset routine in C using inline asm
>>>
>>> Of these, at least vfork, tlsdesc, __restore_rt, setjmp, sigsetjmp,
>>> and dlsym are fundamentally wrong in that they have to be asm entry
>>> points. Wrapping them in C breaks the state they need to receive.
>>>
>>> Some others like __syscall_cp_asm are wrong by virtue of putting
>>> symbol definitions inside inline asm, which may be emitted a
>>> different number of times than it appears in the source. The
>>> labels in __syscall_cp_asm must exist only once, so it really
>>> needs to be external asm (for a slightly different reason than
>>> the entry point needing to be asm).
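For illustration, here is a minimal sketch of the duplicate-definition
hazard described above; the function and label names are hypothetical,
not taken from the patch series:

    /* If the compiler emits this function body more than once (e.g.
     * by inlining it into two callers in addition to keeping an
     * out-of-line copy), the asm block is duplicated along with it,
     * and assembling fails with a "symbol already defined" error.
     * A label that must exist exactly once belongs in an external
     * .s file instead. */
    static int frob(int x)
    {
        __asm__ (".globl frob_entry\n"
                 "frob_entry:\n");
        return x;
    }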
>>> The advice to move to inline asm was to do it where possible,
>>> i.e. where it's gratuitous that we had an asm source file. But
>>> even where this can be done, it should be done by actually writing
>>> the inline asm with proper register constraints, not just
>>> copy-pasting the asm into C files wrapped in __asm__. Some things,
>>> like __clone, even if they could be done as C source files with
>>> asm, are not valid the way you've just wrapped them, because
>>> you're performing a return from within the asm but don't have
>>> access to the return address or any way to undo potential stack
>>> adjustments made in the prologue before the __asm__. And this
>>> would catastrophically break if LTO'd.
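As a sketch of what proper register constraints look like on aarch64
(this follows the pattern musl already uses in its syscall_arch.h;
the wrapper name is made up):

    static inline long my_syscall1(long n, long a)
    {
        register long x8 __asm__("x8") = n; /* syscall number */
        register long x0 __asm__("x0") = a; /* argument and return */
        __asm__ __volatile__ ("svc 0"
                              : "+r"(x0)
                              : "r"(x8)
                              : "memory", "cc");
        return x0;
    }

Since the compiler still owns the prologue and epilogue here, it can
insert PAC/BTI/GCS sequences, emit the GNU property notes, and
generate CFI itself, which is the point of the conversion.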
>>> memcpy and memset are slated for "removal" at some point,
>>> replacing the high-level flow logic in arch-specific asm with
>>> shared high-level C and arch-provided asm only for the
>>> middle-section bulk copy/fill operation in aligned and unaligned
>>> variants. I'm really not up for reviewing and trusting in the
>>> correctness of large changes to any of the existing arch-specific
>>> memcpy/memset asm, or adding new ones for other archs, until
>>> then, because it's effort on something that's intended to be
>>> removed. So these should just be kept as-is for now.
>>
>> There is code in the wild that relies on memcpy not actually
>> causing data races, even though the C standard says otherwise. The
>> problem is that the standard provides literally no option for
>> accessing memory that may be concurrently modified by untrusted
>> code, even though doing so in assembly is perfectly okay.
>>
>> To avoid data races, this code would need to be rewritten to use
>> assembly code for various architectures. I doubt this is a
>> feasible solution.
>>
>> The proper fix is for the standard to include bytewise-atomic
>> memcpy. Until then, people will use hacks like this. As a quality
>> of implementation matter, I strongly recommend that all accesses
>> by memcpy() to user buffers happen in assembly code.
>
> musl does not go out of its way to facilitate gross UB by
> applications. If anything (if it's detectable at low cost, or if
> it's so high-risk that some cost is acceptable) we trap and
> immediately crash when it's detected. We don't just make it
> silently "do what the programmer wanted".
>
> If a program is operating on memory that may change asynchronously
> out from under it, it needs to be using the appropriate volatile or
> atomic qualifications. memcpy very intentionally does not take a
> volatile void * because it's not valid to pass pointers to such
> memory to memcpy.
>
> Rich

People do this because they have no other decent choices. Doing
anything else in C winds up with a 40x slowdown because the compiler
cannot optimize anything. Using hand-written assembly is a
maintenance nightmare. volatile is for MMIO, and atomics are for
where synchronization is needed. Both severely over-constrain the
compiler.
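To make the tradeoff concrete, here is a minimal sketch (assuming C11
<stdatomic.h> and a hypothetical function name) of a bytewise-atomic
copy in portable C; every byte is a separate relaxed atomic load,
which is exactly what keeps the compiler from vectorizing it:

    #include <stdatomic.h>
    #include <stddef.h>

    /* Per-byte relaxed atomic loads: concurrent atomic writes yield
     * torn but well-defined data rather than UB. The cost is that
     * each load is an individual atomic operation, which the
     * compiler will not merge or vectorize. */
    static void *memcpy_relaxed(void *restrict dest,
                                const void *restrict src, size_t n)
    {
        unsigned char *d = dest;
        const _Atomic unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = atomic_load_explicit(&s[i], memory_order_relaxed);
        return dest;
    }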
Probably the best currently available solution is to come up with an
assembly code library that does the job. But that's going to be a lot
of work, and it won't fix the (likely *many*) programs with this bug
already in the wild, some of which are security-critical. Is the risk
of breaking existing code worth theoretical purity here?

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)