|
|
Message-ID: <97b55a5d-9a47-481a-bdf0-1a93221bdb35@gmail.com>
Date: Sun, 14 Dec 2025 14:11:49 -0500
From: Demi Marie Obenour <demiobenour@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com, Bill Roberts <bill.roberts@....com>
Subject: Re: [RFC 00/14] aarch64: Convert to inline asm

On 12/14/25 10:18, Rich Felker wrote:
> On Sat, Dec 13, 2025 at 09:22:43PM -0500, Demi Marie Obenour wrote:
>> On 12/8/25 14:10, Rich Felker wrote:
>>> On Mon, Dec 08, 2025 at 11:44:43AM -0600, Bill Roberts wrote:
>>>> Based on previous discussions on enabling PAC and BTI for AArch64
>>>> targets, rather than annotating the existing assembly, use inline
>>>> assembly and a mix of C. This has the benefits of:
>>>> 1. Handling PAC, BTI and GCS:
>>>>    a. prologue and epilogue insertion as needed;
>>>>    b. adding GNU notes as needed.
>>>> 2. Adding in the CFI statements as needed.
>>>>
>>>> I'd love to get feedback, thanks!
>>>>
>>>> Bill Roberts (14):
>>>>   aarch64: drop crt(i|n).s since NO_LEGACY_INITFINI
>>>>   aarch64: rewrite fenv routines in C using inline asm
>>>>   aarch64: rewrite vfork routine in C using inline asm
>>>>   aarch64: rewrite clone routine in C using inline asm
>>>>   aarch64: rewrite __syscall_cp_asm in C using inline asm
>>>>   aarch64: rewrite __unmapself in C using inline asm
>>>>   aarch64: rewrite tlsdesc routines in C using inline asm
>>>>   aarch64: rewrite __restore_rt routines in C using inline asm
>>>>   aarch64: rewrite longjmp routines in C using inline asm
>>>>   aarch64: rewrite setjmp routines in C using inline asm
>>>>   aarch64: rewrite sigsetjmp routines in C using inline asm
>>>>   aarch64: rewrite dlsym routine in C using inline asm
>>>>   aarch64: rewrite memcpy routine in C using inline asm
>>>>   aarch64: rewrite memset routine in C using inline asm
>>>
>>> Of these, at least vfork, tlsdesc, __restore_rt, setjmp, sigsetjmp,
>>> and dlsym are fundamentally wrong in that they have to be asm entry
>>> points. Wrapping them in C breaks the state they need to receive.
>>>
>>> Some others like __syscall_cp_asm are wrong by virtue of putting
>>> symbol definitions inside inline asm, which may be emitted a
>>> different number of times than it appears in the source. The
>>> labels in __syscall_cp_asm must exist only once, so it really
>>> needs to be external asm (for a slightly different reason than
>>> the entry point needing to be asm).
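For illustration, here is a minimal sketch of the duplicate-definition
hazard described above; the function and label names are hypothetical,
not taken from the patch series:

    /* If the compiler emits this function body more than once (e.g.
     * by inlining it into two callers in addition to keeping an
     * out-of-line copy), the asm block is duplicated along with it,
     * and assembling fails with a "symbol already defined" error.
     * A label that must exist exactly once belongs in an external
     * .s file instead. */
    static int frob(int x)
    {
        __asm__ (".globl frob_entry\n"
                 "frob_entry:\n");
        return x;
    }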
>>> The advice to move to inline asm was to do it where possible,
>>> i.e. where it's gratuitous that we had an asm source file. But
>>> even where this can be done, it should be done by actually writing
>>> the inline asm with proper register constraints, not just
>>> copy-pasting the asm into C files wrapped in __asm__. Some things,
>>> like __clone, even if they could be done as C source files with
>>> asm, are not valid the way you've just wrapped them, because
>>> you're performing a return from within the asm but don't have
>>> access to the return address or any way to undo potential stack
>>> adjustments made in the prologue before the __asm__. And this
>>> would catastrophically break if LTO'd.
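As a sketch of what proper register constraints look like on aarch64
(this follows the pattern musl already uses in its syscall_arch.h;
the wrapper name is made up):

    static inline long my_syscall1(long n, long a)
    {
        register long x8 __asm__("x8") = n; /* syscall number */
        register long x0 __asm__("x0") = a; /* argument and return */
        __asm__ __volatile__ ("svc 0"
                              : "+r"(x0)
                              : "r"(x8)
                              : "memory", "cc");
        return x0;
    }

Since the compiler still owns the prologue and epilogue here, it can
insert PAC/BTI/GCS sequences, emit the GNU property notes, and
generate CFI itself, which is the point of the conversion.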
>>> memcpy and memset are slated for "removal" at some point,
>>> replacing the high-level flow logic in arch-specific asm with
>>> shared high-level C and arch-provided asm only for the
>>> middle-section bulk copy/fill operation in aligned and unaligned
>>> variants. I'm really not up for reviewing and trusting in the
>>> correctness of large changes to any of the existing arch-specific
>>> memcpy/memset asm, or adding new ones for other archs, until
>>> then, because it's effort on something that's intended to be
>>> removed. So these should just be kept as-is for now.
>>
>> There is code in the wild that relies on memcpy not actually
>> causing data races, even though the C standard says otherwise. The
>> problem is that the standard provides literally no option for
>> accessing memory that may be concurrently modified by untrusted
>> code, even though doing so in assembly is perfectly okay.
>>
>> To avoid data races, this code would need to be rewritten to use
>> assembly code for various architectures. I doubt this is a
>> feasible solution.
>>
>> The proper fix is for the standard to include bytewise-atomic
>> memcpy. Until then, people will use hacks like this. As a quality
>> of implementation matter, I strongly recommend that all accesses
>> by memcpy() to user buffers happen in assembly code.
>
> musl does not go out of its way to facilitate gross UB by
> applications. If anything (if it's detectable at low cost, or if
> it's so high-risk that some cost is acceptable) we trap and
> immediately crash when it's detected. We don't just make it
> silently "do what the programmer wanted".
>
> If a program is operating on memory that may change asynchronously
> out from under it, it needs to be using the appropriate volatile or
> atomic qualifications. memcpy very intentionally does not take a
> volatile void * because it's not valid to pass pointers to such
> memory to memcpy.
>
> Rich

People do this because they have no other decent choices. Doing
anything else in C winds up with a 40x slowdown because the compiler
cannot optimize anything. Using hand-written assembly is a
maintenance nightmare. volatile is for MMIO, and atomics are for
where synchronization is needed. Both severely over-constrain the
compiler.
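To make the tradeoff concrete, here is a minimal sketch (assuming C11
<stdatomic.h> and a hypothetical function name) of a bytewise-atomic
copy in portable C; every byte is a separate relaxed atomic load,
which is exactly what keeps the compiler from vectorizing it:

    #include <stdatomic.h>
    #include <stddef.h>

    /* Per-byte relaxed atomic loads: concurrent atomic writes yield
     * torn but well-defined data rather than UB. The cost is that
     * each load is an individual atomic operation, which the
     * compiler will not merge or vectorize. */
    static void *memcpy_relaxed(void *restrict dest,
                                const void *restrict src, size_t n)
    {
        unsigned char *d = dest;
        const _Atomic unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = atomic_load_explicit(&s[i], memory_order_relaxed);
        return dest;
    }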
Probably the best currently available solution is to come up with an
assembly code library that does the job. But that's going to be a lot
of work, and it won't fix the (likely *many*) programs with this bug
already in the wild, some of which are security-critical. Is the risk
of breaking existing code worth theoretical purity here?

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)