|
Message-ID: <CAJgzZopv1Pap2MHZ7-+3M8+4XE_RVsR8_kRa1=2fEcpXHtSBaw@mail.gmail.com> Date: Wed, 14 Feb 2024 16:22:05 -0800 From: enh <enh@...gle.com> To: Rich Felker <dalias@...c.org>, William Roberts <bill.c.roberts@...il.com>, enh <enh@...gle.com>, musl@...ts.openwall.com Subject: Re: PAC/BTI Support on aarch64 On Wed, Feb 14, 2024 at 4:03 PM Szabolcs Nagy <nsz@...t70.net> wrote: > > * Rich Felker <dalias@...c.org> [2024-02-12 21:08:34 -0500]: > > On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote: > > > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@...gle.com> wrote: > > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@...c.org> wrote: > > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > > > > the NOP space > > > > > > so they are backwards compatible, older hardware will just NOP it. > > not quite that simple. sorry long brain dump follows: > > tl;dr: i think the main issues are asm handling (property notes > and cfi for pac-ret), property note handling in ld.so, perf > overhead and possible compat issues of pac-ret (not possible > to disable per process) and testing (ensuring the code works > when pac/bti is not nop). > > - asm code needs manual marking. > > GNU_PROPERTY note in asm is ugly and error prone, see > https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/asmdefs.h#L23 > > i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK > (this was an oversight, llvm added an option, binutils gas has none). what's the option? (since Android only supports llvm, that might be worth considering as a slight cleanup for us...) > - if asm code does indirect tailcall it should use x16 or x17. > > bti c is compatible with the indirect branch that way. > > - only functions that *maybe called indirectly* need bti c. > > this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc > currently disagree about the definition in case of linker inserted > veneers: gcc assumes ld handles the case if an inserted veneer > indirectly branches to a non-bti location, llvm emits bti in all > functions (including local ones without their address taken), just > in case ld inserts an indirect veneer. > https://github.com/ARM-software/abi-aa/issues/196 > > there is some difference to how PLTs are emitted between the > linkers, but i don't think that causes compat issues (might cause > trouble for tools that try to interpret the PLT). > > - dynamic linker has to figure out when to enable it. > > systemd MDWE (memory deny write exec) feature used to seccomp > filter mprotect(PROT_EXEC) so even if the underlying mapping was > already PROT_EXEC and it just added a PROT_BTI on top, mprotect > would fail. this was fixed by adding an MDWE prctl to linux for > systemd to make it stop using that filter, but there may be other > similar seccomp filters and old kernels without MDWE so glibc > re-mmaps the exec segment (which systemd happily accepts). > > note: the bit that tells if a load segment needs to be mapped as > PROT_BTI is in the load segment (usually program headers are in > the executable segment) so first mmap cannot get it right, unless > a quick read of the prog headers are done before mmap but that has > a lot of failure modes (the size can be unbounded) and does not > gain much compared to just mmap twice. except when the exe is mapped > by the kernel, then we cannot mmap since there is no fd and mprotect > may fail). > MDWE prctl: https://lwn.net/Articles/937315/ > > another detail is that static-exe / ld.so / vdso marking is handled > by the kernel, while other dsos are PROT_BTI marked by ld.so. an > interesting case is dynamic linked exe which was originally handled > by ld.so, but after the MDWE fiasco the kernel started loading it > with PROT_BTI (i.e. now all binaries mapped by the kernel are BTI > protected by the kernel). > > ld.so should also take care to gracefully handle BTI protection > failure (or invalid notes) in dlopen. (although invalid note is > more of an x86 thing.) > > - special functions may need to return indirectly instead of ret > > the only really nasty one is swapcontext which is not supported by > musl so that's good (and it is only a problem if bti is used with > a shadow-stack-like feature). > > for returns_twice functions (like setjmp) the compiler emits bti j > at the call site so the second return can use an indirect branch. > return from unwinder to an exception handling landing pad via > indirect branch is supported too. otherwise return via indirect > branch is not supported (no bti j at call sites). > > - a64fx (hpc core) implemented hint nops in a slow way > > so glibc only adds bti if glibc is configured for bti. > https://sourceware.org/pipermail/libc-alpha/2021-May/125784.html > > > > > > I think it's a little more elaborate than that. Those asm instructions > > > > > need to be added (probably as .instr or .word or something, unless > > > > > there's a way to spell this particular nop that existing tooling will > > > > > understand). > > > > > > You just use the hint <immediate> instructions, they are understood by old > > > toolchains. But you can only support a subset of the BTI/PAC instructions > > yes 'hint <imm>' is armv8.0-a, part of the base isa. > > i think the non-hint pac instructions are not relevant to musl. > > > > but it's been enough for most projects that follow the normal ABI conventions > > > like OpenSSL/BoringSSL,etc, but not enough for libffi for example. > > as far as i know openssl is the reason android does not enable pac: > they added the hint instructions incorrectly at first so there are > binaries that fail if the hw enables pac. that was one reason why "android does not enable pac" _by default in the android target triples for app developers_, yes --- though i think we're at the point where we think we should flip that default (not least because the number of users whose devices would actually _benefit_ from the extra instructions is a lot larger now!): https://github.com/android/ndk/issues/1914 > this reveals another issue with pac/bti: since they are nops on most > existing hw, they are not properly tested (e.g. by distro QA) so a > binary can look ok until you move it to a newer machine. (but recent > amazon graviton 4 has pac/bti so we may get more coverage soon.) exactly --- that was one of our key concerns, that app developers would _think_ they've tested with pac/bti but unknowingly used a device without. (or even an x86-64 emulator!) > > > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > > > > not. > > > > > > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > > > > > You need it, as the first instruction from a branch (where longjmp returns to) > > > needs to be a BTI instruction. > > > > Is that different from a normal function return? > > > > Note that in the case of sigsetjmp, (sig)longjmp returns to a point > > inside the sigsetjmp asm, so that point needs the annotation I think. > > setjmp/sigsetjmp has to decide how to protect the longjmp return. > > with a shadow-stack-like bw-edge-cfi, longjmp cannot return with > ret (first ret from setjmp consumes the return address from the > shadow stack), it must use indirect branch. (there is now gcs > which is aarch64 shadow stack, linux support is in progress). > > since jmpbuf does not expose the return address representation a > libc specific protection can be applied (mangling or pac) and then > longjmp can use ret and remain protected. (but e.g. setcontext > exposes the pc/lr so those cannot be mangled in memory). > > compilers emit bti j at the call site of returns_twice functions > to allow both ret and indirect branch. musl sigsetjmp can decide > what it does. > > > > > > > It's been done for many projects, glibc and bionic have it. The > > > > > > problem with BTI is that when one item in the link > > > > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > > > > something like a libc that is fundamental in the link chain, > > > > > > it turns it off for everything. > > > > > > > > > > This presumably requires some kind of machinery for how dynamic > > > > > linking will work, and possibly turning it off if a library without it > > > > > is dlopened? > > > > > > > > > > My understanding doing some brief searches though was that you can > > > > > individually mprotect it off in certain regions. So maybe it's > > > > > possible to just enable only for DSOs that support it? > > > > > > > > correct. > > > > OK, that's good to know. So which direction is it? Do DSOs that > > support BTI need it explicitly turned on via mprotect/mmap flags? Or > > is there some process-global flag to turn it on, and then ones that > > don't support it need it turned off? > > per dso marking and explicit PROT_BTI setting. > > > I suspect it's possible to first enable BTI for third-party libraries > > as a feature of the dynamic linker, and add BTI support for libc > > itself as a separate thing. That might be a nice factoring to make > > changes minimal and easy for ppl to read. > > for third-party, it is enough to fix crt*.o and handle markings > in ld.so. > > (but of course for bti to be effective the libc must have it too, > otherwise there are likely enough gadgets to do whatever) > > > The changes in dynlink.c should be as arch-agnostic as possible. If > > there's a corresponding feature on other archs, it should use the same > > basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining > > the mechanisms for evaluating if an ELF file is compatible, how to do > > the mprotect, etc. > > generic code should work. > > but e.g. see above about who handles the marking (kernel vs ld.so), > turns out x86 (and likely aarch64, riscv) shadow stack uses different > rules: always libc handles the marking. so there are caveats. > > > > > > > The initial scope of code changes would be what's reported when > > > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > > > > > > > Is there a way to disable these warnings so that every asm file does > > > > > not need to be cluttered with annotations? > > > > > > > > well, that's the ELF note stuff i was talking about, and if you don't > > > > have it you'll fall foul of the static linker saying "not all this > > > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > > > > doing nothing because the static linker effectively tells it not to. > > > > > > Yep, well said ENH. It's been since Android since we crossed paths :-). > > > > > > It's not that hard to annotate an asm file :-p I forget what project > > > (I think it was gnutls, but they just use openssl's code for the asm) > > > but I just put it in a header file and by virtue of #include'ing it you get the > > > notes added. > > > > Yes, we generally don't do that. There are no "asm headers" in musl; > > all asm files are self-contained and readable standalone. So if > > there's no way to tell the assembler/linker from the command line that > > files are BTI-compatible without generating a huge load of warning > > spam, I guess it's a mess of copy-and-paste... > > currently there is only the ugly asm directives, see above. > > final notes: > > bti is fairly deployable (iirc x86 ibt failed because it is not per > dso, so dlopen does not really work, but aarch64 does not have that > problem), not strong security (the final binary is littered with > bti j/c so plenty opportunity to misdirect an indirect jump/call), > but at least it has minimal impact (minimal compatibility issues and > on a modern core even when bti is enabled it should not be slower). > > for pac every function can independently decide if it uses pac-ret > (aka return address signing), no need for per dso marking. however > it has bigger compat as well as performance impact: > > there are custom unwinders, pac-ret uses a new dwarf cfi (to mark > the code regions where the return address is signed), custom > unwinders may not understand this and that's a runtime crash. > > some code looks at or modifies the return address (various hacks), such > code needs to be updated or not use pac-ret in relevant functions, but > such issues are hard to discover without hw. > > pac is a per-boot system-wide setting, not per-process, so if there is > any issue or bug there is no way to disable it for one broken binary > (nowadays there is a disable prctl, but it is documented to slow the > system down, so not suitable for working around perf issues). > > on simple cores pac can be slow (can add latency to non-leaf functions) > > with 48bit va space, there is only 7bit pac, i.e. limited protection.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.