Message-ID: <CAN+XpFQCO1nr5tQ4oyPPaSfvnQSvwx-=JCtba2xJXrEN+6=LZg@mail.gmail.com>
Date: Mon, 6 Aug 2018 20:05:20 -0700
From: Mark Brand <markbrand@...gle.com>
To: Ard Biesheuvel <ard.biesheuvel@...aro.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
	Christoffer Dall <christoffer.dall@....com>,
	Julien Thierry <julien.thierry@....com>,
	Kees Cook <keescook@...omium.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Laura Abbott <labbott@...oraproject.org>,
	Mark Rutland <mark.rutland@....com>,
	Robin Murphy <robin.murphy@....com>,
	Will Deacon <will.deacon@....com>,
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

I think the phrasing of "limit kernel attack surface against ROP attacks" is
confusing and misleading. ROP does not describe a class of bugs,
vulnerabilities or attacks against the kernel - it's just one of many
code-reuse techniques that can be used by an attacker while exploiting a
vulnerability. But that's kind of off-topic!

I think what this thread is talking about is implementing extremely
coarse-grained reverse-edge control-flow integrity, in that a return can
only return to the address following a legitimate call, but it can return
to any of those.

I suspect there's not much benefit to this, since (as far as I can see) the
assumption is that an attacker has the means to direct flow of execution as
far as taking complete control of the (el1) stack before executing any ROP
payload. At that point, I think it's highly unlikely an attacker needs to
chain gadgets through return instructions at all - I suspect there are a
few places in the kernel where it is necessary to load the entire register
context from a register that is not the stack pointer, and it would likely
not be more than a minor inconvenience to an attacker to use these (and
chain through branch-register instructions) instead of chaining through
return instructions.
I'd have to take a closer look at an arm64 kernel image to be sure though -
I'll do that when I get a chance and update...

Regards,
Mark

On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@...aro.org> wrote:
> On 6 August 2018 at 21:50, Kees Cook <keescook@...omium.org> wrote:
> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> > <ard.biesheuvel@...aro.org> wrote:
> >> On 6 August 2018 at 20:49, Kees Cook <keescook@...omium.org> wrote:
> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@....com> wrote:
> >>>> I guess what I'm getting at is that if the protection mechanism is "always
> >>>> return with SP outside TTBR1", there seems little point in going through the
> >>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
> >>>> anyway; this is basically just me working through a justification for saying
> >>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
> >>>> making it that much uglier for v8.0 CPUs...
> >>>
> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> >>> about "imply" in the Kconfig?
> >>>
> >>
> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> >> case. You can never dereference SP with bit #0 set, regardless of
> >> whether the address points to user or kernel space, and my concern
> >> about reloading sp from x29 doesn't really make sense, given that x29
> >> is always assigned from sp right after pushing x29 and x30 in the
> >> function prologue, and sp only gets restored from x29 in the epilogue
> >> when there is a stack frame to begin with, in which case we add #1 to
> >> sp again before returning from the function.
> >
> > Fair enough! :)
> >
> >> The other code gets a lot cleaner as well.
> >>
> >> So for the return we'll have
> >>
> >>   ldp  x29, x30, [sp], #nn
> >>   add  sp, sp, #0x1
> >>   ret
> >>
> >> and for the function call
> >>
> >>   bl   <foo>
> >>   mov  x30, sp
> >>   bic  sp, x30, #1
> >>
> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> >> much simpler as well:
> >>
> >> --- a/arch/arm64/kernel/entry.S
> >> +++ b/arch/arm64/kernel/entry.S
> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> >>          */
> >>         add     sp, sp, x0      // sp' = sp + x0
> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> >> +       tbnz    x0, #0, 1f
> >> +       .subsection     1
> >> +1:     sub     x0, x0, #1
> >> +       sub     sp, sp, #1
> >> +       b       2f
> >> +       .previous
> >> +2:
> >> +#endif
> >>         tbnz    x0, #THREAD_SHIFT, 0f
> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> >
> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> > real way to chain a lot of "add"s to get to avoid the unaligned
> > access. Is "or" less efficient than "add"?
> >

> Yes. The stack pointer is special on arm64, and can only be used with
> a limited set of ALU instructions. So orring #1 would involve 'mov
> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> requires a scratch register as well.
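[Archive note: the following is an illustrative model, not part of the original thread. It sketches the bit #0 scheme Ard describes above in Python, standing in for the hardware behaviour: every function epilogue leaves SP with bit 0 set, only the instruction pair following a legitimate `bl` clears it again, and any stack access through an SP with bit 0 set faults on alignment. All function and exception names here are invented for the sketch.]

```python
class StackFault(Exception):
    """Models the alignment fault taken when SP is used with bit 0 set."""

def epilogue_return(sp):
    # Models `ldp x29, x30, [sp], #nn ; add sp, sp, #0x1 ; ret`:
    # the returning function deliberately leaves bit 0 of SP set.
    return sp | 1

def legit_return_site(sp):
    # Models the `mov x30, sp ; bic sp, x30, #1` pair that follows
    # every `bl`: only a legitimate call site clears bit 0 again.
    return sp & ~1

def stack_access(sp):
    # SP must be aligned when used as a base register; an SP with
    # bit 0 still set faults on the first load/store through it.
    if sp & 1:
        raise StackFault("SP misaligned (bit 0 set)")
    return sp

sp = 0x1000
sp = epilogue_return(sp)                # function returns; SP is now odd

# Normal control flow: the return lands on the pair after the `bl`,
# which restores an aligned SP before the caller touches the stack.
assert stack_access(legit_return_site(sp)) == 0x1000

# ROP: a forged return target skips the `bic`, so the very first
# stack access faults instead of executing the gadget chain.
try:
    stack_access(sp)
except StackFault:
    print("gadget chain faulted")       # mitigation tripped
```

This also illustrates why the scheme is coarse-grained: the check only distinguishes "some legitimate return site" from "anything else"; it does not pair a return with its own caller.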