Message-ID: <CAKv+Gu_jNh7vLfGEdi2BpA42rPJrNr8qV-Xss_toV2fJzUV5jQ@mail.gmail.com>
Date: Tue, 7 Aug 2018 11:21:58 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Mark Brand <markbrand@...gle.com>
Cc: Catalin Marinas <catalin.marinas@....com>,
	Christoffer Dall <christoffer.dall@....com>,
	Julien Thierry <julien.thierry@....com>,
	Kees Cook <keescook@...omium.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Laura Abbott <labbott@...oraproject.org>,
	Mark Rutland <mark.rutland@....com>,
	Robin Murphy <robin.murphy@....com>,
	Will Deacon <will.deacon@....com>,
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

On 7 August 2018 at 05:05, Mark Brand <markbrand@...gle.com> wrote:
> I think the phrasing of "limit kernel attack surface against ROP attacks" is
> confusing and misleading. ROP does not describe a class of bugs,
> vulnerabilities or attacks against the kernel - it's just one of many
> code-reuse techniques that can be used by an attacker while exploiting a
> vulnerability. But that's kind of off-topic!
>
> I think what this thread is talking about is implementing extremely
> coarse-grained reverse-edge control-flow integrity, in that a return can
> only go back to the address following a legitimate call, but it can return
> to any of those.
>

Indeed. Apologies for not mastering the lingo, but this is indeed about
no longer being able to subvert function returns into jumps to arbitrary
places in the code.

> I suspect there's not much benefit to this, since (as far as I can see) the
> assumption is that an attacker has the means to direct the flow of execution
> as far as taking complete control of the (EL1) stack before executing any
> ROP payload.
>
> At that point, I think it's highly unlikely an attacker needs to chain
> gadgets through return instructions at all - I suspect there are a few
> places in the kernel where it is necessary to load the entire register
> context from a register that is not the stack pointer, and it would likely
> not be more than a minor inconvenience to an attacker to use these
> instructions (and to chain through branch-register instructions) instead of
> chaining through return instructions.
>
> I'd have to take a closer look at an arm64 kernel image to be sure though -
> I'll do that when I get a chance and update...
>

Thanks. Reloading all registers from an arbitrary offset register should
occur only rarely, though, no? Could we work around those cases?
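
To make the gadget shape Mark describes concrete, here is a hedged,
entirely hypothetical sketch - not lifted from any real kernel image - of
a sequence that reloads the whole register context, including sp and the
"return" address, through a general-purpose register; the arm64
context-switch code (cpu_switch_to) restores state in roughly this way.
Because sp is rewritten wholesale, poisoning its low bit on return is no
obstacle here. The offsets and register choices below are arbitrary:

	/* Hypothetical "register-context reload" gadget, for illustration
	 * only.  Everything, including the next control-flow target,
	 * arrives via x0 rather than via sp.
	 */
	ldp	x19, x20, [x0, #16]	// reload callee-saved registers ...
	ldr	x9, [x0, #32]		// ... a saved stack pointer ...
	ldr	x30, [x0, #40]		// ... and a saved link register
	mov	sp, x9			// wholesale stack pivot: bit 0 of
					// the old sp never comes into play
	ret				// chains to attacker-chosen x30

Whether enough such sequences survive in a production kernel image is
exactly the open question raised above.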
If a "depends on" isn't desired, what >> >>> about "imply" in the Kconfig? >> >>> >> >> >> >> Yes, but actually, using bit #0 is maybe a better alternative in any >> >> case. You can never dereference SP with bit #0 set, regardless of >> >> whether the address points to user or kernel space, and my concern >> >> about reloading sp from x29 doesn't really make sense, given that x29 >> >> is always assigned from sp right after pushing x29 and x30 in the >> >> function prologue, and sp only gets restored from x29 in the epilogue >> >> when there is a stack frame to begin with, in which case we add #1 to >> >> sp again before returning from the function. >> > >> > Fair enough! :) >> > >> >> The other code gets a lot cleaner as well. >> >> >> >> So for the return we'll have >> >> >> >> ldp x29, x30, [sp], #nn >> >>>>add sp, sp, #0x1 >> >> ret >> >> >> >> and for the function call >> >> >> >> bl <foo> >> >>>>mov x30, sp >> >>>>bic sp, x30, #1 >> >> >> >> The restore sequence in entry.s:96 (which has no spare registers) gets >> >> much simpler as well: >> >> >> >> --- a/arch/arm64/kernel/entry.S >> >> +++ b/arch/arm64/kernel/entry.S >> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif >> >> */ >> >> add sp, sp, x0 // sp' = sp + x0 >> >> sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp >> >> +#ifdef CONFIG_ARM64_ROP_SHIELD >> >> + tbnz x0, #0, 1f >> >> + .subsection 1 >> >> +1: sub x0, x0, #1 >> >> + sub sp, sp, #1 >> >> + b 2f >> >> + .previous >> >> +2: >> >> +#endif >> >> tbnz x0, #THREAD_SHIFT, 0f >> >> sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = >> >> x0 >> >> sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = >> >> sp >> > >> > I get slightly concerned about "add" vs "clear bit", but I don't see a >> > real way to chain a lot of "add"s to get to avoid the unaligned >> > access. Is "or" less efficient than "add"? >> > >> >> Yes. The stack pointer is special on arm64, and can only be used with >> a limited set of ALU instructions. So orring #1 would involve 'mov >> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which >> requires a scratch register as well.
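Putting the sequences from the quoted thread together, here is a minimal,
self-contained sketch of an instrumented caller/callee pair. It is a toy
user-space .S file with made-up function names, assuming the real series
has the compiler emit these sequences around every call and return; the
'bic' is spelled as its underlying AND-immediate form so that any
assembler will take it:

	.text
	.globl	callee
callee:
	stp	x29, x30, [sp, #-16]!	// prologue: push the frame record
	mov	x29, sp			// x29 taken from sp right after the
					// push, as noted in the thread
	/* ... function body ... */
	ldp	x29, x30, [sp], #16	// epilogue: pop the frame record,
	add	sp, sp, #0x1		// then set bit 0 of sp
	ret

	.globl	caller
caller:
	stp	x29, x30, [sp, #-16]!
	mov	x29, sp
	bl	callee
	mov	x30, sp			// x30 is dead once bl has returned,
	and	sp, x30, #~1		// so it doubles as the scratch reg
					// ("bic sp, x30, #1" in the thread)
	ldp	x29, x30, [sp], #16	// caller's own instrumented epilogue
	add	sp, sp, #0x1
	ret

The idea, presumably relying on SCTLR_EL1.SA being set as it is in the
kernel, is that any sp-based load or store attempted while bit 0 is set
takes an alignment fault, so a return that bypasses a legitimate
bl/bic pair falls over on the first stack access.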