Message-ID: <CAN+XpFQCO1nr5tQ4oyPPaSfvnQSvwx-=JCtba2xJXrEN+6=LZg@mail.gmail.com>
Date: Mon, 6 Aug 2018 20:05:20 -0700
From: Mark Brand <markbrand@...gle.com>
To: Ard Biesheuvel <ard.biesheuvel@...aro.org>
Cc: Catalin Marinas <catalin.marinas@....com>, Christoffer Dall <christoffer.dall@....com>, 
	Julien Thierry <julien.thierry@....com>, Kees Cook <keescook@...omium.org>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, 
	Laura Abbott <labbott@...oraproject.org>, Mark Rutland <mark.rutland@....com>, 
	Robin Murphy <robin.murphy@....com>, Will Deacon <will.deacon@....com>, 
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

I think the phrasing of "limit kernel attack surface against ROP attacks"
is confusing and misleading. ROP does not describe a class of bugs,
vulnerabilities or attacks against the kernel - it's just one of many
code-reuse techniques that can be used by an attacker while exploiting a
vulnerability. But that's kind of off-topic!

I think what this thread is talking about is implementing extremely
coarse-grained reverse-edge control-flow-integrity, in that a return can
only return to the address following a legitimate call, but it can return
to any of those.
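
Roughly, the property being enforced looks like this (made-up call sites,
purely to illustrate the policy - not taken from a real image):

  // two unrelated call sites somewhere in the kernel
  bl      foo                     // return site A = address of the next insn
  ...
  bl      bar                     // return site B

  // epilogue of any function
  ldp     x29, x30, [sp], #16
  ret                             // may legitimately land at site A or site B
                                  // (or after any other bl), but nowhere else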

I suspect there's not much benefit to this, since (as far as I can see) the
assumption is that an attacker already has the means to direct the flow of
execution to the point of taking complete control of the (el1) stack before
executing any ROP payload.

At that point, I think it's highly unlikely an attacker needs to chain
gadgets through return instructions at all - I suspect there are a few
places in the kernel where it is necessary to load the entire register
context from a register that is not the stack pointer, and it would likely
not be more than a minor inconvenience to an attacker to use these instead,
chaining through branch-register instructions rather than through return
instructions.
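
For illustration, the kind of sequence I have in mind would be something like
the below - a hypothetical gadget, not one I have actually found in an image:

  // x1 points at attacker-controlled memory, e.g. a fake register frame
  // sitting on the corrupted el1 stack
  ldp     x2, x3, [x1, #16]       // load whatever register state is needed
  ldr     x4, [x1, #8]            // address of the next gadget
  blr     x4                      // chain via a branch-register instruction,
                                  // so no ret is ever executed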

I'd have to take a closer look at an arm64 kernel image to be sure though -
I'll do that when I get a chance and update...

Regards,
Mark

On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@...aro.org> wrote:

> On 6 August 2018 at 21:50, Kees Cook <keescook@...omium.org> wrote:
> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> > <ard.biesheuvel@...aro.org> wrote:
> >> On 6 August 2018 at 20:49, Kees Cook <keescook@...omium.org> wrote:
> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@....com> wrote:
> >>>> I guess what I'm getting at is that if the protection mechanism is "always
> >>>> return with SP outside TTBR1", there seems little point in going through the
> >>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
> >>>> anyway; this is basically just me working through a justification for saying
> >>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
> >>>> making it that much uglier for v8.0 CPUs...
> >>>
> >>> I think anyone with v8.0 CPUs interested in this mitigation would also
> >>> very much want PAN emulation. If a "depends on" isn't desired, what
> >>> about "imply" in the Kconfig?
> >>>
> >>
> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> >> case. You can never dereference SP with bit #0 set, regardless of
> >> whether the address points to user or kernel space, and my concern
> >> about reloading sp from x29 doesn't really make sense, given that x29
> >> is always assigned from sp right after pushing x29 and x30 in the
> >> function prologue, and sp only gets restored from x29 in the epilogue
> >> when there is a stack frame to begin with, in which case we add #1 to
> >> sp again before returning from the function.
> >
> > Fair enough! :)
> >
> >> The other code gets a lot cleaner as well.
> >>
> >> So for the return we'll have
> >>
> >>   ldp     x29, x30, [sp], #nn
> >>   add     sp, sp, #0x1
> >>   ret
> >>
> >> and for the function call
> >>
> >>   bl      <foo>
> >>   mov     x30, sp
> >>   bic     sp, x30, #1
> >>
> >> The restore sequence in entry.s:96 (which has no spare registers) gets
> >> much simpler as well:
> >>
> >> --- a/arch/arm64/kernel/entry.S
> >> +++ b/arch/arm64/kernel/entry.S
> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> >>          */
> >>         add     sp, sp, x0      // sp' = sp + x0
> >>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> >> +       tbnz    x0, #0, 1f
> >> +       .subsection     1
> >> +1:     sub     x0, x0, #1
> >> +       sub     sp, sp, #1
> >> +       b       2f
> >> +       .previous
> >> +2:
> >> +#endif
> >>         tbnz    x0, #THREAD_SHIFT, 0f
> >>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
> >>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> >
> > I get slightly concerned about "add" vs "clear bit", but I don't see a
> > real way to chain a lot of "add"s to get around the unaligned
> > access. Is "or" less efficient than "add"?
> >
>
> Yes. The stack pointer is special on arm64, and can only be used with
> a limited set of ALU instructions. So orring #1 would involve 'mov
> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> requires a scratch register as well.
>
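
For reference, putting the two instrumentation sequences quoted above side by
side, this is how I read the scheme (my annotation, not Ard's):

  // caller side
  bl      <foo>
  mov     x30, sp                 // x30 should be dead here - it was just
                                  // clobbered by the bl
  bic     sp, x30, #1             // clear bit #0 again after a legitimate
                                  // return

  // callee epilogue
  ldp     x29, x30, [sp], #nn
  add     sp, sp, #0x1            // set bit #0 before returning
  ret

  // a ret that does not land on one of the mov/bic pairs leaves bit #0 set
  // in sp, so the next sp-relative access faults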

