Message-ID: <2f24df9e-64b1-ffef-ddbb-08f7fe8d2a96@redhat.com>
Date: Fri, 14 Jul 2017 13:51:24 -0700
From: Laura Abbott <labbott@...hat.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Kees Cook <keescook@...omium.org>, Alex Popov <alex.popov@...ux.com>,
kernel-hardening@...ts.openwall.com,
Ard Biesheuvel <ard.biesheuvel@...aro.org>
Subject: Re: [RFC][PATCH 2/2] arm64: Clear the stack
On 07/11/2017 12:51 PM, Mark Rutland wrote:
> Hi Laura,
>
> On Mon, Jul 10, 2017 at 03:04:43PM -0700, Laura Abbott wrote:
>> Implementation of stackleak based heavily on the x86 version
>
> Nice!
>
>> Signed-off-by: Laura Abbott <labbott@...hat.com>
>> ---
>> The biggest points that need review here:
>> - Is the extra offsetting (subtracting/adding from the stack) correct?
>
> I think it's a little off; more on that below.
>
>> - Where else do we need to clear the stack?
>
> I guess we might need to clear (all of the remainder of) the stack after
> invoking EFI runtime services -- those can run in task context, might
> leave sensitive values on the stack, and they're uninstrumented. The
> same would apply for x86.
>
> I think we can ignore garbage left on the stack by idle/hotplug, since
> that happens in the idle thread, so we shouldn't be doing uaccess
> transfers on those stacks.
>
>> - The assembly can almost certainly be optimized. I tried to keep the
>> behavior of the x86 'repe scasq' and the like where possible. I'm also a
>> terrible register allocator.
>
> I tried to steer clear of code golf here, but I didn't entirely manage.
>
> I also don't know x86 asm, so it's very possible I've just managed to
> confuse myself.
>
I know just enough x86 asm to confuse myself, as you can see below.
>> ---
>> arch/arm64/Kconfig | 1 +
>> arch/arm64/include/asm/processor.h | 3 ++
>> arch/arm64/kernel/asm-offsets.c | 3 ++
>> arch/arm64/kernel/entry.S | 92 +++++++++++++++++++++++++++++++++++
>> arch/arm64/kernel/process.c | 18 +++++++
>> drivers/firmware/efi/libstub/Makefile | 3 +-
>> scripts/Makefile.gcc-plugins | 5 +-
>> 7 files changed, 123 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 8addb85..0b65bfc 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -17,6 +17,7 @@ config ARM64
>> select ARCH_HAS_KCOV
>> select ARCH_HAS_SET_MEMORY
>> select ARCH_HAS_SG_CHAIN
>> + select ARCH_HAS_STACKLEAK
>> select ARCH_HAS_STRICT_KERNEL_RWX
>> select ARCH_HAS_STRICT_MODULE_RWX
>> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
>> index 64c9e78..76f2738 100644
>> --- a/arch/arm64/include/asm/processor.h
>> +++ b/arch/arm64/include/asm/processor.h
>> @@ -88,6 +88,9 @@ struct thread_struct {
>> unsigned long fault_address; /* fault info */
>> unsigned long fault_code; /* ESR_EL1 value */
>> struct debug_info debug; /* debugging */
>> +#ifdef CONFIG_STACKLEAK
>> + unsigned long lowest_stack;
>> +#endif
>> };
>>
>> #ifdef CONFIG_COMPAT
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index b3bb7ef..e0a5ae2 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -43,6 +43,9 @@ int main(void)
>> DEFINE(TSK_TI_TTBR0, offsetof(struct task_struct, thread_info.ttbr0));
>> #endif
>> DEFINE(TSK_STACK, offsetof(struct task_struct, stack));
>> +#ifdef CONFIG_STACKLEAK
>> + DEFINE(TSK_TI_LOWEST_STACK, offsetof(struct task_struct, thread.lowest_stack));
>> +#endif
>> BLANK();
>> DEFINE(THREAD_CPU_CONTEXT, offsetof(struct task_struct, thread.cpu_context));
>> BLANK();
>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>> index b738880..e573633 100644
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -744,6 +744,7 @@ ENDPROC(cpu_switch_to)
>> */
>> ret_fast_syscall:
>> disable_irq // disable interrupts
>> + bl erase_kstack
>> str x0, [sp, #S_X0] // returned x0
>> ldr x1, [tsk, #TSK_TI_FLAGS] // re-check for syscall tracing
>> and x2, x1, #_TIF_SYSCALL_WORK
>> @@ -772,6 +773,7 @@ work_pending:
>> */
>> ret_to_user:
>> disable_irq // disable interrupts
>> + bl erase_kstack
>> ldr x1, [tsk, #TSK_TI_FLAGS]
>> and x2, x1, #_TIF_WORK_MASK
>> cbnz x2, work_pending
>> @@ -865,3 +867,93 @@ ENTRY(sys_rt_sigreturn_wrapper)
>> mov x0, sp
>> b sys_rt_sigreturn
>> ENDPROC(sys_rt_sigreturn_wrapper)
>> +
>> +#ifdef CONFIG_STACKLEAK
>> +ENTRY(erase_kstack)
>> + /* save x0 for the fast path */
>> + mov x10, x0
>> +
>> + get_thread_info x0
>> + ldr x1, [x0, #TSK_TI_LOWEST_STACK]
>> +
>> + /* get the number of bytes to check for lowest stack */
>> + mov x3, x1
>> + and x3, x3, #THREAD_SIZE - 1
>> + lsr x3, x3, #3
>> +
>> + /* now generate the start of the stack */
>> + mov x4, sp
>> + movn x2, #THREAD_SIZE - 1
>> + and x1, x4, x2
>> +
>> + mov x2, #-0xBEEF /* stack poison */
>> +
>> + cmp x3, #0
>> + b.eq 4f /* Found nothing, go poison */
>> +
>> +1:
>> + ldr x4, [x1, x3, lsl #3]
>> + cmp x4, x2 /* did we find the poison? */
>> + b.eq 2f /* yes we did, go check it */
>> +
>> + sub x3, x3, #1
>> + cmp x3, #0
>> + b.eq 4f /* Found nothing, go poison */
>> + b 1b /* loop again */
>> +
>> +2:
>
> IIUC, at this point, x1 was the start (lowest addr) of the stack, and x3
> was the quadword index of the first poison on that stack.
>
> The x86 asm implicitly held that [x1, x3, lsl #3] address in RDI, but we
> don't have a copy...
>
>> + cmp x3, #16
>> + b.ls 4f /* Go poison if there are less than 16 things left? */
>> +
>> + mov x3, #16
>
> ... and here we blat the index without saving it, meaning we always jump
> to close to the start of the stack, which I don't think was intentional.
>
Urgh. You are right. I had an index register when I first
started out and then a different bug caused me to erroneously
remove it.
> So then we fall though to the below...
>
>> +3:
>> + ldr x4, [x1, x3, lsl #3]
>> + cmp x4, x2 /* did we find the poison? */
>> + b.ne 1b /* nope we have to check deeper */
>> +
>> + sub x3, x3, #1
>> + cmp x3, #0
>> + b.eq 4f /* Found nothing, go poison */
>> + b 3b /* loop again */
>
> ... where we either:
>
> * find 16 contiguous poison entries, leaving x3 == 0, or:
>
> * we immediately find a non-poison value, and jump back to 1b. If there
> are 16 non-poison values, we're left with x3 == 0, otherwise we bail
> out and jump to 4f with x3 in the interval [0,15].
>
> .... or I've completely confused myself, as suggested above.
>
> We might not have x86's fancy string instructions, but we could do
> something equally impenetrable:
>
> mov x5, #0
> 1:
> cbz x3, 4f
> ldr x4, [x1, x3, lsl #3]
> cmp x4, x2
> csinc x5, xzr, x5, ne
> 	tbnz	x5, #4, 4f	// found 16 poisons?
> sub x3, x3, #1
> b 1b
>
> That doesn't stop 16 slots early, so it can return any value of x3 all
> the way down to zero.
>
Seems to work in my testing.
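For my own sanity I also modeled the scan in userspace C (names and the stack layout here are hypothetical; the poison value mirrors the `mov x2, #-0xBEEF` above, and the loop exits once 16 consecutive poison quadwords have been seen, as the `// found 16 poisons?` comment intends):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON ((uint64_t)-0xBEEF)	/* mov x2, #-0xBEEF */

/*
 * Scan downward from quadword index 'idx' toward index 0, looking for
 * 16 contiguous poison quadwords.  Returns the index where the scan
 * stopped: the lowest index of the 16-poison run, or 0 if none found.
 */
static size_t scan_for_poison(const uint64_t *stack, size_t idx)
{
	unsigned run = 0;	/* x5: length of the current poison run */

	while (idx != 0) {	/* cbz x3, 4f */
		/* csinc: reset the run on non-poison, else extend it */
		run = (stack[idx] == POISON) ? run + 1 : 0;
		if (run == 16)	/* tbnz x5, #4, 4f */
			return idx;
		idx--;		/* sub x3, x3, #1 */
	}
	return 0;
}
```

As you say, nothing stops this 16 slots early, so it can return any index down to zero.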
>> +
>> + /* The poison function */
>> +4:
>> + /* Generate the address we found */
>> + add x5, x1, x3, lsl #3
>> + orr x5, x5, #16
>
> Have you figured out what the forced 16 byte offset is for?
>
> I've not managed to figure out why it's there, and it looks like
> Alexander couldn't either, judging by his comments in the x86 asm.
>
From get_wchan in arch/x86/kernel/process.c, it might be to
account for the start of the frame correctly? I don't have a
definitive answer, though, and plan on just removing this for the
next version.
>> +	mov	x4, sp
>> +	/* subtract the current pointer */
>> + sub x8, x4, x5
>> +
>> + /* subtract one more so we don't touch the current sp */
>> + sub x8, x8, #1
>> +
>> + /* sanity check */
>> + cmp x8, #THREAD_SIZE
>> + b.lo 5f
>> +999:
>> + b 999b
>> +
>> +5:
>> + lsr x8, x8, #3
>> + mov x3, x8
>> +6:
>> + cmp x3, #0
>> + b.eq 7f
>> +
>> + str x2, [x1, x3, lsl #3]
>> + sub x3, x3, #1
>> + b 6b
>> +
>> + /* Reset the lowest stack to the top of the stack */
>> +7:
>> + ldr x1, [x0, TSK_STACK]
>> + add x1, x1, #THREAD_SIZE
>> + sub x1, x1, #256
>> + str x1, [x0, #TSK_TI_LOWEST_STACK]
>
> I take it this is the offsetting you were querying?
>
> I don't think it's quite right. Our stack looks like:
>
> +---+ <- task_stack_page(p) + THREAD_SIZE
> | |
> +---+ <- task_stack_page(p) + THREAD_START_SP
> | |
> | |
> +---+ <- task_pt_regs(p)
> | |
> | |
> | |
> ~~~~~
>
> ~~~~~
> | |
> | |
> | |
> +---+ <- task_stack_page(p)
>
> At the point we return to userspace, sp == task_pt_regs(p).
>
> Judging by a generated asm-offsets.h, sizeof(struct_pt_regs) is 304
> bytes currently. THREAD_SIZE - THREAD_START_SP == 16.
>
> We probably want to give that 16 a mnemonic (e.g FRAME_PADDING), and
> have something like:
>
> ldr x1, [x0, TSK_STACK]
> add x1, x1, #THREAD_SIZE
> sub x1, x1, #(S_FRAME_SIZE + FRAME_PADDING)
> str x1, [x0, #TSK_TI_LOWEST_STACK]
>
Yes, that looks cleaner. I suspect that even though we weren't
subtracting enough, it still worked because the lowest stack
would get overwritten in track_stack later.
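As a sanity check on the arithmetic, here is the reset in userspace C (the constants are assumptions drawn from this thread: 16K arm64 stacks, the 304-byte pt_regs you measured, and the 16-byte THREAD_SIZE - THREAD_START_SP padding):

```c
#include <assert.h>

#define THREAD_SIZE	16384UL	/* arm64 default with 4K pages */
#define S_FRAME_SIZE	304UL	/* sizeof(struct pt_regs), per above */
#define FRAME_PADDING	16UL	/* THREAD_SIZE - THREAD_START_SP */

/*
 * Reset lowest_stack to sit just below the saved pt_regs at the top
 * of the task's stack, i.e. where sp is on return to userspace.
 */
static unsigned long reset_lowest_stack(unsigned long stack_page)
{
	return stack_page + THREAD_SIZE - (S_FRAME_SIZE + FRAME_PADDING);
}
```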
>> +
>> + mov x0, x10
>> + ret
>> +ENDPROC(erase_kstack)
>> +#endif
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index 659ae80..1b6cca2 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -293,6 +293,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>> p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
>> p->thread.cpu_context.sp = (unsigned long)childregs;
>>
>> +#ifdef CONFIG_STACKLEAK
>> + p->thread.lowest_stack = (unsigned long)task_stack_page(p) +
>> + 2 * sizeof(unsigned long);
>> +#endif
>
> I see this follows the x86 logic, but I don't see why it's necessary to
> offset this end of the stack.
>
> Do you have an idea as to what this is for?
>
Again, no, and I plan to remove this offset as well.
>> +
>> +
>> ptrace_hw_copy_thread(p);
>>
>> return 0;
>> @@ -417,3 +423,15 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
>> else
>> return randomize_page(mm->brk, SZ_1G);
>> }
>> +
>> +#ifdef CONFIG_STACKLEAK
>> +void __used check_alloca(unsigned long size)
>> +{
>> + unsigned long sp = (unsigned long)&sp, stack_left;
>
> Nit: please use current_stack_pointer.
>
Ack. This should be cleaned up in the track_stack
function as well.
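For reference, the check itself reduces to masking the stack pointer; a userspace model of the arithmetic (THREAD_SIZE and the 256-byte guard follow the x86 version, and using current_stack_pointer only changes how sp is obtained):

```c
#include <assert.h>
#include <stdbool.h>

#define THREAD_SIZE 16384UL

/*
 * Model of the check_alloca() test: with THREAD_SIZE-aligned stacks,
 * sp & (THREAD_SIZE - 1) is the number of bytes left above the stack
 * bottom.  Returns true if an alloca of 'size' bytes would be refused
 * (less than a 256-byte guard would remain).
 */
static bool alloca_would_overflow(unsigned long sp, unsigned long size)
{
	unsigned long stack_left = sp & (THREAD_SIZE - 1);

	return stack_left < 256 || size >= stack_left - 256;
}
```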
> Thanks,
> Mark.
>
Thanks,
Laura