kernel-hardening - Re: Re: [RFC PATCH 6/6] arm64: add VMAP_STACK and detect out-of-bounds SP

Message-ID: <8f805a19-19d1-3c97-c85b-510664d22dad@arm.com>
Date: Fri, 14 Jul 2017 16:03:51 +0100
From: Robin Murphy <robin.murphy@....com>
To: Mark Rutland <mark.rutland@....com>,
 Ard Biesheuvel <ard.biesheuvel@...aro.org>
Cc: Kees Cook <keescook@...omium.org>,
 Kernel Hardening <kernel-hardening@...ts.openwall.com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon
 <will.deacon@....com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 James Morse <james.morse@....com>,
 Takahiro Akashi <akashi.takahiro@...aro.org>,
 Dave Martin <dave.martin@....com>,
 "linux-arm-kernel@...ts.infradead.org"
 <linux-arm-kernel@...ts.infradead.org>,
 Laura Abbott <labbott@...oraproject.org>
Subject: Re: Re: [RFC PATCH 6/6] arm64: add VMAP_STACK and
 detect out-of-bounds SP

On 14/07/17 15:39, Robin Murphy wrote:
> On 14/07/17 15:06, Mark Rutland wrote:
>> On Fri, Jul 14, 2017 at 01:27:14PM +0100, Ard Biesheuvel wrote:
>>> On 14 July 2017 at 11:48, Ard Biesheuvel <ard.biesheuvel@...aro.org> wrote:
>>>> On 14 July 2017 at 11:32, Mark Rutland <mark.rutland@....com> wrote:
>>>>> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>>
>>>>>> OK, so here's a crazy idea: what if we
>>>>>> a) carve out a dedicated range in the VMALLOC area for stacks
>>>>>> b) for each stack, allocate a naturally aligned window of 2x the stack
>>>>>> size, and map the stack inside it, leaving the remaining space
>>>>>> unmapped
>>
>>>>> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
>>>>> on XZR rather than SP, so to do this we need to get the SP value into a
>>>>> GPR.
>>>>>
>>>>> Previously, I assumed this meant we needed to corrupt a GPR (and hence
>>>>> stash that GPR in a sysreg), so I started writing code to free sysregs.
>>>>>
>>>>> However, I now realise I was being thick, since we can stash the GPR
>>>>> in the SP:
>>>>>
>>>>>         sub     sp, sp, x0      // sp = orig_sp - x0
>>>>>         add     x0, sp, x0      // x0 = x0 - (orig_sp - x0) == orig_sp
>>
>> That comment is off, and should say     x0 = x0 + (orig_sp - x0) == orig_sp
>>
>>>>>         sub     x0, x0, #S_FRAME_SIZE
>>>>>         tb(nz)  x0, #THREAD_SHIFT, overflow
>>>>>         add     x0, x0, #S_FRAME_SIZE
>>>>>         sub     x0, sp, x0
>>>
>>> You need a neg x0, x0 here I think
>>
>> Oh, whoops. I'd mis-simplified things.
>>
>> We can avoid that by storing orig_sp + orig_x0 in sp:
>>
>> 	add	sp, sp, x0	// sp = orig_sp + orig_x0
>> 	sub	x0, sp, x0	// x0 = orig_sp
>> 	< check > 
>> 	sub	x0, sp, x0	// x0 = orig_x0
> 
> Haven't you now forcibly cleared the top bit of x0 thanks to overflow?

...or maybe not. I still can't quite see it, but I suppose it must
cancel out somewhere, since Mr. Helpful C Program[1] has apparently
proven me mistaken :(

I guess that means I approve!

Robin.

[1]:
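#include <assert.h>
#include <stdint.h>

/* Exhaustive 8-bit model of the sequence above: x plays x0, y plays sp. */
int main(void) {
        for (int i = 0; i < 256; i++) {
                for (int j = 0; j < 256; j++) {
                        uint8_t x = i;
                        uint8_t y = j;
                        y = y + x;      /* add sp, sp, x0: sp = orig_sp + orig_x0 (may wrap) */
                        x = y - x;      /* sub x0, sp, x0: x0 = orig_sp */
                        x = y - x;      /* sub x0, sp, x0: x0 = orig_x0 */
                        y = y - x;      /* sub sp, sp, x0: sp = orig_sp */
                        assert(x == i && y == j);
                }
        }
        return 0;
}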
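
For reference, the same round trip can be sketched at the full register width. This is only a model, not kernel code: struct regs, stash(), unstash() and check_roundtrip() are hypothetical names invented for illustration. It relies on unsigned arithmetic being modulo 2^64, so whatever carry the add loses is put back by the matching sub.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers involved; not kernel code. */
struct regs {
        uint64_t sp;
        uint64_t x0;
};

/* add sp, sp, x0 ; sub x0, sp, x0 */
static struct regs stash(struct regs r)
{
        r.sp = r.sp + r.x0;     /* sp = orig_sp + orig_x0, wraps modulo 2^64 */
        r.x0 = r.sp - r.x0;     /* x0 = orig_sp, free to be inspected */
        return r;
}

/* sub x0, sp, x0 ; sub sp, sp, x0 */
static struct regs unstash(struct regs r)
{
        r.x0 = r.sp - r.x0;     /* x0 = orig_x0 */
        r.sp = r.sp - r.x0;     /* sp = orig_sp */
        return r;
}

/* Asserts that one (sp, x0) pair survives the stash/unstash round trip. */
static void check_roundtrip(uint64_t sp, uint64_t x0)
{
        struct regs in = { sp, x0 };
        struct regs mid = stash(in);
        struct regs out = unstash(mid);

        assert(mid.x0 == sp);
        assert(out.sp == sp && out.x0 == x0);
}
```

Calling check_roundtrip() with both top bits set, for example check_roundtrip(0xffff000008003f00, 0xffffffffffffffff), exercises the case where the initial add overflows.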

>> 	sub	sp, sp, x0	// sp = orig_sp
>>
>> ... which works in a locally-built kernel where I've aligned all the
>> stacks.
>>
>>> ... only, this requires a dedicated stack region, and so we'd need to
>>> check whether sp is inside that window as well.
>>>
>>> The easiest way would be to use a window whose start address is base2
>>> aligned, but that means the beginning of the kernel VA range (where
>>> KASAN currently lives, and cannot be moved afaik), or a window at the
>>> top of the linear region. Neither looks very appealing.
>>>
>>> So that means arbitrary low and high limits to compare against in this
>>> entry path. That means more GPRs I'm afraid.
>>
>> Could you elaborate on that? I'm not sure that I follow.
>>
>> My understanding was that the compromise with this approach is that we
>> only catch overflow/underflow within THREAD_SIZE of the stack, and can
>> get false-negatives elsewhere. Otherwise, IIUC this is sufficient.
>>
>> Are you after a more stringent check (like those from the two existing
>> proposals that caught all out-of-bounds accesses)?
>>
>> Or am I missing something else?
>>
>> Thanks,
>> Mark.
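
The trade-off described above can be sketched in C as a single bit test on the adjusted SP. Everything concrete here is an assumption made for illustration: the THREAD_SHIFT and S_FRAME_SIZE values, the choice of mapping the stack in the upper half of each naturally aligned 2 * THREAD_SIZE window (so bit THREAD_SHIFT is set for a valid SP), and the names sp_overflowed() and sketch_selftest(). The thread leaves the polarity open by writing tb(nz).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SHIFT    14                              /* assumed: 16 KiB stacks */
#define THREAD_SIZE     (UINT64_C(1) << THREAD_SHIFT)
#define S_FRAME_SIZE    UINT64_C(0x130)                 /* assumed, illustrative only */

/*
 * Assumed layout: the stack is mapped in the upper half of a
 * 2 * THREAD_SIZE-aligned window and the lower half is left unmapped.
 * An SP with room for an exception frame then has bit THREAD_SHIFT set.
 */
static bool sp_overflowed(uint64_t sp)
{
        uint64_t x0 = sp - S_FRAME_SIZE;        /* sub x0, x0, #S_FRAME_SIZE */

        return (x0 & THREAD_SIZE) == 0;         /* the tb(nz) test on bit THREAD_SHIFT */
}

/* A few cases around one hypothetical window base. */
static void sketch_selftest(void)
{
        uint64_t base = UINT64_C(0xffff000008000000);   /* made-up window base */

        assert(!sp_overflowed(base + 2 * THREAD_SIZE - 16));    /* near top of stack */
        assert(sp_overflowed(base + THREAD_SIZE + 0x100));      /* no room for a frame */
        assert(sp_overflowed(base + THREAD_SIZE - 8));          /* inside the guard half */
        assert(!sp_overflowed(base - 8));       /* more than THREAD_SIZE away: missed */
}
```

The last case is the false negative: once SP has strayed more than THREAD_SIZE below the stack it lands in the mapped half of the neighbouring window, and the bit test alone cannot tell the difference.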
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@...ts.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>