|
|
Message-ID: <CAKv+Gu9uiqerCd54upWHxPBKcX1ga1oTwpYctEqbhhQGZOq6XA@mail.gmail.com>
Date: Wed, 27 Jun 2018 00:11:13 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Laura Abbott <labbott@...hat.com>
Cc: linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
Mark Rutland <mark.rutland@....com>, Kees Cook <keescook@...omium.org>,
Will Deacon <will.deacon@....com>, Catalin Marinas <catalin.marinas@....com>,
James Morse <james.morse@....com>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>, YaoJun <yaojun8558363@...il.com>
Subject: Re: [RFC PATCH] arm64/mm: unmap the linear alias of module allocations
Hi Laura,
Thanks for taking a look.
On 26 June 2018 at 23:28, Laura Abbott <labbott@...hat.com> wrote:
> On 06/26/2018 09:54 AM, Ard Biesheuvel wrote:
>>
>> When CONFIG_STRICT_KERNEL_RXW=y [which is the default on arm64], we
>> take great care to ensure that the mappings of modules in the vmalloc
>> space are locked down as much as possible, i.e., executable code is
>> mapped read-only, read-only data is mapped read-only and non-executable,
>> and read-write data is mapped non-executable as well.
>>
>> However, due to the way we map the linear region [aka the kernel direct
>> mapping], those module regions are aliased by read-write mappings, and
>> it is possible for an attacker to modify code or data that was assumed
>> to be immutable in this configuration.
>>
>> So let's ensure that the linear alias of module memory is unmapped upon
>> allocation and remapped when it is freed. The latter requires some
>> special handling involving a workqueue due to the fact that it may be
>> called in softirq context at which time calling find_vm_area() is unsafe.
>>
>> Note that this requires the entire linear region to be mapped down to
>> pages, which may result in a performance regression in some
>> configurations.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@...aro.org>
>> ---
>> For this RFC, I simply reused set_memory_valid() to do the unmap/remap,
>> but I am aware that this likely breaks hibernation, and perhaps some
>> other things as well, so we should probably remap r/o instead.
>>
>
> This fixes modules but doesn't fix the set_memory_*
> uses like in bpf. Is it worth trying to fix it for those
> cases as well?
>
AIUI bpf uses set_memory_* on the vmalloc region only, and is
oblivious to the fact that there exists a linear alias in the first
place. But due to the fact that it actually uses module_alloc() to
allocate the regions, this patch affects the linear aliases of those
regions as well.
Does that answer your question at all?
>
>> arch/arm64/kernel/module.c | 57 ++++++++++++++++++++
>> arch/arm64/mm/mmu.c | 2 +-
>> 2 files changed, 58 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
>> index 155fd91e78f4..4a1d3c7486f5 100644
>> --- a/arch/arm64/kernel/module.c
>> +++ b/arch/arm64/kernel/module.c
>> @@ -26,10 +26,66 @@
>> #include <linux/mm.h>
>> #include <linux/moduleloader.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/workqueue.h>
>> #include <asm/alternative.h>
>> +#include <asm/cacheflush.h>
>> #include <asm/insn.h>
>> #include <asm/sections.h>
>> +#ifdef CONFIG_STRICT_KERNEL_RWX
>> +
>> +static struct workqueue_struct *module_free_wq;
>> +
>> +static int init_workqueue(void)
>> +{
>> + module_free_wq = alloc_ordered_workqueue("module_free_wq", 0);
>> + WARN_ON(!module_free_wq);
>> +
>> + return 0;
>> +}
>> +pure_initcall(init_workqueue);
>> +
>> +static void remap_linear_module_alias(void *module_region, int enable)
>> +{
>> + struct vm_struct *vm = find_vm_area(module_region);
>> + struct page **p;
>> + unsigned long size;
>> +
>> + WARN_ON(!vm || !vm->pages);
>> +
>> + for (p = vm->pages, size = vm->size; size > 0; size -= PAGE_SIZE)
>> + set_memory_valid((u64)page_address(*p++), 1, enable);
>> +}
>> +
>> +static void module_free_wq_worker(struct work_struct *work)
>> +{
>> + remap_linear_module_alias(work, true);
>> + vfree(work);
>> +}
>> +
>> +void module_memfree(void *module_region)
>> +{
>> + struct work_struct *work;
>> +
>> + if (!module_region)
>> + return;
>> +
>> + /*
>> + * At this point, module_region is a pointer to an allocation of
>> at
>> + * least PAGE_SIZE bytes that is mapped read-write. So instead of
>> + * allocating memory for a data structure containing a work_struct
>> + * instance and a copy of the value of module_region, just reuse
>> the
>> + * allocation directly.
>> + */
>> + work = module_region;
>> + INIT_WORK(work, module_free_wq_worker);
>> + queue_work(module_free_wq, work);
>> +}
>> +
>> +#else
>> +static void remap_linear_module_alias(void *module_region, int enable) {}
>> +#endif
>> +
>> void *module_alloc(unsigned long size)
>> {
>> gfp_t gfp_mask = GFP_KERNEL;
>> @@ -65,6 +121,7 @@ void *module_alloc(unsigned long size)
>> return NULL;
>> }
>> + remap_linear_module_alias(p, false);
>> return p;
>> }
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 493ff75670ff..e1057ebb672d 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -432,7 +432,7 @@ static void __init map_mem(pgd_t *pgdp)
>> struct memblock_region *reg;
>> int flags = 0;
>> - if (debug_pagealloc_enabled())
>> + if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
>> debug_pagealloc_enabled())
>> flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>> /*
>>
>
> I think this should be based on rodata_enabled instead of the kernel
> configuration option.
>
Agreed.
> This looks reasonable from the pagetable debugfs but I'm seeing
> some intermittent weird output:
>
> ---[ Linear Mapping ]---
> 0xffff800000000000-0xffff800000001000 4K PTE RW x
> BLK DEVICE/nGnRnE
> 0xffff800000200000-0xffff800000280000 512K PTE RW NX SHD AF NG
> UXN MEM/NORMAL
> 0xffff800000280000-0xffff800000400000 1536K PTE ro NX SHD AF NG
> UXN MEM/NORMAL
> 0xffff800000400000-0xffff800001400000 16M PMD ro NX SHD AF NG
> BLK UXN MEM/NORMAL
> 0xffff800001400000-0xffff800001440000 256K PTE ro NX SHD AF NG
> UXN MEM/NORMAL
> 0xffff800001440000-0xffff800010000000 241408K PTE RW NX SHD AF NG
> UXN MEM/NORMAL
>
>
> I can't tell if this is some artifact of how we do the
> debugfs dumping or an actual bug. It doesn't seem to
> happen on every boot either. I'll play around with this
> to see if I find anything.
>
This is indeed weird, but I think the ptdump code makes it look
weirder than it actually is: the BLK attribute means bit 1 is cleared,
and the nGnRnE device attribute means bits [4:2] are cleared, so this
is essentially a reserved entry, although I wouldn't be able to
explain how we ended up with one.
Are the first 2 MB of DRAM made available to the kernel by the firmware?
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.