Message-Id: <1460757793-59020-4-git-send-email-thgarnie@google.com>
Date: Fri, 15 Apr 2016 15:03:12 -0700
From: Thomas Garnier <thgarnie@...gle.com>
To: "H . Peter Anvin" <hpa@...or.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...e.de>, Andy Lutomirski <luto@...nel.org>, Thomas Garnier <thgarnie@...gle.com>, Dmitry Vyukov <dvyukov@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, Dan Williams <dan.j.williams@...el.com>, Kees Cook <keescook@...omium.org>, Stephen Smalley <sds@...ho.nsa.gov>, Seth Jennings <sjennings@...iantweb.net>, Kefeng Wang <wangkefeng.wang@...wei.com>, Jonathan Corbet <corbet@....net>, Matt Fleming <matt@...eblueprint.co.uk>, Toshi Kani <toshi.kani@....com>, Alexander Kuleshov <kuleshovmail@...il.com>, Alexander Popov <alpopov@...ecurity.com>, Joerg Roedel <jroedel@...e.de>, Dave Young <dyoung@...hat.com>, Baoquan He <bhe@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>, Mark Salter <msalter@...hat.com>, Boris Ostrovsky <boris.ostrovsky@...cle.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org, gthelen@...gle.com, kernel-hardening@...ts.openwall.com
Subject: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits relying on predictable kernel addresses. These addresses
can be used to disclose the base addresses of kernel modules or to corrupt
specific structures in order to elevate privileges, bypassing the current
implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from the boot and heap
allocators. Knowing the base address and the physical memory size, an
attacker can deduce the PDE virtual address for the vDSO memory page. This
attack was demonstrated at CanSecWest 2016 in the "Getting Physical Extreme
Abuse of Intel Based Paged Systems" talk, https://goo.gl/ANpWdV (see the
second part of the presentation). Similar research was done at Google,
leading to this patch proposal. Variants of the attack exist that overwrite
the ACLs of /proc or /sys objects, leading to elevation of privileges.

The vmalloc memory section contains the allocations made through the
vmalloc API. The allocations are done sequentially to prevent
fragmentation, so each allocation address can easily be deduced, especially
from boot.

The vmemmap section holds a representation of the physical memory (through
a struct page array). An attacker could use this section to disclose the
kernel memory layout (by walking the page linked list).

The order of the memory sections is not changed. The feature looks at the
available space for the sections based on the different configuration
options and randomizes the base of each section and the space between them.
The size of the physical memory mapping is the available physical memory.
No performance impact was detected while testing the feature.

Entropy is generated using the KASLR early boot functions now shared in the
lib directory (originally written by Kees Cook). Randomization is done on
the PGD & PUD page table levels to increase the number of possible
addresses. The physical memory mapping code was adapted to support PUD
level virtual addresses. An additional low-memory page is used to ensure
each CPU can start with a PGD-aligned virtual address (for realmode).

x86/dump_pagetables was updated to correctly display each section.

The documentation on the x86_64 memory layout was updated accordingly.
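
To illustrate the placement scheme described above, here is a stand-alone
user-space sketch (not part of the patch) of the idea: the sections keep
their order, and each base is pushed up by a random, PUD-aligned share of
the space left in the randomization window. The section sizes, the demo
window and the use of random() are illustrative stand-ins only; the actual
boot-time logic is in arch/x86/mm/kaslr.c below.

/*
 * Stand-alone model of the memory section randomization described above.
 * Not kernel code: section sizes, the address window and random() are
 * stand-ins for the real values and boot-time entropy.
 * Assumes a 64-bit unsigned long (as on x86_64).
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TB_SHIFT	40
#define PUD_SHIFT	30	/* a PUD entry maps 1GB on x86_64 */
#define PUD_MASK	(~((1UL << PUD_SHIFT) - 1))

struct region {
	const char *name;
	unsigned long size_tb;	/* section size in terabytes */
	unsigned long base;	/* randomized base, filled in below */
};

/* Crude 64-bit random value; a stand-in for the boot entropy source. */
static unsigned long rand64(void)
{
	return ((unsigned long)random() << 32) ^ (unsigned long)random();
}

int main(void)
{
	/* Sections in their fixed order; sizes are made up for the demo. */
	struct region regions[] = {
		{ "physical mapping",  4, 0 },	/* pretend 4TB of RAM */
		{ "vmalloc",          32, 0 },
		{ "vmemmap",           1, 0 },
	};
	unsigned long addr = 0xffff880000000000UL;	/* start of the window */
	unsigned long end  = 0xffffffff80000000UL;	/* end of the window   */
	unsigned long remain = end - addr;
	size_t i, n = sizeof(regions) / sizeof(regions[0]);

	srandom((unsigned int)time(NULL));

	/* Subtract the space the sections need, plus a 1TB hole after each. */
	for (i = 0; i < n; i++)
		remain -= (regions[i].size_tb + 1) << TB_SHIFT;

	for (i = 0; i < n; i++) {
		/* Give this section a random, PUD-aligned share of the slack. */
		unsigned long budget  = remain / (n - i);
		unsigned long padding = (rand64() % (budget + 1)) & PUD_MASK;

		addr += padding;
		regions[i].base = addr;
		addr += (regions[i].size_tb + 1) << TB_SHIFT;
		remain -= padding;

		printf("%-16s base: 0x%016lx\n", regions[i].name, regions[i].base);
	}
	return 0;
}

The kernel implementation differs in detail (it clamps the physical mapping
size to the installed memory, takes its entropy from the shared KASLR boot
functions and accounts for the Xen hole), but the per-section split of the
remaining space follows the same idea.
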
Signed-off-by: Thomas Garnier <thgarnie@...gle.com>
---
Based on next-20160413
---
 Documentation/x86/x86_64/mm.txt         |   4 +
 arch/x86/Kconfig                        |  15 ++++
 arch/x86/include/asm/kaslr.h            |  12 +++
 arch/x86/include/asm/page_64_types.h    |  12 ++-
 arch/x86/include/asm/pgtable_64.h       |   1 +
 arch/x86/include/asm/pgtable_64_types.h |  15 +++-
 arch/x86/kernel/head_64.S               |   2 +-
 arch/x86/kernel/setup.c                 |   2 +
 arch/x86/mm/Makefile                    |   1 +
 arch/x86/mm/dump_pagetables.c           |  11 ++-
 arch/x86/mm/init_64.c                   |   3 +
 arch/x86/mm/kaslr.c                     | 151 ++++++++++++++++++++++++++++++++
 arch/x86/realmode/init.c                |   5 ++
 13 files changed, 226 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index c518dce..1918777 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
 The mappings are not part of any other kernel PGD and are only available
 during EFI runtime calls.
 
+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be changed early at boot time.
+
 -Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2632f60..7c786d4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2003,6 +2003,21 @@ config PHYSICAL_ALIGN
 
 	  Don't change this unless you know what you are doing.
 
+config RANDOMIZE_MEMORY
+	bool "Randomize the kernel memory sections"
+	depends on X86_64
+	depends on RANDOMIZE_BASE
+	default n
+	---help---
+	   Randomizes the virtual address of memory sections (physical memory
+	   mapping, vmalloc & vmemmap). This security feature mitigates exploits
+	   relying on predictable memory locations.
+
+	   The base and padding between memory sections are randomized. Their
+	   order is not. Entropy is generated in the same way as RANDOMIZE_BASE.
+
+	   If unsure, say N.
+
 config HOTPLUG_CPU
 	bool "Support for hot-pluggable CPUs"
 	depends on SMP
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 2ae1429..46b42aa 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -3,4 +3,16 @@
 
 unsigned long kaslr_get_random_boot_long(void);
 
+#ifdef CONFIG_RANDOMIZE_MEMORY
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
+void kernel_randomize_memory(void);
+void kaslr_trampoline_init(unsigned long page_size_mask);
+#else
+static inline void kernel_randomize_memory(void) { }
+static inline void kaslr_trampoline_init(unsigned long page_size_mask) { }
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+
 #endif
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4928cf0..79b9c4b 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,6 +1,10 @@
 #ifndef _ASM_X86_PAGE_64_DEFS_H
 #define _ASM_X86_PAGE_64_DEFS_H
 
+#ifndef __ASSEMBLY__
+#include <asm/kaslr.h>
+#endif
+
 #ifdef CONFIG_KASAN
 #define KASAN_STACK_ORDER 1
 #else
@@ -32,7 +36,13 @@
  * hypervisor to fit. Choosing 16 slots here is arbitrary, but it's
  * what Xen requires.
  */
-#define __PAGE_OFFSET		_AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE	_AC(0xffff880000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define __XEN_SPACE		_AC(0x80000000000, UL)
+#define __PAGE_OFFSET		page_offset_base
+#else
+#define __PAGE_OFFSET		__PAGE_OFFSET_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
 
 #define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
 
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 2ee7811..0dfec89 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
 extern pmd_t level2_ident_pgt[512];
 extern pte_t level1_fixmap_pgt[512];
 extern pgd_t init_level4_pgt[];
+extern pgd_t trampoline_pgd_entry;
 
 #define swapper_pg_dir init_level4_pgt
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e6844df..d388739 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
+#include <asm/kaslr.h>
 
 /*
  * These are used to make use of C type-checking..
@@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;
 
 /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
 #define MAXMEM		_AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START	_AC(0xffffc90000000000, UL)
-#define VMALLOC_END	_AC(0xffffe8ffffffffff, UL)
-#define VMEMMAP_START	_AC(0xffffea0000000000, UL)
+#define VMALLOC_SIZE_TB	_AC(32, UL)
+#define __VMALLOC_BASE	_AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE	_AC(0xffffea0000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define VMALLOC_START	vmalloc_base
+#define VMEMMAP_START	vmemmap_base
+#else
+#define VMALLOC_START	__VMALLOC_BASE
+#define VMEMMAP_START	__VMEMMAP_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+#define VMALLOC_END	(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
 #define MODULES_VADDR	(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
 #define MODULES_END	_AC(0xffffffffff000000, UL)
 #define MODULES_LEN	(MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 22fbf9d..b282db4 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -37,7 +37,7 @@
 
 #define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
 
-L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
+L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
 L4_START_KERNEL = pgd_index(__START_KERNEL_map)
 L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 319b08a..aebfa1d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -909,6 +909,8 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_init.oem.arch_setup();
 
+	kernel_randomize_memory();
+
 	iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
 	setup_memory_map();
 	parse_setup_data();
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index f989132..2c24dd6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,4 +38,5 @@
 obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 99bfb19..4a03f60 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
 	{ 0, "User Space" },
 #ifdef CONFIG_X86_64
 	{ 0x8000000000000000UL, "Kernel Space" },
-	{ PAGE_OFFSET,		"Low Kernel Mapping" },
-	{ VMALLOC_START,	"vmalloc() Area" },
-	{ VMEMMAP_START,	"Vmemmap" },
+	{ 0/* PAGE_OFFSET */,	"Low Kernel Mapping" },
+	{ 0/* VMALLOC_START */,	"vmalloc() Area" },
+	{ 0/* VMEMMAP_START */,	"Vmemmap" },
 # ifdef CONFIG_X86_ESPFIX64
 	{ ESPFIX_BASE_ADDR,	"ESPfix Area", 16 },
 # endif
@@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)
 
 static int __init pt_dump_init(void)
 {
+#ifdef CONFIG_X86_64
+	address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
+	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
+	address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
+#endif
 #ifdef CONFIG_X86_32
 	/* Not a compile-time constant on x86-32 */
 	address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 6adfbce..32c5558 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -633,6 +633,9 @@ kernel_physical_mapping_init(unsigned long start,
 		pgd_changed = true;
 	}
 
+	if (addr == PAGE_OFFSET)
+		kaslr_trampoline_init(page_size_mask);
+
 	if (pgd_changed)
 		sync_global_pgds(addr, end - 1, 0);
 
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
new file mode 100644
index 0000000..9de807d
--- /dev/null
+++ b/arch/x86/mm/kaslr.c
@@ -0,0 +1,151 @@
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/random.h>
+#include <xen/xen.h>
+
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/e820.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+#include <asm/kaslr.h>
+#include <asm/kasan.h>
+
+#include "mm_internal.h"
+
+/* Hold the pgd entry used on booting additional CPUs */
+pgd_t trampoline_pgd_entry;
+
+static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
+
+#if defined(CONFIG_KASAN)
+static const unsigned long memory_rand_end = KASAN_SHADOW_START;
+#elif defined(CONFIG_X86_ESPFIX64)
+static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
+#elif defined(CONFIG_EFI)
+static const unsigned long memory_rand_end = EFI_VA_START;
+#else
+static const unsigned long memory_rand_end = __START_KERNEL_map;
+#endif
+
+/* Default values */
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+
+static struct kaslr_memory_region {
+	unsigned long *base;
+	unsigned short size_tb;
+} kaslr_regions[] = {
+	{ &page_offset_base, 64/* Maximum */ },
+	{ &vmalloc_base, VMALLOC_SIZE_TB },
+	{ &vmemmap_base, 1 },
+};
+
+#define TB_SHIFT 40
+
+/* Size in Terabytes + 1 hole */
+static inline unsigned long get_padding(struct kaslr_memory_region *region)
+{
+	return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
+}
+
+void __init kernel_randomize_memory(void)
+{
+	size_t i;
+	unsigned long addr = memory_rand_start;
+	unsigned long padding, rand, mem_tb;
+	struct rnd_state rnd_st;
+	unsigned long remain_padding = memory_rand_end - memory_rand_start;
+
+	if (!kaslr_enabled())
+		return;
+
+	/* Take the additional space when Xen is not active. */
+	if (!xen_domain())
+		page_offset_base -= __XEN_SPACE;
+
+	BUG_ON(kaslr_regions[0].base != &page_offset_base);
+	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+
+	if (mem_tb < kaslr_regions[0].size_tb)
+		kaslr_regions[0].size_tb = mem_tb;
+
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
+		remain_padding -= get_padding(&kaslr_regions[i]);
+
+	prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
+
+	/* Position each section randomly with minimum 1 terabyte between */
+	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
+		padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
+		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
+		padding = (rand % (padding + 1)) & PUD_MASK;
+		addr += padding;
+		*kaslr_regions[i].base = addr;
+		addr += get_padding(&kaslr_regions[i]);
+		remain_padding -= padding;
+	}
+}
+
+/*
+ * Create PGD aligned trampoline table to allow real mode initialization
+ * of additional CPUs. Consume only 1 additional low memory page.
+ */
+void __meminit kaslr_trampoline_init(unsigned long page_size_mask)
+{
+	unsigned long addr, next, end;
+	pgd_t *pgd;
+	pud_t *pud_page, *tr_pud_page;
+	int i;
+
+	if (!kaslr_enabled()) {
+		trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
+		return;
+	}
+
+	tr_pud_page = alloc_low_page();
+	set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
+
+	addr = 0;
+	end = ISA_END_ADDRESS;
+	pgd = pgd_offset_k((unsigned long)__va(addr));
+	pud_page = (pud_t *) pgd_page_vaddr(*pgd);
+
+	for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
+		pud_t *pud, *tr_pud;
+		pmd_t *pmd;
+
+		tr_pud = tr_pud_page + pud_index(addr);
+		pud = pud_page + pud_index((unsigned long)__va(addr));
+		next = (addr & PUD_MASK) + PUD_SIZE;
+
+		if (addr >= end || !pud_val(*pud)) {
+			if (!after_bootmem &&
+			    !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
+			    !e820_any_mapped(addr & PUD_MASK, next,
+					     E820_RESERVED_KERN))
+				set_pud(tr_pud, __pud(0));
+			continue;
+		}
+
+		if (page_size_mask & (1<<PG_LEVEL_1G)) {
+			set_pte((pte_t *)tr_pud,
+				pfn_pte((__pa(addr) & PUD_MASK) >> PAGE_SHIFT,
+					PAGE_KERNEL_LARGE));
+			continue;
+		}
+
+		pmd = pmd_offset(pud, 0);
+		set_pud(tr_pud, __pud(_PAGE_TABLE | __pa(pmd)));
+	}
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d..44a7546 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -22,6 +22,7 @@ void __init reserve_real_mode(void)
 	base = __va(mem);
 	memblock_reserve(mem, size);
 	real_mode_header = (struct real_mode_header *) base;
+	/* Don't disclose memory trampoline with KASLR memory enabled */
 	printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
 	       base, (unsigned long long)mem, size);
 }
@@ -84,7 +85,11 @@ void __init setup_real_mode(void)
 	*trampoline_cr4_features = __read_cr4();
 
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+#ifdef CONFIG_RANDOMIZE_MEMORY
+	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
+#else
 	trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+#endif
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
 #endif
 }
-- 
2.8.0.rc3.226.g39d4020