|
Message-ID: <20110825171934.GA3044@albatros> Date: Thu, 25 Aug 2011 21:19:34 +0400 From: Vasiliy Kulikov <segoon@...nwall.com> To: kernel-hardening@...ts.openwall.com Subject: Re: [RFC] x86, mm: start mmap allocation for libs from low addresses Solar, I hope I've addressed Peter's complains in the following patch. However, it would be great if you could (re)check the motivation and the comments. I tried not to make previous mistakes and worded the statements more carefully, but an additional review won't hurt :) Thanks! ------------------------------ From: Vasiliy Kulikov <segoon@...nwall.com> This patch changes mmap base address allocator logic to incline to allocate addresses for executable pages from the first 16 Mbs of address space. These addresses start from zero byte (0x00AABBCC). Using such addresses breaks ret2libc exploits abusing string buffer overflows (or does it much harder). As x86 architecture is little-endian, this zero byte is the last byte of the address. So it's possible to e.g. overwrite a return address on the stack with the mailformed address. However, now it's impossible to additionally overwrite function arguments, which are located after the function address on the stack. The attacker's best bet may be to find an entry point not at function boundary that sets registers and then proceeds with or branches to the desired library code. The easiest way to set registers and branch would be a function epilogue - pop/pop/.../ret - but then there's the difficulty in passing the address to ret to (we have just one NUL and we've already used it to get to this code). Similarly, even via such pop's we can't pass an argument that contains a NUL in it - e.g., the address of "/bin/sh" in libc (it contains a NUL most significant byte too) or a zero value for root's uid. A possible bypass is via multiple overflows - if the overflow may be triggered more than once before the vulnerable function returns, then multiple NULs may be written, exactly one per overflow. But this is hopefully relatively rare. To fully utilize the protection, the executable image should be randomized (sysctl kernel.randomize_va_space > 0 and the executable is compiled as PIE) and the sum of libraries sizes plus executable size shouldn't exceed 16 Mb. In this case the only pages out of ASCII-protected range are VDSO and vsyscall. However, they don't provide enough material for obtaining arbitrary code execution and are not dangerous without using other executable pages. The logic is applied to x86 32 bit tasks, both for 32 bit kernels and for 32 bit tasks running on 64 bit kernels. 64 bit tasks already have zero bytes in addresses of library functions. Other architectures (non-x86) may reuse the logic too. Without the patch: $ ldd /bin/ls linux-gate.so.1 => (0xf779c000) librt.so.1 => /lib/librt.so.1 (0xb7fcf000) libtermcap.so.2 => /lib/libtermcap.so.2 (0xb7fca000) libc.so.6 => /lib/libc.so.6 (0xb7eae000) libpthread.so.0 => /lib/libpthread.so.0 (0xb7e5b000) /lib/ld-linux.so.2 (0xb7fe6000) With the patch: $ ldd /bin/ls linux-gate.so.1 => (0xf772a000) librt.so.1 => /lib/librt.so.1 (0x0004a000) libtermcap.so.2 => /lib/libtermcap.so.2 (0x0005e000) libc.so.6 => /lib/libc.so.6 (0x00062000) libpthread.so.0 => /lib/libpthread.so.0 (0x00183000) /lib/ld-linux.so.2 (0x00121000) If CONFIG_VM86=y, the first megabyte is excluded from the potential range for mmap allocations as it might be used by vm86 code. If CONFIG_VM86=n, the allocation begins from the mmap_min_addr. Regardless of CONFIG_VM86 the base address is randomized with the same entropy size as mm->mmap_base. If 16 Mbs are over, we fallback to the old allocation algorithm. But, hopefully, programs which need such protection (network daemons, programs working with untrusted data, etc.) are small enough to utilize the protection. The same logic was used in -ow patch for 2.0-2.4 kernels and in exec-shield for 2.6.x kernels. Code parts were taken from exec-shield from RHEL6. v2 - Added comments, adjusted patch description. - s/arch_get_unmapped_exec_area/get_unmapped_exec_area/ - Don't reserve first Mb if CONFIG_VM86=n. Signed-off-by: Vasiliy Kulikov <segoon@...nwall.com> -- arch/x86/mm/mmap.c | 20 +++++++++++ include/linux/mm_types.h | 4 ++ include/linux/sched.h | 3 ++ mm/mmap.c | 82 +++++++++++++++++++++++++++++++++++++++++++--- 4 files changed, 104 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c index 1dab519..4e7a783 100644 --- a/arch/x86/mm/mmap.c +++ b/arch/x86/mm/mmap.c @@ -118,6 +118,22 @@ static unsigned long mmap_legacy_base(void) return TASK_UNMAPPED_BASE + mmap_rnd(); } +#ifdef CONFIG_VM86 +/* + * Don't touch any memory that can be used by vm86 apps. + * Reserve the first Mb + 64 kb of guard pages. + */ +#define ASCII_ARMOR_MIN_ADDR (0x00100000 + 0x00010000) +#else +/* No special users of low addresses. Start just after mmap_min_addr. */ +#define ASCII_ARMOR_MIN_ADDR 0 +#endif + +static unsigned long mmap_lib_base(void) +{ + return ASCII_ARMOR_MIN_ADDR + mmap_rnd(); +} + /* * This function, called very early during the creation of a new * process VM image, sets up which VM layout function to use: @@ -131,6 +147,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm) } else { mm->mmap_base = mmap_base(); mm->get_unmapped_area = arch_get_unmapped_area_topdown; + if (mmap_is_ia32()) { + mm->get_unmapped_exec_area = get_unmapped_exec_area; + mm->lib_mmap_base = mmap_lib_base(); + } mm->unmap_area = arch_unmap_area_topdown; } } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 027935c..68fc216 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,9 +225,13 @@ struct mm_struct { unsigned long (*get_unmapped_area) (struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); + unsigned long (*get_unmapped_exec_area) (struct file *filp, + unsigned long addr, unsigned long len, + unsigned long pgoff, unsigned long flags); void (*unmap_area) (struct mm_struct *mm, unsigned long addr); #endif unsigned long mmap_base; /* base of mmap area */ + unsigned long lib_mmap_base; /* base of mmap libraries area (for get_unmapped_exec_area()) */ unsigned long task_size; /* size of task vm space */ unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */ unsigned long free_area_cache; /* first hole of size cached_hole_size or larger */ diff --git a/include/linux/sched.h b/include/linux/sched.h index f024c63..ef9024f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -394,6 +394,9 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, unsigned long flags); extern void arch_unmap_area(struct mm_struct *, unsigned long); extern void arch_unmap_area_topdown(struct mm_struct *, unsigned long); +extern unsigned long +get_unmapped_exec_area(struct file *, unsigned long, + unsigned long, unsigned long, unsigned long); #else static inline void arch_pick_mmap_layout(struct mm_struct *mm) {} #endif diff --git a/mm/mmap.c b/mm/mmap.c index d49736f..2828322 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -50,6 +50,10 @@ static void unmap_region(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *prev, unsigned long start, unsigned long end); +static unsigned long +get_unmapped_area_prot(struct file *file, unsigned long addr, unsigned long len, + unsigned long pgoff, unsigned long flags, bool exec); + /* * WARNING: the debugging will use recursive algorithms so never enable this * unless you know what you are doing. @@ -989,7 +993,8 @@ unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, /* Obtain the address to map to. we verify (or select) it and ensure * that it represents a valid section of the address space. */ - addr = get_unmapped_area(file, addr, len, pgoff, flags); + addr = get_unmapped_area_prot(file, addr, len, pgoff, flags, + prot & PROT_EXEC); if (addr & ~PAGE_MASK) return addr; @@ -1528,6 +1533,62 @@ bottomup: } #endif +/* Addresses before this value contain at least one zero byte. */ +#define ASCII_ARMOR_MAX_ADDR 0x01000000 + +/* + * This function finds the first unmapped region inside of + * [mm->lib_mmap_base; ASCII_ARMOR_MAX_ADDR) region. Addresses from this + * region contain at least one zero byte, which greatly complicates + * exploitation of C string buffer overflows (C strings cannot contain zero + * byte inside) in return to libc class of attacks. + * + * This allocator is bottom up allocator like arch_get_unmapped_area(), but + * it differs from the latter. get_unmapped_exec_area() does its best to + * allocate as low address as possible. + */ +unsigned long +get_unmapped_exec_area(struct file *filp, unsigned long addr0, + unsigned long len, unsigned long pgoff, unsigned long flags) +{ + unsigned long addr = addr0; + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + + if (len > TASK_SIZE) + return -ENOMEM; + + if (flags & MAP_FIXED) + return addr; + + /* We ALWAYS start from the beginning as base addresses + * with zero high bits is a valued resource */ + addr = max_t(unsigned long, mm->lib_mmap_base, mmap_min_addr); + + for (vma = find_vma(mm, addr); ; vma = vma->vm_next) { + /* At this point: (!vma || addr < vma->vm_end). */ + if (TASK_SIZE - len < addr) + return -ENOMEM; + + /* We don't want to reduce brk area of not DYNAMIC elf binaries + * with sysctl kernel.randomize_va_space < 2. */ + if (mm->brk && addr > mm->brk) + goto failed; + + if (!vma || addr + len <= vma->vm_start) + return addr; + + addr = vma->vm_end; + + /* If ACSII-armor area is over, the algo gives up */ + if (addr >= ASCII_ARMOR_MAX_ADDR) + goto failed; + } + +failed: + return current->mm->get_unmapped_area(filp, addr0, len, pgoff, flags); +} + void arch_unmap_area_topdown(struct mm_struct *mm, unsigned long addr) { /* @@ -1541,9 +1602,9 @@ void arch_unmap_area_topdown(struct mm_struct *mm, unsigned long addr) mm->free_area_cache = mm->mmap_base; } -unsigned long -get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, - unsigned long pgoff, unsigned long flags) +static unsigned long +get_unmapped_area_prot(struct file *file, unsigned long addr, unsigned long len, + unsigned long pgoff, unsigned long flags, bool exec) { unsigned long (*get_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); @@ -1556,7 +1617,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, if (len > TASK_SIZE) return -ENOMEM; - get_area = current->mm->get_unmapped_area; + if (exec && current->mm->get_unmapped_exec_area) + get_area = current->mm->get_unmapped_exec_area; + else + get_area = current->mm->get_unmapped_area; + if (file && file->f_op && file->f_op->get_unmapped_area) get_area = file->f_op->get_unmapped_area; addr = get_area(file, addr, len, pgoff, flags); @@ -1571,6 +1636,13 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, return arch_rebalance_pgtables(addr, len); } +unsigned long +get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, + unsigned long pgoff, unsigned long flags) +{ + return get_unmapped_area_prot(file, addr, len, pgoff, flags, false); +} + EXPORT_SYMBOL(get_unmapped_area); /* Look up the first VMA which satisfies addr < vm_end, NULL if none. */ -- Vasiliy
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.