Message-ID: <CAGXu5jL9vUrn4kpjO+qa4cHmWBypeqP17OGbrMs=5Nz0YpQMZw@mail.gmail.com>
Date: Fri, 12 May 2017 12:01:59 -0700
From: Kees Cook <keescook@...omium.org>
To: Martin Schwidefsky <schwidefsky@...ibm.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Garnier <thgarnie@...gle.com>,
	Greg KH <greg@...ah.com>,
	Ingo Molnar <mingo@...nel.org>,
	Daniel Micay <danielmicay@...il.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Dave Hansen <dave.hansen@...el.com>,
	Arnd Bergmann <arnd@...db.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	David Howells <dhowells@...hat.com>,
	René Nyffenegger <mail@...enyffenegger.ch>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
	"Eric W . Biederman" <ebiederm@...ssion.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H . Peter Anvin" <hpa@...or.com>,
	Andy Lutomirski <luto@...nel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Josh Poimboeuf <jpoimboe@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Brian Gerst <brgerst@...il.com>,
	"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	Russell King <linux@...linux.org.uk>,
	Will Deacon <will.deacon@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Mark Rutland <mark.rutland@....com>,
	James Morse <james.morse@....com>,
	linux-s390 <linux-s390@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux API <linux-api@...r.kernel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

On Thu, May 11, 2017 at 10:54 PM, Martin Schwidefsky
<schwidefsky@...ibm.com> wrote:
> On Thu, 11 May 2017 22:34:31 -0700
> Kees Cook <keescook@...omium.org> wrote:
>
>> On Thu, May 11, 2017 at 10:28 PM, Martin Schwidefsky
>> <schwidefsky@...ibm.com> wrote:
>> > On Thu, 11 May 2017 16:44:07 -0700
>> > Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>> >
>> >> On Thu, May 11, 2017 at 4:17 PM, Thomas Garnier <thgarnie@...gle.com> wrote:
>> >> >
>> >> > Ingo: Do you want the change as-is? Would you like it to be optional?
>> >> > What do you think?
>> >>
>> >> I'm not ingo, but I don't like that patch. It's in the wrong place -
>> >> that system call return code is too timing-critical to add address
>> >> limit checks.
>> >>
>> >> Now what I think you *could* do is:
>> >>
>> >>  - make "set_fs()" actually set a work flag in the current thread flags
>> >>
>> >>  - do the test in the slow-path (syscall_return_slowpath).
>> >>
>> >> Yes, yes, that ends up being architecture-specific, but it's fairly simple.
>> >>
>> >> And it only slows down the system calls that actually use "set_fs()".
>> >> Sure, it will slow those down a fair amount, but they are hopefully a
>> >> small subset of all cases.
>> >>
>> >> How does that sound to people? That's where we currently do that
>> >>
>> >>         if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
>> >>             WARN(irqs_disabled(), "syscall %ld left IRQs disabled",
>> >>                  regs->orig_ax))
>> >>                 local_irq_enable();
>> >>
>> >> check too, which is a fairly similar issue.
>> >
>> > This is exactly what Heiko did for the s390 backend as a result of this
>> > discussion. See the _CIF_ASCE_SECONDARY bit in arch/s390/kernel/entry.S,
>> > for the hot path the check for the bit is included in the general
>> > _CIF_WORK test. Only the slow path gets a bit slower.
>> >
>> > git commit b5a882fcf146c87cb6b67c6df353e1c042b8773d
>> > "s390: restore address space when returning to user space".
>>
>> If I'm understanding this, it won't catch corruption of addr_limit
>> during fast-path syscalls, though (i.e. addr_limit changed without a
>> call to set_fs()). :( This addr_limit corruption is mostly only a risk
>> on archs without THREAD_INFO_IN_TASK, but it would still be nice to catch
>> unbalanced set_fs() code, so I like the idea. I like getting rid of
>> addr_limit entirely even more, but that'll take some time. :)
>
> Well for s390 there is no addr_limit as we use two separate address spaces
> for kernel vs. user. The equivalent to the addr_limit corruption on a
> fast-path syscall would be changing CR7 outside of set_fs. This boils
> down to the question what we are protecting against? Bad code with
> unbalanced set_fs or evil code that changes addr_limit/CR7 outside of
> set_fs?

Yeah, the risk for "corrupted addr_limit" is mainly a concern for
archs with addr_limit on the kernel stack. If I'm reading things
correctly, that means, from the archs I've been paying closer
attention to, it's an issue for arm, mips, and powerpc:

arch/arm/include/asm/uaccess.h: current_thread_info()->addr_limit = fs;
arch/arm/include/asm/thread_info.h: (current_stack_pointer & ~(THREAD_SIZE - 1));

arch/mips/include/asm/uaccess.h:#define set_fs(x) (current_thread_info()->addr_limit = (x))
arch/mips/kernel/process.c: * task stacks at THREAD_SIZE - 32

arch/powerpc/include/asm/uaccess.h:#define set_fs(val) (current->thread.fs = (val))
arch/powerpc/kernel/process.c: struct pt_regs *regs = task_stack_page(current) + THREAD_SIZE;

(s390 uses a register, x86 and arm64 implement THREAD_INFO_IN_TASK.)

Targeting addr_limit through arbitrary write attacks isn't too common
since ... it's an arbitrary write. The issue with addr_limit was that
it can live on the kernel stack, which meant all kinds of
stack-related bugs can lead to it getting stomped on.

So, two goals to protect addr_limit:

- get it off the stack to make the difficulty of corruption on par
  with other sensitive things that would require an arbitrary write
  flaw.

- detect/block unbalanced set_fs() calls.

If we can get the former addressed by the remaining architectures,
then that class of attack will go away. For the latter, it sounds like
Linus's slowpath-exit will work nicely. To me it looks like the
architectures with addr_limit still on the stack would still benefit
from always-check-addr_limit on syscall exit, but that would be
arch-specific anyway.

And then, of course, we've got the parallel task of just removing
set_fs() entirely. :)

-Kees

-- 
Kees Cook
Pixel Security
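
For readers following the thread, here is a rough userspace sketch in C of
the slow-path idea discussed above: set_fs() raises a per-thread work flag,
and the syscall-exit slow path checks the address limit only when that flag
is set. Every name in it (sketch_thread, SKETCH_TIF_FSCHECK, sketch_set_fs,
sketch_syscall_exit, and the two limit constants) is made up for
illustration. It is not kernel code and does not claim to match what any
architecture actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's USER_DS / KERNEL_DS limits. */
#define SKETCH_USER_DS     (UINTPTR_MAX >> 1)
#define SKETCH_KERNEL_DS   (UINTPTR_MAX)

/* Hypothetical work flag meaning "set_fs() ran during this syscall". */
#define SKETCH_TIF_FSCHECK (1UL << 0)

/* Hypothetical per-thread state; only the two fields the idea needs. */
struct sketch_thread {
	uintptr_t addr_limit;
	unsigned long flags;
};

/* Model of set_fs(): change the limit and request a check at exit. */
void sketch_set_fs(struct sketch_thread *t, uintptr_t limit)
{
	assert(t != NULL);
	t->addr_limit = limit;
	t->flags |= SKETCH_TIF_FSCHECK;
}

/*
 * Model of the syscall-exit path. Returns true when nothing was wrong
 * (or nothing was checked), false when an unbalanced set_fs() was found.
 */
bool sketch_syscall_exit(struct sketch_thread *t)
{
	/* Fast path: no work flag, so the limit is not inspected at all. */
	if (!(t->flags & SKETCH_TIF_FSCHECK))
		return true;

	/* Slow path: consume the flag and verify the limit. */
	t->flags &= ~SKETCH_TIF_FSCHECK;
	if (t->addr_limit != SKETCH_USER_DS) {
		fprintf(stderr, "sketch: address limit not restored at exit\n");
		t->addr_limit = SKETCH_USER_DS; /* repair before "returning" */
		return false;
	}
	return true;
}
```

In this model an unbalanced sketch_set_fs(&t, SKETCH_KERNEL_DS) is caught at
exit, while a direct write to t.addr_limit that never sets the flag goes
unnoticed. That second case is the fast-path gap raised above for
architectures that keep addr_limit on the kernel stack.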