Message-ID: <c6603960-c155-8f9f-6458-38e9ba6d4bdd@marcan.st>
Date: Fri, 10 Nov 2017 19:40:30 +0900
From: Hector Martin 'marcan' <marcan@...can.st>
To: luto@...capital.net
Cc: LKML <linux-kernel@...r.kernel.org>,
 "kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>,
 x86@...nel.org
Subject: vDSO maximum stack usage, stack probes, and -fstack-check

As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and `man 7 vdso`) make no mention of how much stack the vDSO functions are allowed to use. They just say "the usual C ABI", which makes no guarantees.

It turns out that Go has been assuming that those functions use less than 104 bytes of stack space, because it calls them directly on its tiny stack allocations with no guard pages or other hardware overflow protection [1]. On most systems, this is fine.

However, on my system the stars aligned and turned it into a nondeterministic crash. I use Gentoo Hardened, which builds its toolchain with -fstack-check on by default. It turns out that with the combination of GCC 6.4.0, -fstack-check, linux-4.13.9-gentoo, and CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's not marked inline, so it's perfectly within its rights not to do that, though for some reason it does inline it when CONFIG_OPTIMIZE_INLINING=y, even though that nominally gives it greater freedom *not* to inline things marked inline). That turns __vdso_clock_gettime and __vdso_gettimeofday into non-leaf functions, and GCC then inserts a stack probe (full objdump at [2]):

0000000000000030 <__vdso_clock_gettime>:
  30:   55                      push   %rbp
  31:   48 89 e5                mov    %rsp,%rbp
  34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
  3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
  40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

That silently overflows the Go stack. "orq 0" does nothing as long as the page is mapped, but it's not atomic. It turns out that sometimes (pretty often on my box) that races another thread accessing the same location and corrupts memory.

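To put rough numbers on the probe, here is a small Python sketch. The constants are read off the disassembly above and the 104-byte figure is the Go assumption mentioned earlier. The helper names `probe_depth` and `lost_update` are made up for illustration, and `lost_update` is only a simplified model of one possible interleaving (load, concurrent store, stale store-back), not a reproduction of the actual crash.

```python
# Constants taken from the quoted disassembly of __vdso_clock_gettime.
PUSH_RBP_BYTES = 8        # push %rbp moves %rsp down by 8 bytes
FRAME_BYTES = 0x1020      # sub $0x1020,%rsp
GO_ASSUMED_BUDGET = 104   # stack bytes Go assumes the vDSO call stays under


def probe_depth():
    """Bytes between %rsp at function entry and the quadword the orq touches."""
    return PUSH_RBP_BYTES + FRAME_BYTES


def lost_update(initial, concurrent_write):
    """Replay one bad interleaving of a non-atomic `orq $0x0,(mem)`.

    Assumed model: the probe behaves as load / OR / store, and another
    thread stores to the same quadword between the load and the store.
    Returns the final memory contents.
    """
    memory = initial
    loaded = memory             # probe: load the old value
    memory = concurrent_write   # other thread: store a new value
    memory = loaded | 0         # probe: store the stale value back
    return memory


if __name__ == "__main__":
    depth = probe_depth()
    print("probe lands", depth, "bytes below the entry stack pointer")
    print("that is", depth - GO_ASSUMED_BUDGET, "bytes past the assumed budget")
    print("value after the racing probe:", lost_update(1, 2))
```

In this model the other thread's store is simply lost, which is the kind of silent corruption a plain read-modify-write can cause when the probed address actually belongs to someone else's data.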
The stack probe sounds unnecessary, since it only calls vread_tsc and that can't ever skip over more than a page of stack. In fact I don't even know why it does the probe; I thought the point of stack probes was to poke the stack on allocations >4K to ensure the guard page isn't skipped, but none of these functions use more than a few bytes of stack space.

Nonetheless, none of this is wrong per se; the current vDSO spec makes no guarantees about stack usage. The question is, should it? Should the vDSO spec set a hard limit on stack consumption that userspace can rely on, and perhaps inline everything and/or disable -fstack-check to avoid the stack probes?

[1] https://github.com/golang/go/issues/20427#issuecomment-343255844
[2] https://marcan.st/paste/HCVuLG6T.txt

-- 
Hector Martin "marcan" (marcan@...can.st)
Public Key: https://mrcn.st/pub