|
Message-ID: <20180626191343.GA12474@openwall.com> Date: Tue, 26 Jun 2018 21:13:43 +0200 From: Solar Designer <solar@...nwall.com> To: owl-dev@...ts.openwall.com Cc: Vasily Averin <vvs@...tuozzo.com> Subject: Re: 32-bit syscall breakage in -431 kernel with KAISER On Tue, Jun 26, 2018 at 08:00:49PM +0200, Pavel Kankovsky wrote: > It turns out the handler is invoked with an unexpected stack frame in the > "trampoline stack" (the top of the "trampoline stack", as stored in > TSS.rsp0, is 0xffff810001a36278 in this example): > > (gdb) x/6g $rsp > 0xffff810001a36248: 0x00000000b7f81ea1 0x0000000000000073 > 0xffff810001a36258: 0x0000000000000246 0x00000000bfa78274 > 0xffff810001a36268: 0x000000000000007b 0x000000000000007b > > There should be five entries (rip at the bottom, cs, eflags, rsp, and ss > at the top) only but the actual stack contains one mysterious and bogus > extra entry (the other 0x000000000000007b) at the top, shifting everything > else down by 8 bytes. > > This stack frame (including the bogus extra entry) is copied to the > regular kernel stack. As far as I can tell, the same extra entry appears > in other interrupt handlers, e.g. common_interrupt. > > Functions that get an explicit pointer to struct pt_regs are not affected > but anything that expects to find pt_regs at the top of the stack (e.g. > compat_alloc_user_space via task_pt_regs(current)) breaks. > > It seems the extra entry is added because the top of the trampoline stack > is not aligned to 16 bytes and the CPU does not like it and enforces the > alignment, shifting the whole stack down wrt. the expected layout as a > result. AMD64 Architecture Programmer's Manual, Volume 2, Section 8.9 mentions 16-byte alignment: "Interrupt-Stack Alignment. In legacy mode, the interrupt-stack pointer can be aligned at any address boundary. Long mode, however, aligns the stack on a 16-byte boundary. This alignment is performed by the processor in hardware before pushing items onto the stack frame. The previous RSP is saved unconditionally on the new stack by the interrupt mechanism. A subsequent IRET instruction automatically restores the previous RSP." "Although the RSP alignment is always performed in long mode, it is only of consequence when the interrupted program is already running at CPL=0, and it is generally used only within the operating-system kernel. The operating system should put 16-byte aligned RSP values in the TSS for interrupts that change privilege levels." > I have modified struct tss_struct in include/asm-x86_64/processor.h to > include an extra "unsigned long stack_padding" between stack_canary and > stack to make stack 16-byte aligned... > > > ---snip--- > --- include/asm-x86_64/processor.h.orig 2018-06-26 12:11:17 +0000 > +++ include/asm-x86_64/processor.h 2018-06-26 16:50:55 +0000 > @@ -269,6 +269,7 @@ > */ > #ifndef __GENKSYMS__ > unsigned long stack_canary; > + unsigned long stack_padding; > unsigned long stack[64]; > #endif > } __attribute__((packed, __aligned__(PAGE_SIZE))); > ---snip--- > > > ...and lo! 32-bit ifconfig works as expected. Wow. But per my review of the full struct tss_struct, the stack[] field offset is: 4+8*5+4*2+2*2+1025*8+8 = 8264 So it's not 16-byte aligned. But that's the start of stack[], and we need the stack pointer to be aligned. Yet I suppose aligning stack[] itself is in fact the most appropriate fix. Reviewing the diffs between e.g. -423 and -431, I see this stack[] in struct tss is in fact newly added: @@ -245,7 +248,29 @@ * 8 bytes, for an extra "long" of ~0UL */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; -} __attribute__((packed)) ____cacheline_aligned; + /* + * + * The Intel SDM says (Volume 3, 7.2.1): + * + * Avoid placing a page boundary in the part of the TSS that the + * processor reads during a task switch (the first 104 bytes). The + * processor may not correctly perform address translations if a + * boundary occurs in this area. During a task switch, the processor + * reads and writes into the first 104 bytes of each TSS (using + * contiguous physical addresses beginning with the physical address + * of the first byte of the TSS). So, after TSS access begins, if + * part of the 104 bytes is not physically contiguous, the processor + * will access incorrect information without generating a page-fault + * exception. + * + * There are also a lot of errata involving the TSS spanning a page + * boundary. Assert that we're not doing that. + */ +#ifndef __GENKSYMS__ + unsigned long stack_canary; + unsigned long stack[64]; +#endif +} __attribute__((packed, __aligned__(PAGE_SIZE))); Do you think this is a RHEL5 bug? Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.