|
Message-ID: <22b1db07-aa95-ff84-cef0-708f6f968c45@virtuozzo.com> Date: Thu, 28 Jun 2018 16:08:29 +0300 From: Vasily Averin <vvs@...tuozzo.com> To: Solar Designer <solar@...nwall.com>, owl-dev@...ts.openwall.com Subject: Re: 32-bit syscall breakage in -431 kernel with KAISER On 06/26/2018 10:13 PM, Solar Designer wrote: > On Tue, Jun 26, 2018 at 08:00:49PM +0200, Pavel Kankovsky wrote: >> It turns out the handler is invoked with an unexpected stack frame in the >> "trampoline stack" (the top of the "trampoline stack", as stored in >> TSS.rsp0, is 0xffff810001a36278 in this example): >> >> (gdb) x/6g $rsp >> 0xffff810001a36248: 0x00000000b7f81ea1 0x0000000000000073 >> 0xffff810001a36258: 0x0000000000000246 0x00000000bfa78274 >> 0xffff810001a36268: 0x000000000000007b 0x000000000000007b >> >> There should be five entries (rip at the bottom, cs, eflags, rsp, and ss >> at the top) only but the actual stack contains one mysterious and bogus >> extra entry (the other 0x000000000000007b) at the top, shifting everything >> else down by 8 bytes. >> >> This stack frame (including the bogus extra entry) is copied to the >> regular kernel stack. As far as I can tell, the same extra entry appears >> in other interrupt handlers, e.g. common_interrupt. >> >> Functions that get an explicit pointer to struct pt_regs are not affected >> but anything that expects to find pt_regs at the top of the stack (e.g. >> compat_alloc_user_space via task_pt_regs(current)) breaks. >> >> It seems the extra entry is added because the top of the trampoline stack >> is not aligned to 16 bytes and the CPU does not like it and enforces the >> alignment, shifting the whole stack down wrt. the expected layout as a >> result. > > AMD64 Architecture Programmer's Manual, Volume 2, Section 8.9 mentions > 16-byte alignment: > > "Interrupt-Stack Alignment. In legacy mode, the interrupt-stack pointer > can be aligned at any address boundary. Long mode, however, aligns the > stack on a 16-byte boundary. This alignment is performed by the > processor in hardware before pushing items onto the stack frame. The > previous RSP is saved unconditionally on the new stack by the interrupt > mechanism. A subsequent IRET instruction automatically restores the > previous RSP." > > "Although the RSP alignment is always performed in long mode, it is only > of consequence when the interrupted program is already running at CPL=0, > and it is generally used only within the operating-system kernel. The > operating system should put 16-byte aligned RSP values in the TSS for > interrupts that change privilege levels." > >> I have modified struct tss_struct in include/asm-x86_64/processor.h to >> include an extra "unsigned long stack_padding" between stack_canary and >> stack to make stack 16-byte aligned... >> >> >> ---snip--- >> --- include/asm-x86_64/processor.h.orig 2018-06-26 12:11:17 +0000 >> +++ include/asm-x86_64/processor.h 2018-06-26 16:50:55 +0000 >> @@ -269,6 +269,7 @@ >> */ >> #ifndef __GENKSYMS__ >> unsigned long stack_canary; >> + unsigned long stack_padding; >> unsigned long stack[64]; >> #endif >> } __attribute__((packed, __aligned__(PAGE_SIZE))); >> ---snip--- >> >> >> ...and lo! 32-bit ifconfig works as expected. > > Wow. But per my review of the full struct tss_struct, the stack[] field > offset is: > > 4+8*5+4*2+2*2+1025*8+8 = 8264 Alexander, seems you're wrong in my version of rhel5-based -123.1 kernel crash> tss_struct -o struct tss_struct { [0x0] u32 reserved1; [0x4] u64 rsp0; [0xc] u64 rsp1; [0x14] u64 rsp2; [0x1c] u64 reserved2; [0x24] u64 ist[7]; [0x5c] u32 reserved3; [0x60] u32 reserved4; [0x64] u16 reserved5; [0x66] u16 io_bitmap_base; [0x68] unsigned long io_bitmap[1025]; [0x2070] unsigned long stack_canary; [0x2078] unsigned long stack[64]; } SIZE: 0x3000 crash> tss_struct -od struct tss_struct { [0] u32 reserved1; [4] u64 rsp0; [12] u64 rsp1; [20] u64 rsp2; [28] u64 reserved2; [36] u64 ist[7]; [92] u32 reserved3; [96] u32 reserved4; [100] u16 reserved5; [102] u16 io_bitmap_base; [104] unsigned long io_bitmap[1025]; [8304] unsigned long stack_canary; [8312] unsigned long stack[64]; } SIZE: 12288 Seems you missed that 'ist' filed is an array > > So it's not 16-byte aligned. But that's the start of stack[], and we > need the stack pointer to be aligned. Yet I suppose aligning stack[] > itself is in fact the most appropriate fix. > > Reviewing the diffs between e.g. -423 and -431, I see this stack[] in > struct tss is in fact newly added: > > @@ -245,7 +248,29 @@ > * 8 bytes, for an extra "long" of ~0UL > */ > unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; > -} __attribute__((packed)) ____cacheline_aligned; > + /* > + * > + * The Intel SDM says (Volume 3, 7.2.1): > + * > + * Avoid placing a page boundary in the part of the TSS that the > + * processor reads during a task switch (the first 104 bytes). The > + * processor may not correctly perform address translations if a > + * boundary occurs in this area. During a task switch, the processor > + * reads and writes into the first 104 bytes of each TSS (using > + * contiguous physical addresses beginning with the physical address > + * of the first byte of the TSS). So, after TSS access begins, if > + * part of the 104 bytes is not physically contiguous, the processor > + * will access incorrect information without generating a page-fault > + * exception. > + * > + * There are also a lot of errata involving the TSS spanning a page > + * boundary. Assert that we're not doing that. > + */ > +#ifndef __GENKSYMS__ > + unsigned long stack_canary; > + unsigned long stack[64]; > +#endif > +} __attribute__((packed, __aligned__(PAGE_SIZE))); > > Do you think this is a RHEL5 bug? > > Thanks, > > Alexander >
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.