Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22b1db07-aa95-ff84-cef0-708f6f968c45@virtuozzo.com>
Date: Thu, 28 Jun 2018 16:08:29 +0300
From: Vasily Averin <vvs@...tuozzo.com>
To: Solar Designer <solar@...nwall.com>, owl-dev@...ts.openwall.com
Subject: Re: 32-bit syscall breakage in -431 kernel with KAISER



On 06/26/2018 10:13 PM, Solar Designer wrote:
> On Tue, Jun 26, 2018 at 08:00:49PM +0200, Pavel Kankovsky wrote:
>> It turns out the handler is invoked with an unexpected stack frame in the 
>> "trampoline stack" (the top of the "trampoline stack", as stored in 
>> TSS.rsp0, is 0xffff810001a36278 in this example):
>>
>> (gdb) x/6g $rsp
>> 0xffff810001a36248:     0x00000000b7f81ea1      0x0000000000000073
>> 0xffff810001a36258:     0x0000000000000246      0x00000000bfa78274
>> 0xffff810001a36268:     0x000000000000007b      0x000000000000007b
>>
>> There should be five entries (rip at the bottom, cs, eflags, rsp, and ss 
>> at the top) only but the actual stack contains one mysterious and bogus 
>> extra entry (the other 0x000000000000007b) at the top, shifting everything 
>> else down by 8 bytes.
>>
>> This stack frame (including the bogus extra entry) is copied to the 
>> regular kernel stack. As far as I can tell, the same extra entry appears 
>> in other interrupt handlers, e.g. common_interrupt.
>>
>> Functions that get an explicit pointer to struct pt_regs are not affected 
>> but anything that expects to find pt_regs at the top of the stack (e.g. 
>> compat_alloc_user_space via task_pt_regs(current)) breaks.
>>
>> It seems the extra entry is added because the top of the trampoline stack 
>> is not aligned to 16 bytes and the CPU does not like it and enforces the 
>> alignment, shifting the whole stack down wrt. the expected layout as a 
>> result.
> 
> AMD64 Architecture Programmer's Manual, Volume 2, Section 8.9 mentions
> 16-byte alignment:
> 
> "Interrupt-Stack Alignment.  In legacy mode, the interrupt-stack pointer
> can be aligned at any address boundary.  Long mode, however, aligns the
> stack on a 16-byte boundary.  This alignment is performed by the
> processor in hardware before pushing items onto the stack frame.  The
> previous RSP is saved unconditionally on the new stack by the interrupt
> mechanism.  A subsequent IRET instruction automatically restores the
> previous RSP."
> 
> "Although the RSP alignment is always performed in long mode, it is only
> of consequence when the interrupted program is already running at CPL=0,
> and it is generally used only within the operating-system kernel.  The
> operating system should put 16-byte aligned RSP values in the TSS for
> interrupts that change privilege levels."
> 
>> I have modified struct tss_struct in include/asm-x86_64/processor.h to
>> include an extra "unsigned long stack_padding" between stack_canary and 
>> stack to make stack 16-byte aligned...
>>
>>
>> ---snip---
>> --- include/asm-x86_64/processor.h.orig	2018-06-26 12:11:17 +0000
>> +++ include/asm-x86_64/processor.h	2018-06-26 16:50:55 +0000
>> @@ -269,6 +269,7 @@
>>  	 */
>>  #ifndef __GENKSYMS__
>>  	unsigned long		stack_canary;
>> +	unsigned long		stack_padding;
>>  	unsigned long           stack[64];
>>  #endif
>>  } __attribute__((packed, __aligned__(PAGE_SIZE)));
>> ---snip---
>>
>>
>> ...and lo! 32-bit ifconfig works as expected.
> 
> Wow.  But per my review of the full struct tss_struct, the stack[] field
> offset is:
> 
> 4+8*5+4*2+2*2+1025*8+8 = 8264

Alexander,
seems you're wrong

in my version of rhel5-based -123.1 kernel

crash> tss_struct -o
struct tss_struct {
     [0x0] u32 reserved1;
     [0x4] u64 rsp0;
     [0xc] u64 rsp1;
    [0x14] u64 rsp2;
    [0x1c] u64 reserved2;
    [0x24] u64 ist[7];
    [0x5c] u32 reserved3;
    [0x60] u32 reserved4;
    [0x64] u16 reserved5;
    [0x66] u16 io_bitmap_base;
    [0x68] unsigned long io_bitmap[1025];
  [0x2070] unsigned long stack_canary;
  [0x2078] unsigned long stack[64];
}
SIZE: 0x3000
crash> tss_struct -od
struct tss_struct {
      [0] u32 reserved1;
      [4] u64 rsp0;
     [12] u64 rsp1;
     [20] u64 rsp2;
     [28] u64 reserved2;
     [36] u64 ist[7];
     [92] u32 reserved3;
     [96] u32 reserved4;
    [100] u16 reserved5;
    [102] u16 io_bitmap_base;
    [104] unsigned long io_bitmap[1025];
   [8304] unsigned long stack_canary;
   [8312] unsigned long stack[64];
}
SIZE: 12288

Seems you missed that 'ist' filed is an array

> 
> So it's not 16-byte aligned.  But that's the start of stack[], and we
> need the stack pointer to be aligned.  Yet I suppose aligning stack[]
> itself is in fact the most appropriate fix.
> 
> Reviewing the diffs between e.g. -423 and -431, I see this stack[] in
> struct tss is in fact newly added:
> 
> @@ -245,7 +248,29 @@
>          * 8 bytes, for an extra "long" of ~0UL
>          */
>         unsigned long io_bitmap[IO_BITMAP_LONGS + 1];
> -} __attribute__((packed)) ____cacheline_aligned;
> +       /*
> +        *
> +        * The Intel SDM says (Volume 3, 7.2.1):
> +        *
> +        *  Avoid placing a page boundary in the part of the TSS that the
> +        *  processor reads during a task switch (the first 104 bytes). The
> +        *  processor may not correctly perform address translations if a
> +        *  boundary occurs in this area. During a task switch, the processor
> +        *  reads and writes into the first 104 bytes of each TSS (using
> +        *  contiguous physical addresses beginning with the physical address
> +        *  of the first byte of the TSS). So, after TSS access begins, if
> +        *  part of the 104 bytes is not physically contiguous, the processor
> +        *  will access incorrect information without generating a page-fault
> +        *  exception.
> +        *
> +        * There are also a lot of errata involving the TSS spanning a page
> +        * boundary.  Assert that we're not doing that.
> +        */
> +#ifndef __GENKSYMS__
> +       unsigned long           stack_canary;
> +       unsigned long           stack[64];
> +#endif
> +} __attribute__((packed, __aligned__(PAGE_SIZE)));
> 
> Do you think this is a RHEL5 bug?
> 
> Thanks,
> 
> Alexander
> 

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.