kernel-hardening - Re: [PATCH v5 03/32] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated

Message-ID: <87mvl8tn93.fsf@gmail.com>
Date: Sat, 23 Jul 2016 16:58:16 +0200
From: Nicolai Stange <nicstange@...il.com>
To: Valdis.Kletnieks@...edu
Cc: Andy Lutomirski <luto@...nel.org>,  kernel-hardening@...ts.openwall.com,  x86@...nel.org,  linux-kernel@...r.kernel.org,  linux-arch@...r.kernel.org,  Borislav Petkov <bp@...en8.de>,  Nadav Amit <nadav.amit@...il.com>,  Kees Cook <keescook@...omium.org>,  Brian Gerst <brgerst@...il.com>,  Linus Torvalds <torvalds@...ux-foundation.org>,  Josh Poimboeuf <jpoimboe@...hat.com>,  Jann Horn <jann@...jh.net>,  Heiko Carstens <heiko.carstens@...ibm.com>, Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH v5 03/32] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated

Valdis.Kletnieks@...edu writes:

> On Thu, 21 Jul 2016 22:34:33 -0700, Andy Lutomirski said:
>
>> How much memory do you have and what's your config?  My code is
>> obviously buggy, but I'm wondering why neither I nor the 0day bot caught
>> this.
>
> Probably because your devel box and the 0day bot both have 4-level page
> tables and the dual-core i5 in my laptop has (presumably) 3?
>
> In any case, your patch didn't fix things, nor did (as you noted in a mail
> to Ingo) does reverting the problem commit (and then the following one that
> deletes now-dead code so it will compile cleanly).


Applying the patch directly on top of 360cb4d15567 ("x86/mm/cpa: In
populate_pgd(), don't set the PGD entry until it's populated") *does*
fix things for me.

Hardware: i7-4800MQ, 8GiB RAM, Dell Latitude E6540

FYI, the kernel panic, grabbed via console=uart,io,0x3f8,..., is:

BUG: unable to handle kernel paging request at ffffb92ac0000fc0
IP: [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
PGD 0 
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6+ #190
Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
task: ffffffff81e0d580 ti: ffffffff81e00000 task.ti: ffffffff81e00000
RIP: 0010:[<ffffffff8106b8d1>]  [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
RSP: 0000:ffffffff81e03c38  EFLAGS: 00010206
RAX: 00000000ff0000f3 RBX: 00000000ff000000 RCX: ffff880000000000
RDX: ffffb92ac0000fc0 RSI: 00000000ff0000f3 RDI: ffffb92ac0000fc0
RBP: ffffffff81e03c90 R08: ffff880000000fc0 R09: 0000000000000073
R10: ffff88022ede5000 R11: 0000000000000001 R12: ffffffff81e03e48
R13: 0000000001000000 R14: 0000000000000073 R15: ffff880000000018
FS:  0000000000000000(0000) GS:ffff88022ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb92ac0000fc0 CR3: 0000000001e06000 CR4: 00000000000406b0
Stack:
 ffffffff81e03c90 ffffffff8107217f 0000000000000073 0000000100000000
 0000000000000001 0000000000001000 ffff880000000018 0000000000001000
 ffffffff81e03e48 0000000100000000 ffffffffff2018a8 ffffffff81e03d08
Call Trace:
 [<ffffffff8107217f>] ? populate_pmd+0x11f/0x2c0
 [<ffffffff81072823>] __cpa_process_fault+0x503/0x5d0
 [<ffffffff81073223>] __change_page_attr_set_clr+0x563/0xe00
 [<ffffffff81074e6f>] kernel_map_pages_in_pgd+0x8f/0xd0
 [<ffffffff81fa5e2e>] __map_region+0x3c/0x58
 [<ffffffff81fa6064>] efi_map_region+0x31/0xca
 [<ffffffff81fa5af3>] efi_enter_virtual_mode+0x215/0x4bd
 [<ffffffff814c6289>] ? acpi_os_signal_semaphore+0x2c/0x38
 [<ffffffff814f5c4a>] ? acpi_ut_initialize_interfaces+0x62/0x67
 [<ffffffff81f84f78>] start_kernel+0x3cf/0x478
 [<ffffffff81f84120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff81f842db>] x86_64_start_reservations+0x2f/0x31
 [<ffffffff81f84429>] x86_64_start_kernel+0x14c/0x16f
Code: 89 e5 48 89 47 04 5d c3 66 90 55 48 89 e5 0f 01 f8 5d c3 0f 1f 80 00 00 00 00 55 48 89 37 48 89 e5 5d c3 0f 1f 80 00 00 00 00 55 <48> 89 37 48 89 e5 5d c3 0f 1f 80 00 00 00 00 55 48 89 37 48 89 
RIP  [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
 RSP <ffffffff81e03c38>
CR2: ffffb92ac0000fc0
---[ end trace 2f8154f277751049 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
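
As a reading aid for the "Oops: 0002" line above, here is a minimal sketch
that decodes an x86 page-fault error code into words. The bit meanings are
the architectural ones; the table PF_BITS and the helper
decode_pf_error_code() are names made up for this illustration and are not
kernel code.

```python
# Architectural bit layout of the x86 page-fault error code.
# Each entry: (mask, meaning if the bit is set, meaning if it is clear).
# PF_BITS and decode_pf_error_code are hypothetical names for this sketch.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable properties for a page-fault error code."""
    properties = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            properties.append(text)
    return properties


if __name__ == "__main__":
    # "Oops: 0002" from the trace above.
    print(decode_pf_error_code(0x0002))
```

For 0002 this reports a kernel-mode write to a page that is not present,
which matches the store in native_set_pmd faulting on the address in CR2.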


The reason the patch didn't work for Valdis might be that there is
another issue in next-20160722 that shows the same symptoms unless you
watch the serial console. Valdis, did you apply the provided patch on
top of next?

The "other issue" is:

RDX: 0000000000000010 RSI: 00000000000306c3 RDI: ffff88003bdea2fc
RBP: ffffffffb6e03a70 R08: ffff88003bdea000 R09: 0000000000000000
R10: ffffffffb713d3a0 R11: 0000000000000008 R12: 0000000000000020
R13: ffff88003bdea2fc R14: ffffffffb6e03a80 R15: ffffffffb6e03ea0
FS:  0000000000000000(0000) GS:ffff9208aea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88003bdea300 CR3: 00000001dce06000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffffffb6054cea 0000000000000000 0000000100000000 0000000000000001
 0000000000000000 0000000000000000 ffffffffb705c2e0 000000003fffc000
 ffffffffb6e03e90 ffffffffb6055487 ffff88003bdea2fc ffffffffb6e0d580
Call Trace:
 [<ffffffffb6054cea>] ? find_microcode_patch+0x4a/0xa0
 [<ffffffffb6055487>] load_microcode.isra.1.constprop.12+0x37/0xa0
 [<ffffffffb6036700>] ? dump_trace+0x120/0x320
 [<ffffffffb644fee8>] ? put_dec+0x18/0xa0
 [<ffffffffb645025d>] ? number+0x2ed/0x300
 [<ffffffffb6ff3ba1>] ? serial_putc+0x1e/0x2d
 [<ffffffffb6ff3b83>] ? serial8250_early_out+0x62/0x62
 [<ffffffffb654f127>] ? uart_console_write+0x57/0x70
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb6152775>] ? __module_address+0x5/0xf0
 [<ffffffffb6152872>] ? __module_text_address+0x12/0x60
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb60d68a6>] ? __kernel_text_address+0x56/0x70
 [<ffffffffb60371bb>] ? print_context_stack+0x7b/0x100
 [<ffffffffb6109695>] ? __bfs+0x25/0x280
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb6152775>] ? __module_address+0x5/0xf0
 [<ffffffffb6152872>] ? __module_text_address+0x12/0x60
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb60d68a6>] ? __kernel_text_address+0x56/0x70
 [<ffffffffb60371bb>] ? print_context_stack+0x7b/0x100
 [<ffffffffb6036700>] ? dump_trace+0x120/0x320
 [<ffffffffb644fee8>] ? put_dec+0x18/0xa0
 [<ffffffffb645025d>] ? number+0x2ed/0x300
 [<ffffffffb6ff3ba1>] ? serial_putc+0x1e/0x2d
 [<ffffffffb6ff3b83>] ? serial8250_early_out+0x62/0x62
 [<ffffffffb654f127>] ? uart_console_write+0x57/0x70
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb689de84>] ? _raw_spin_unlock_irqrestore+0x54/0x60
 [<ffffffffb611f16d>] ? console_unlock+0x33d/0x670
 [<ffffffffb611f7a1>] ? vprintk_emit+0x301/0x5e0
 [<ffffffffb605553f>] ? collect_cpu_info_early+0x4f/0x140
 [<ffffffffb61ea845>] ? __pr_info+0x5a/0x76
 [<ffffffffb60557cd>] load_ucode_intel_ap+0x5d/0x80
 [<ffffffffb6054924>] load_ucode_ap+0x94/0xa0
 [<ffffffffb60481a8>] cpu_init+0x58/0x3e0
 [<ffffffffb60709bc>] ? set_pte_vaddr+0x5c/0x90
 [<ffffffffb6fac06c>] trap_init+0x2b6/0x328
 [<ffffffffb6fa0dba>] start_kernel+0x224/0x47f
 [<ffffffffb6fa0120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffffb6fa02cf>] x86_64_start_reservations+0x29/0x2b
 [<ffffffffb6fa041e>] x86_64_start_kernel+0x14d/0x170
Code: c1 74 04 85 c2 74 e4 b8 01 00 00 00 5d c3 41 89 ca b8 01 00 00 00 41 09 d2 74 f1 85 d1 74 98 5d c3 31 c0 5d c3 90 e8 eb b1 84 00 <39> 4f 04 77 03 31 c0 c3 55 48 89 e5 e8 6a ff ff ff 5d c3 0f 1f 
RIP  [<ffffffffb6055af5>] has_newer_microcode+0x5/0x20
 RSP <ffffffffb6e03a30>
CR2: ffff88003bdea300
---[ end trace b163fd3960fd46fb ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!

I bisected this one to 21ef9a5c3164 ("Merge branch 'x86/microcode'").
Neither of its parents exhibits that behaviour.  This merge's author is
Ingo Molnar, so I added him to the CC list.


Thanks,

Nicolai