|
Message-ID: <20160919213741.GN18931@jhogan-linux.le.imgtec.org>
Date: Mon, 19 Sep 2016 22:37:41 +0100
From: James Hogan <james.hogan@...tec.com>
To: Kees Cook <keescook@...omium.org>
CC: Guenter Roeck <linux@...ck-us.net>, Petr Mladek <pmladek@...e.com>, LKML
<linux-kernel@...r.kernel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>, <linux-metag@...r.kernel.org>, Ingo Molnar
<mingo@...nel.org>, "kernel-hardening@...ts.openwall.com"
<kernel-hardening@...ts.openwall.com>
Subject: Re: qemu:metag image runtime failure in -next due to 'kthread: allow
to cancel kthread work'
On Mon, Sep 19, 2016 at 12:53:49PM -0700, Kees Cook wrote:
> On Mon, Sep 19, 2016 at 6:59 AM, James Hogan <james.hogan@...tec.com> wrote:
> > On Fri, Sep 16, 2016 at 07:58:12PM -0700, Kees Cook wrote:
> >> On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@...tec.com> wrote:
> >> > On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
> >> >> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
> >> >> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
> >> >> > > Hi,
> >> >> > >
> >> >> > > I see the following runtime error in -next when running a metag qemu emulation.
> >> >> > >
> >> >> > > [ ... ]
> >> >> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
> >> >> > > io scheduler noop registered (default)
> >> >> > > brd: module loaded
> >> >> > > Warning: unable to open an initial console.
> >> >> > > List of all partitions:
> >> >> > > 0100 16384 ram0 (driver?)
> >> >> > > No filesystem could mount root, tried:
> >> >> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
> >> >> > >
> >> >> > > An example for a complete log is at:
> >> >> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
> >> >> > >
> >> >> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
> >> >> > > I don't know (yet) if other architectures are affected. bisect log is attached.
> >> >> > >
> >> >> > > The scripts to run this test are available at
> >> >> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
> >> >> > >
> >> >> > > Guenter
> >> >> >
> >> >> > Thanks Guenter,
> >> >> >
> >> >> > It appears to be related to the command line. After that commit the
> >> >> > command line is shown as empty (rather than your "rdinit=/sbin/init
> >> >> > doreboot"), but it can still be overridden in the config and then it
> >> >> > continues to work.
> >> >> >
> >> >> Weird.
> >> >
> >> > Previously the Elf had a single load program header:
> >> >
> >> > Program Headers:
> >> > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> >> > LOAD 0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
> >> > NOTE 0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R 0x4
> >> >
> >> > QEMU puts the args at 40376230, straight after the load region (unlike a
> >> > real Meta Linux bootloader).
> >> >
> >> > After the above commit the ELF gets two load program headers, with a
> >> > small alignment gap in the middle:
> >> >
> >> > Program Headers:
> >> > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> >> > LOAD 0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
> >> > LOAD 0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
> >> > NOTE 0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R 0x4
> >> >
> >> > Here this version of QEMU puts the args at where it thinks the end of
> >> > the loaded image is, which is based on the number of bytes copied from
> >> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
> >> > gap in between, so it puts them at 0x40377348.
> >> >
> >> > But of course:
> >> > 40378230 B ___bss_stop
> >> >
> >> > so it wipes them out while clearing bss during early init.
> >> >
> >> > Previously:
> >> > 4018ebd0 T __sdata
> >> > 4018f000 R ___start_rodata
> >> >
> >> > now:
> >> > 4018ed98 T __sdata
> >> > 40190000 R ___start_rodata
> >> >
> >> > So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
> >> > Introduce post-init read-only memory").
> >> >
> >> > The hack below does indeed reduce it to a single load section and this
> >> > version of QEMU then succeeds:
> >> >
> >> > Program Headers:
> >> > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> >> > LOAD 0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
> >> > NOTE 0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R 0x4
> >> >
> >> > diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
> >> > index a43b650cfdc0..b5c7364a94da 100644
> >> > --- a/arch/metag/include/asm/cache.h
> >> > +++ b/arch/metag/include/asm/cache.h
> >> > @@ -20,4 +20,6 @@
> >> >
> >> > #define __read_mostly __attribute__((__section__(".data..read_mostly")))
> >> >
> >> > +#define __ro_after_init __read_mostly
> >> > +
> >> > #endif
> >> >
> >> > Kees: Is it expected to get multiple load headers like this since your
> >> > patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
> >> > depending on alignment of the read only section?
> >>
> >> I'm not expecting that, and I'm especially not expecting any LOAD to
> >> have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
> >> common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
> >> RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
> >> in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
> >> section (which is currently very small).
> >>
> >> Could the metag linker be failing to override the section flags when
> >> putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
> >> in half? This smells like a linker glitch.
> >
> > Where is that section flags override supposed to be happening?
> >
> > The .rodata section is indeed becoming slightly larger to fit the
> > ptmx_fops __ro_after_init, and in the process changing from flag A to
> > flag WA (presumably means becoming writable).
> >
> > This then appears to be hitting the condition in the linker whereby a
> > new segment is started if a writable section is found after a readonly
> > section, and the sections are on separate pages:
> >
> > https://sourceware.org/git/?p=binutils.git;a=blob;f=bfd/elf.c;h=8df38ee37923c020994715f111a0ffc6bae83c8c;hb=HEAD#l3952
>
> Hm, well, I think x86, arm, and arm64 don't do this. They just clobber
> the flags from the section and force it to match the master section
> (i.e. ro_after_init gets rodata's flags).
Okay. Please forgive my ignorance, I'm just trying to understand the
mechanism, but is that thought to happen automatically due to
*(.data..ro_after_init) being in a section after *(.rodata) in the
linker script?
The definition in question is:
static struct file_operations ptmx_fops __ro_after_init;
Which isn't const or anything, so I'm not sure how else the linker is
supposed to know to make a section read-only that contains non-read-only
data.
Okay, I just built x86_64 default defconfig (on ef98de028afd, half way
through the mm patches on linux-next from the other day where metag
stopped booting). Perhaps I'm missing some important config option to
enable the memory protection (if so I appologise).
For metag:
$ readelf -S drivers/tty/pty.o
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[51] .data..ro_after_i PROGBITS 00000000 00f0c0 00007c 00 WA 0 0 4
$ readelf -S vmlinux.bust:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 4] .rodata PROGBITS 40190000 194000 04c9c8 00 WA 0 0 64
And x86_64:
$ readelf -S drivers/tty/pty.o
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[18] .data..ro_after_i PROGBITS 0000000000000000 00001140
00000000000000f8 0000000000000000 WA 0 0 64
$ readelf -S vmlinux
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 4] .rodata PROGBITS ffffffff81a00000 00c00000
00000000002663d0 0000000000000000 WA 0 0 4096
Both have WA on that section in the object file and the final vmlinux
ELF too.
Thanks
James
>
> >> However, since metag doesn't support CONFIG_DEBUG_RODATA, your above
> >> patch is likely correct. If metag marks memory read-only at kernel
> >> load time, it's doing it early enough that __ro_after_init will fail
> >> to work. If it never marks memory read-only, then __ro_after_init will
> >> have no effect.
> >
> > Yes, metag doesn't make that memory read only yet, although it would be
> > possible to do so (maybe delaying the switch to 4MB pages until
> > afterwards).
> >
> > Either way it doesn't sound that unreasonable to have multiple load
> > segments generated.
>
> The bug here is that the ro_after_init portion of .rodata shouldn't be
> writable according to the ELF headers. It should be writable only
> because memory protection of .rodata hasn't happened yet. :)
>
> >> (i.e. Both situations need the proposed patch).
> >
> > purely to avoid the curious linker segment behaviour (which only trips
> > up that QEMU because its broken), or for other reasons too?
>
> Well, it means the linker behavior will go away since there will be no
> mixing of sections with different flags, and it'll match current
> expectations: __ro_after_init won't be read-only ever (same as
> .rodata).
>
> Fixing memory permissions for metag would be nice, though! :)
>
> -Kees
>
> >
> > Thanks
> > James
> >
> >>
> >> Probably the best solution for all architectures is to invent a new
> >> CONFIG_ARCH_HAS... to identify the style of kernel memory protection a
> >> given architecture has so that the default for __ro_after_init can be
> >> more sensible (and the warnings about mark_rodata_ro() can better
> >> reflect reality).
> >>
> >> -Kees
> >>
> >> --
> >> Kees Cook
> >> Nexus Security
>
>
>
> --
> Kees Cook
> Nexus Security
Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.