Message-ID: <CAG48ez1T5v=iryQk0fPkUr2umRpfMrSPJ2pYcB5HDbc3-kYBUw@mail.gmail.com>
Date: Tue, 15 May 2018 18:19:18 +0200
From: Jann Horn <jannh@...gle.com>
To: Alexey Gladkov <gladkov.alexey@...il.com>
Cc: Kees Cook <keescook@...omium.org>,
	Andy Lutomirski <luto@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-fsdevel@...r.kernel.org,
	kernel list <linux-kernel@...r.kernel.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	linux-security-module <linux-security-module@...r.kernel.org>,
	Linux API <linux-api@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Akinobu Mita <akinobu.mita@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Jeff Layton <jlayton@...chiereds.net>,
	Ingo Molnar <mingo@...nel.org>,
	Alexey Dobriyan <adobriyan@...il.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Daniel Micay <danielmicay@...il.com>,
	Jonathan Corbet <corbet@....net>,
	Bruce Fields <bfields@...ldses.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	Solar Designer <solar@...nwall.com>,
	"Dmitry V. Levin" <ldv@...linux.org>,
	Djalal Harouni <tixxdz@...il.com>
Subject: Re: [PATCH v5 1/7] proc: add proc_fs_info struct to store proc information

On Tue, May 15, 2018 at 9:21 AM, Alexey Gladkov <gladkov.alexey@...il.com> wrote:
> On Fri, May 11, 2018 at 03:49:13PM +0200, Jann Horn wrote:
>> On Fri, May 11, 2018 at 11:34 AM, Alexey Gladkov
>> <gladkov.alexey@...il.com> wrote:
>> > From: Djalal Harouni <tixxdz@...il.com>
>> >
>> > This is a preparation patch that adds proc_fs_info to be able to store
>> > different procfs options and informations. Right now some mount options
>> > are stored inside the pid namespace which makes it hard to change or
>> > modernize procfs without affecting pid namespaces. Plus we do want to
>> > treat proc as more of a real mount point and filesystem. procfs is part
>> > of Linux API where it offers some features using filesystem syscalls and
>> > in order to support some features where we are able to have multiple
>> > instances of procfs, each one with its mount options inside the same pid
>> > namespace, we have to separate these procfs instances.
>> >
>> > This is the same feature that was also added to other Linux interfaces
>> > like devpts in order to support containers, sandboxes, and to have
>> > multiple instances of devpts filesystem [1].
>> >
>> > [1] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
>> >
>> > Cc: Kees Cook <keescook@...omium.org>
>> > Suggested-by: Andy Lutomirski <luto@...nel.org>
>> > Signed-off-by: Djalal Harouni <tixxdz@...il.com>
>> > Signed-off-by: Alexey Gladkov <gladkov.alexey@...il.com>
>> > ---
>> [...]
>> >  static struct dentry *proc_mount(struct file_system_type *fs_type,
>> >         int flags, const char *dev_name, void *data)
>> >  {
>> > +       int error;
>> > +       struct super_block *sb;
>> >         struct pid_namespace *ns;
>> > +       struct proc_fs_info *fs_info;
>> > +
>> > +       /*
>> > +        * Don't allow mounting unless the caller has CAP_SYS_ADMIN over
>> > +        * the namespace.
>> > +        */
>> > +       if (!(flags & MS_KERNMOUNT) && !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>> > +               return ERR_PTR(-EPERM);
>>
>> Is this correct?
>>
>> The old code invoked a check with the same comment through mount_ns();
>> however, this patch changes the semantics of the check.
>> The old code checked that the caller has privileges over the user
>> namespace that contains the PID namespace; in other words, it checked
>> that the caller has privileges over the PID namespace. The current
>> code just checks that the caller is privileged over its own user
>> namespace.
>>
>> As far as I can tell, this means that by doing something like this:
>>
>>   unshare(CLONE_NEWNS|CLONE_NEWUSER);
>>   mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL);
>>   mount("proc", "/proc", "proc", 0, "newinstance,pids=all");
>>
>> any process could create a new unrestricted procfs mount for its PID
>> namespace, even if it is only supposed to have access to a more
>> restricted procfs mount.
>
> Hm... let me investigate this. It looks like mount with "newinstance"
> option should fail if pid namespace is the same and the current and parent
> user namespace do not match.

I don't understand that last sentence. What does "if pid namespace is
the same" mean, and what does "current and parent user namespace do
not match" mean?

Just changing "ns_capable(current_user_ns(), CAP_SYS_ADMIN)" to
"ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)"
should be enough to get the old semantics again: It checks whether the
current task is capable over its PID namespace.
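
To illustrate why the two checks differ, here is a small userspace toy
model in C. It is not kernel code and is only a sketch under stated
assumptions: the struct names (user_ns_model, pid_ns_model, task_model)
and the helpers ns_capable_model(), check_own_user_ns() and
check_pid_ns_owner() are hypothetical, and the capability walk is a
simplified approximation of how a capability in one user namespace
relates to another (it ignores LSMs, namespace levels and uid mappings).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: a parent link plus the uid that created it. */
struct user_ns_model {
	const struct user_ns_model *parent;	/* NULL for the initial namespace */
	unsigned int owner_uid;			/* creator's uid, as seen in the parent */
};

/* Toy PID namespace: only records which user namespace owns it. */
struct pid_ns_model {
	const struct user_ns_model *user_ns;
};

/* Toy task: stands in for the credentials of "current". */
struct task_model {
	const struct user_ns_model *user_ns;	/* models current_user_ns() */
	const struct pid_ns_model *pid_ns;	/* models task_active_pid_ns(current) */
	unsigned int euid;
	bool cap_sys_admin;			/* CAP_SYS_ADMIN raised in user_ns */
};

/*
 * Simplified capability walk (an assumption, not the kernel's code):
 * starting at the target namespace and moving towards the root, the
 * task is capable if it reaches its own namespace with the capability
 * raised, or if it created a namespace on the way (the creator's
 * namespace is the parent and the owner uid matches).
 */
bool ns_capable_model(const struct task_model *t,
		      const struct user_ns_model *target)
{
	assert(t != NULL && target != NULL);
	for (const struct user_ns_model *ns = target; ns != NULL; ns = ns->parent) {
		if (ns == t->user_ns)
			return t->cap_sys_admin;
		if (ns->parent == t->user_ns && ns->owner_uid == t->euid)
			return true;
	}
	return false;
}

/* Check as written in the quoted patch: capable over the caller's own user namespace. */
bool check_own_user_ns(const struct task_model *t)
{
	return ns_capable_model(t, t->user_ns);
}

/* Check suggested in the reply: capable over the user namespace that owns the PID namespace. */
bool check_pid_ns_owner(const struct task_model *t)
{
	return ns_capable_model(t, t->pid_ns->user_ns);
}
```

In this model, a task that has moved into a fresh user namespace (whose
parent is the initial namespace, with cap_sys_admin set) passes
check_own_user_ns() but fails check_pid_ns_owner() as long as its PID
namespace is still owned by the initial user namespace. A task that
holds the capability in the initial user namespace passes both.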