kernel-hardening - Re: [PATCH] make TIOCSTI ioctl require CAP_SYS

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20170422195034.GA17556@mail.hallyn.com>
Date: Sat, 22 Apr 2017 14:50:34 -0500
From: "Serge E. Hallyn" <serge@...lyn.com>
To: Matt Brown <matt@...tt.com>
Cc: "Serge E. Hallyn" <serge@...lyn.com>, jmorris@...ei.org,
	gregkh@...uxfoundation.org, jslaby@...e.com,
	akpm@...ux-foundation.org, jannh@...gle.com, keescook@...omium.org,
	kernel-hardening@...ts.openwall.com,
	linux-security-module@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] make TIOCSTI ioctl require CAP_SYS_ADMIN

Quoting Matt Brown (matt@...tt.com):
> On 04/21/2017 01:24 AM, Serge E. Hallyn wrote:
> >On Fri, Apr 21, 2017 at 01:09:59AM -0400, Matt Brown wrote:
> >>On 04/20/2017 01:41 PM, Serge E. Hallyn wrote:
> >>>Quoting matt@...tt.com (matt@...tt.com):
> >>>>On 2017-04-20 11:19, Serge E. Hallyn wrote:
> >>>>>Quoting Matt Brown (matt@...tt.com):
> >>>>>>On 04/19/2017 07:53 PM, Serge E. Hallyn wrote:
> >>>>>>>Quoting Matt Brown (matt@...tt.com):
> >>>>>>>>On 04/19/2017 12:58 AM, Serge E. Hallyn wrote:
> >>>>>>>>>On Tue, Apr 18, 2017 at 11:45:26PM -0400, Matt Brown wrote:
> >>>>>>>>>>This patch reproduces GRKERNSEC_HARDEN_TTY functionality from the grsecurity
> >>>>>>>>>>project in-kernel.
> >>>>>>>>>>
> >>>>>>>>>>This will create the Kconfig SECURITY_TIOCSTI_RESTRICT and the corresponding
> >>>>>>>>>>sysctl kernel.tiocsti_restrict that, when activated, restrict all TIOCSTI
> >>>>>>>>>>ioctl calls from non CAP_SYS_ADMIN users.
> >>>>>>>>>>
> >>>>>>>>>>Possible effects on userland:
> >>>>>>>>>>
> >>>>>>>>>>There could be a few user programs that would be effected by this
> >>>>>>>>>>change.
> >>>>>>>>>>See: <https://codesearch.debian.net/search?q=ioctl%5C%28.*TIOCSTI>
> >>>>>>>>>>notable programs are: agetty, csh, xemacs and tcsh
> >>>>>>>>>>
> >>>>>>>>>>However, I still believe that this change is worth it given that the
> >>>>>>>>>>Kconfig defaults to n. This will be a feature that is turned on for the
> >>>>>>>>>
> >>>>>>>>>It's not worthless, but note that for instance before this was fixed
> >>>>>>>>>in lxc, this patch would not have helped with escapes from privileged
> >>>>>>>>>containers.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>I assume you are talking about this CVE:
> >>>>>>>>https://bugzilla.redhat.com/show_bug.cgi?id=1411256
> >>>>>>>>
> >>>>>>>>In retrospect, is there any way that an escape from a privileged
> >>>>>>>>container with the this bug could have been prevented?
> >>>>>>>
> >>>>>>>I don't know, that's what I was probing for.  Detecting that the pgrp
> >>>>>>>or session - heck, the pid namespace - has changed would seem like a
> >>>>>>>good indicator that it shouldn't be able to push.
> >>>>>>>
> >>>>>>
> >>>>>>pgrp and session won't do because in the case we are discussing
> >>>>>>current->signal->tty is the same as tty.
> >>>>>>
> >>>>>>This is the current check that is already in place:
> >>>>>>| if ((current->signal->tty != tty) && !capable(CAP_SYS_ADMIN))
> >>>>>>| 	return -EPERM;
> >>>>>
> >>>>>Yeah...
> >>>>>
> >>>>>>The only thing I could find to detect the tty message coming from a
> >>>>>>container is as follows:
> >>>>>>| task_active_pid_ns(current)->level
> >>>>>>
> >>>>>>This will be zero when run on the host, but 1 when run inside a
> >>>>>>container. However this is very much a hack and could probably break
> >>>>>>some userland stuff where there are multiple levels of namespaces.
> >>>>>
> >>>>>Yes.  This is also however why I don't like the current patch, because
> >>>>>capable() will never be true in a container, so nested containers
> >>>>>break.
> >>>>>
> >>>>
> >>>>What do you mean by "capable() will never be true in a container"?
> >>>>My understanding
> >>>>is that if a container is given CAP_SYS_ADMIN then
> >>>>capable(CAP_SYS_ADMIN) will return
> >>>>true?
> >>>
> >>>No, capable(X) checks for X with respect to the initial user namespace.
> >>>So for root-owned containers it will be true, but containers running in
> >>>non-initial user namespaces cannot pass that check.
> >>>
> >>>To check for privilege with respect to another user namespace, you need
> >>>to use ns_capable.  But for that you need a user_ns to target.
> >>>
> >>
> >>How about: ns_capable(current_user_ns(),CAP_SYS_ADMIN) ?
> >>
> >>current_user_ns() was found in include/linux/cred.h
> >
> >Any user can create a new user namespace and pass the above check.  What we
> >want is to find the user namespace which opened the tty.
> >
> 
> I believe I have a working solution that I can show in the next version
> of the patch later today, but I just want to run the logic by you first.
> 
> I added: "struct user_namespace *owner_user_ns;" as a field in
> tty_struct (include/linux/tty.h) Note: I am totally open to suggestions
> for a better name.
> 
> Then I added "tty->owner_user_ns = current_user_ns();" to the
> alloc_tty_struct function. (drivers/tty/tty_io.c)

That's what I was hoping could work.  Then you can check ns_capable
with respect to that.

You'll want to grab a reference to the user_ns, and drop it on
final close, but otherwise this sounds good to me.  I don't really
know the tty layer well though so we'll need some sanity checking
from someone who does.

> When testing with a docker container, running in a different user
> namespace, I printed out current_user_ns()->level, which returned 1,
> and tty->owner_user_ns->level, which returned 0. This seems to prove
> that I am correctly storing the user namespace which opened the tty.
> 
> Please let me know if there are any edge cases that I am missing with
> this approach.

Thanks for posting this!  This seems like the best solution to me.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.