kernel-hardening - Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF

Message-Id: <20120406140503.10b75c5b.akpm@linux-foundation.org>
Date: Fri, 6 Apr 2012 14:05:03 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Kees Cook <keescook@...omium.org>
Cc: Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
 linux-security-module@...r.kernel.org, linux-arch@...r.kernel.org,
 linux-doc@...r.kernel.org, kernel-hardening@...ts.openwall.com,
 netdev@...r.kernel.org, x86@...nel.org, arnd@...db.de, davem@...emloft.net,
 hpa@...or.com, mingo@...hat.com, oleg@...hat.com, peterz@...radead.org,
 rdunlap@...otime.net, mcgrathr@...omium.org, tglx@...utronix.de,
 luto@....edu, eparis@...hat.com, serge.hallyn@...onical.com,
 djm@...drot.org, scarybeasts@...il.com, indan@....nu, pmoore@...hat.com,
 corbet@....net, eric.dumazet@...il.com, markus@...omium.org,
 coreyb@...ux.vnet.ibm.com, jmorris@...ei.org
Subject: Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF

On Fri, 6 Apr 2012 13:44:43 -0700
Kees Cook <keescook@...omium.org> wrote:

> On Fri, Apr 6, 2012 at 1:23 PM, Andrew Morton <akpm@...ux-foundation.org> wrote:
> > On Thu, 29 Mar 2012 15:01:53 -0500
> > Will Drewry <wad@...omium.org> wrote:
> >
> >> [This patch depends on luto@....edu's no_new_privs patch:
> >>    https://lkml.org/lkml/2012/1/30/264
> >>  included in this series for ease of consumption.
> >> ]
> >>
> >> This patch adds support for seccomp mode 2.  Mode 2 introduces the
> >> ability for unprivileged processes to install system call filtering
> >> policy expressed in terms of a Berkeley Packet Filter (BPF) program.
> >> This program will be evaluated in the kernel for each system call
> >> the task makes and computes a result based on data in the format
> >> of struct seccomp_data.
> >> ...
> >> +static void seccomp_filter_log_failure(int syscall)
> >> +{
> >> +     int compat = 0;
> >> +#ifdef CONFIG_COMPAT
> >> +     compat = is_compat_task();
> >> +#endif
> >
> > hm, I'm surprised that we don't have a zero-returning implementation of
> > is_compat_task() when CONFIG_COMPAT=n.  Seems silly.  Blames Arnd.
> 
> There is

I can't find it.  The definition in include/linux/compat.h is inside
#ifdef CONFIG_COMPAT.
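
For illustration, a zero-returning fallback of the kind discussed above could
look roughly like the sketch below. This is a user-space approximation, not
the actual contents of include/linux/compat.h; the helper
compat_flag_for_log() is hypothetical and only shows that the caller no
longer needs its own #ifdef.

```c
/*
 * Sketch only: a stub that lets callers use is_compat_task()
 * unconditionally. With CONFIG_COMPAT undefined (as in this
 * stand-alone example) the stub always returns 0.
 */
#ifdef CONFIG_COMPAT
extern int is_compat_task(void);	/* real implementation lives elsewhere */
#else
static inline int is_compat_task(void)
{
	return 0;
}
#endif

/* Hypothetical caller: no #ifdef CONFIG_COMPAT needed at the call site. */
static int compat_flag_for_log(void)
{
	return is_compat_task();
}
```

With such a stub in place, the three-line #ifdef block in
seccomp_filter_log_failure() would reduce to a single assignment.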

> >> +static long seccomp_attach_filter(struct sock_fprog *fprog)
> >> +{
> >> +     struct seccomp_filter *filter;
> >> +     unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
> >> +     unsigned long total_insns = fprog->len;
> >> +     long ret;
> >> +
> >> +     if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
> >> +             return -EINVAL;
> >> +
> >> +     for (filter = current->seccomp.filter; filter; filter = filter->prev)
> >> +             total_insns += filter->len + 4;  /* include a 4 instr penalty */
> >
> > So tasks don't share filters?  We copy them by value at fork?  Do we do
> > this at vfork() too?
> 
> The filter chain is shared (and refcounted).

So what's the locking rule for accessing and modifying that
singly-linked list?
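
As a reference point for that question, here is a minimal user-space model of
the chain walk and the "+ 4" accounting from seccomp_attach_filter(). It
assumes, and the thread does not confirm this, that a task only ever prepends
a new node and never modifies nodes that are already published, which is the
property that would let the walk run without a lock. The names filter_node,
chain_total_insns(), chain_attach() and the limit parameter are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-in for struct seccomp_filter. */
struct filter_node {
	unsigned int len;		/* number of BPF instructions */
	struct filter_node *prev;	/* older filter, or NULL */
};

#define PER_FILTER_PENALTY 4u		/* the "+ 4" in the quoted patch */

/* Sum the new filter's length and every attached filter plus its penalty. */
static unsigned long chain_total_insns(const struct filter_node *newest,
				       unsigned int new_len)
{
	unsigned long total = new_len;
	const struct filter_node *f;

	for (f = newest; f; f = f->prev)
		total += f->len + PER_FILTER_PENALTY;
	return total;
}

/*
 * Prepend-only attach: existing nodes are read but never written.
 * Returns the new head, or NULL if the filter is empty or the
 * (hypothetical) instruction limit would be exceeded.
 */
static struct filter_node *chain_attach(struct filter_node *newest,
					struct filter_node *node,
					unsigned long limit)
{
	if (node->len == 0 || chain_total_insns(newest, node->len) > limit)
		return NULL;
	node->prev = newest;
	return node;
}
```

Attaching a 10-instruction filter and then a 20-instruction one gives a
running total of 10, then 20 + (10 + 4) = 34 at the second attach.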

> ...
> >> +/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
> >> +void put_seccomp_filter(struct task_struct *tsk)
> >> +{
> >> +     struct seccomp_filter *orig = tsk->seccomp.filter;
> >> +     /* Clean up single-reference branches iteratively. */
> >> +     while (orig && atomic_dec_and_test(&orig->usage)) {
> >> +             struct seccomp_filter *freeme = orig;
> >> +             orig = orig->prev;
> >> +             kfree(freeme);
> >> +     }
> >> +}
> >
> > So if one of the filters in the list has an elevated refcount, we bail
> > out on the remainder of the list.  Seems odd.
> 
> This is so that not every filter in the list needs to have its
> refcount raised. As long as the counting up matches the counting
> down, it's fine. This allows for process trees branching the filter
> list at different times still being safe. IIUC, this code was based on
> how namespace refcounting is handled. I spent some time proving to
> myself that it was correctly refcounted a while back. More eyes is
> better, of course. :)

Please ensure that future readers of this code have a description of
how it is supposed to work.
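
One way to picture the scheme described in the quoted reply is the user-space
model below. It is a sketch under assumptions: a plain int stands in for
atomic_t, and filter_new(), filter_get(), filter_put() and live_nodes are
hypothetical names. Each node holds one reference on its prev, a fork takes
one extra reference on the head only, and the put loop stops at the first
node that some other branch still references.

```c
#include <stdlib.h>

struct filter_node {
	int usage;			/* stand-in for atomic_t usage */
	struct filter_node *prev;	/* older filter, or NULL */
};

static int live_nodes;			/* allocation counter, for demonstration */

/*
 * Create a node on top of prev. The new node starts with one reference
 * (the owning task's head pointer) and inherits the reference the task
 * previously held on prev, so prev->usage is not touched.
 */
static struct filter_node *filter_new(struct filter_node *prev)
{
	struct filter_node *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->usage = 1;
	f->prev = prev;
	live_nodes++;
	return f;
}

/* Model of fork: the child shares the chain via one ref on the head. */
static struct filter_node *filter_get(struct filter_node *head)
{
	if (head)
		head->usage++;
	return head;
}

/* Model of put_seccomp_filter(): free single-reference nodes iteratively. */
static void filter_put(struct filter_node *head)
{
	while (head && --head->usage == 0) {
		struct filter_node *freeme = head;

		head = head->prev;
		free(freeme);
		live_nodes--;
	}
}
```

For example, if a parent and child share node A and each then attaches its
own filter (B and C), dropping B frees only B, because A is still referenced
through C; dropping C afterwards frees C and then A.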