kernel-hardening - Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20120410125443.3ab1a277.akpm@linux-foundation.org>
Date: Tue, 10 Apr 2012 12:54:43 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: "Indan Zupancic" <indan@....nu>
Cc: "Will Drewry" <wad@...omium.org>, linux-kernel@...r.kernel.org,
 linux-security-module@...r.kernel.org, linux-arch@...r.kernel.org,
 linux-doc@...r.kernel.org, kernel-hardening@...ts.openwall.com,
 netdev@...r.kernel.org, x86@...nel.org, arnd@...db.de, davem@...emloft.net,
 hpa@...or.com, mingo@...hat.com, oleg@...hat.com, peterz@...radead.org,
 rdunlap@...otime.net, mcgrathr@...omium.org, tglx@...utronix.de,
 luto@....edu, eparis@...hat.com, serge.hallyn@...onical.com,
 djm@...drot.org, scarybeasts@...il.com, pmoore@...hat.com, corbet@....net,
 eric.dumazet@...il.com, markus@...omium.org, coreyb@...ux.vnet.ibm.com,
 keescook@...omium.org, jmorris@...ei.org
Subject: Re: [PATCH v17 08/15] seccomp: add system call filtering using BPF

On Mon, 9 Apr 2012 04:22:40 +1000
"Indan Zupancic" <indan@....nu> wrote:

> On Sat, April 7, 2012 06:23, Andrew Morton wrote:
> > hm, I'm surprised that we don't have a zero-returning implementation of
> > is_compat_task() when CONFIG_COMPAT=n.  Seems silly.  Blames Arnd.
> 
> It's sneakily hidden at the end of compat.h.
> 
> >> +/**
> >> + * get_u32 - returns a u32 offset into data
> >> + * @data: a unsigned 64 bit value
> >> + * @index: 0 or 1 to return the first or second 32-bits
> >> + *
> >> + * This inline exists to hide the length of unsigned long.
> >> + * If a 32-bit unsigned long is passed in, it will be extended
> >> + * and the top 32-bits will be 0. If it is a 64-bit unsigned
> >> + * long, then whatever data is resident will be properly returned.
> >> + */
> >> +static inline u32 get_u32(u64 data, int index)
> >> +{
> >> +	return ((u32 *)&data)[index];
> >> +}
> >
> > This seems utterly broken on big-endian machines.  If so: fix.  If not:
> > add comment explaining why?
> 
> It's not a bug, it's intentional.

Well it looks like a bug, which is why I suggest that it be clearly
commented.

> >
> >> +	if (total_insns > MAX_INSNS_PER_PATH)
> >> +		return -ENOMEM;
> >> +
> >> +	/*
> >> +	 * Installing a seccomp filter requires that the task have
> >> +	 * CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> +	 * This avoids scenarios where unprivileged tasks can affect the
> >> +	 * behavior of privileged children.
> >> +	 */
> >> +	if (!current->no_new_privs &&
> >> +	    security_capable_noaudit(current_cred(), current_user_ns(),
> >> +				     CAP_SYS_ADMIN) != 0)
> >> +		return -EACCES;
> >> +
> >> +	/* Allocate a new seccomp_filter */
> >> +	filter = kzalloc(sizeof(struct seccomp_filter) + fp_size, GFP_KERNEL);
> >
> > I think this gives userspace an easy way of causing page allocation
> > failure warnings, by permitting large kmalloc() attempts.  Add
> > __GFP_NOWARN?
> 
> Max is 32kb. sk_attach_filter() in net/core/filter.c is worse,
> it allocates up to 512kb before even checking the length.

An order-3 allocation attempt is pretty fragile.  This will sometimes
fail.

> What about using GFP_USER (and adding __GFP_NOWARN to GFP_USER) instead?

Let's be conventional and use the open-coded __GFP_NOWARN. 
__GFP_NOWARN says "this is a big allocation which will sometimes fail
and I have carefully reviewed the failure paths and runtime tested
them".

Please carefully review the failure paths and runtime test them ;)

> >> +	/* Check and rewrite the fprog via the skb checker */
> >> +	ret = sk_chk_filter(filter->insns, filter->len);
> >> +	if (ret)
> >> +		goto fail;
> >> +
> >> +	/* Check and rewrite the fprog for seccomp use */
> >> +	ret = seccomp_chk_filter(filter->insns, filter->len);
> >
> > "check" is spelled "check"!
> 
> Yes, it is and he did spell "check" as "Check".
> 
> seccomp_chk_filter() mirrors sk_chk_filter(). So it refers to
> "chk", not "check".

bah.  Two poor identifiers isn't better than one.  Whatever.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.