kernel-hardening - Re: [PATCH v11 06/12] seccomp: add system call filtering using BPF

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABqD9haZpe-i8kyHJDj4umCLyZ3E877aobabvW8_+aXeC+kTbg@mail.gmail.com>
Date: Tue, 28 Feb 2012 11:17:59 -0600
From: Will Drewry <wad@...omium.org>
To: Indan Zupancic <indan@....nu>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org, 
	linux-doc@...r.kernel.org, kernel-hardening@...ts.openwall.com, 
	netdev@...r.kernel.org, x86@...nel.org, arnd@...db.de, davem@...emloft.net, 
	hpa@...or.com, mingo@...hat.com, oleg@...hat.com, peterz@...radead.org, 
	rdunlap@...otime.net, mcgrathr@...omium.org, tglx@...utronix.de, luto@....edu, 
	eparis@...hat.com, serge.hallyn@...onical.com, djm@...drot.org, 
	scarybeasts@...il.com, pmoore@...hat.com, akpm@...ux-foundation.org, 
	corbet@....net, eric.dumazet@...il.com, markus@...omium.org, 
	coreyb@...ux.vnet.ibm.com, keescook@...omium.org
Subject: Re: [PATCH v11 06/12] seccomp: add system call filtering using BPF

On Tue, Feb 28, 2012 at 12:51 AM, Indan Zupancic <indan@....nu> wrote:
> Hello,
>
> On Sat, February 25, 2012 04:21, Will Drewry wrote:
>> @@ -169,6 +170,7 @@ void free_task(struct task_struct *tsk)
>>       free_thread_info(tsk->stack);
>>       rt_mutex_debug_task_free(tsk);
>>       ftrace_graph_exit_task(tsk);
>> +     put_seccomp_filter(tsk->seccomp.filter);
>>       free_task_struct(tsk);
>> }
>> EXPORT_SYMBOL(free_task);
>> @@ -1113,6 +1115,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>>               goto fork_out;
>>
>>       ftrace_graph_init_task(p);
>> +     copy_seccomp(&p->seccomp, &current->seccomp);
>
> I agree it's more symmetrical when get_seccomp_filter() is used here
> directly instead of copy_seccomp(). That should put Kees at ease.

Makes sense to me too. Once I'd dropped the other bits, it seemed
silly to keep copy_seccomp.

>> +static void seccomp_filter_log_failure(int syscall)
>> +{
>> +     int compat = 0;
>> +#ifdef CONFIG_COMPAT
>> +     compat = is_compat_task();
>> +#endif
>> +     pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n",
>> +             current->comm, task_pid_nr(current),
>> +             (compat ? "compat " : ""),
>> +             syscall, KSTK_EIP(current));
>> +}
>
> This should be at least rate limited, but could be dropped altogether,
> as it's mostly useful for debugging filters. There is no kernel message
> when a process is killed because it exceeds a ulimit either. The death
> by SIGSYS is hopefully clear enough for users, and filter writers can
> return different errno values when debugging where it goes wrong.

I'll pull Kees's patch into the series this next go-round.

>> +/**
>> + * bpf_load: checks and returns a pointer to the requested offset
>> + * @nr: int syscall passed as a void * to bpf_run_filter
>> + * @off: offset into struct seccomp_data to load from
>
> Must be aligned, that's worth mentioning.

True - thanks!

>> + * @size: number of bytes to load (must be 4)
>> + * @buffer: temporary storage supplied by bpf_run_filter (4 bytes)
>
> '@...'.

Oops - fixed.

>> +/**
>> + * copy_seccomp: manages inheritance on fork
>> + * @child: forkee's seccomp
>> + * @parent: forker's seccomp
>> + *
>> + * Ensures that @child inherits filters if in use.
>> + */
>> +void copy_seccomp(struct seccomp *child, const struct seccomp *parent)
>> +{
>> +     /* Other fields are handled by dup_task_struct. */
>> +     child->filter = get_seccomp_filter(parent->filter);
>> +}
>> +#endif       /* CONFIG_SECCOMP_FILTER */
>
> Yeah, just get rid of this and use get_seccomp_filter directly, and make
> it return void instead of a pointer.

It'll be updated.

>>
>> /*
>>  * Secure computing mode 1 allows only read/write/exit/sigreturn.
>> @@ -34,10 +293,11 @@ static int mode1_syscalls_32[] = {
>> void __secure_computing(int this_syscall)
>> {
>>       int mode = current->seccomp.mode;
>> -     int * syscall;
>> +     int exit_code = SIGKILL;
>> +     int *syscall;
>>
>>       switch (mode) {
>> -     case 1:
>> +     case SECCOMP_MODE_STRICT:
>>               syscall = mode1_syscalls;
>> #ifdef CONFIG_COMPAT
>>               if (is_compat_task())
>> @@ -48,6 +308,14 @@ void __secure_computing(int this_syscall)
>>                               return;
>>               } while (*++syscall);
>>               break;
>> +#ifdef CONFIG_SECCOMP_FILTER
>> +     case SECCOMP_MODE_FILTER:
>> +             if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW)
>> +                     return;
>> +             seccomp_filter_log_failure(this_syscall);
>> +             exit_code = SIGSYS;
>
> Wouldn't it make more sense to always kill with SIGSYS, also for mode 1?
> I suppose it's too late for that now.

It would but I don't want to go and change existing ABI.

>> +/**
>> + * prctl_set_seccomp: configures current->seccomp.mode
>> + * @seccomp_mode: requested mode to use
>> + * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
>> + *
>> + * This function may be called repeatedly with a @seccomp_mode of
>> + * SECCOMP_MODE_FILTER to install additional filters.  Every filter
>> + * successfully installed will be evaluated (in reverse order) for each system
>> + * call the task makes.
>> + *
>> + * It is not possible to transition change the current->seccomp.mode after
>> + * one has been set on a task.
>
> That reads awkwardly, do you mean the mode can't be changed once it's set?

Yup - I will fix that.  It doesn't even make sense :)

> ---
>
> Reviewed-by: Indan Zupancic <indan@....nu>

Thanks!

> All in all I'm quite happy with how the code is now, at least this bit.
> I'm still unsure about the networking patch and the 32-bit BPF on 64-bit
> archs mismatch.

Yeah - that is really the most challenging set of compromises, but I
think they are the right ones.

> As for the unlimited filter count, I'm not sure how to fix that.
> The problem is that filters are inherited and shared. Limiting the
> list length (tree depth) helps a bit, then the total memory usage
> is limited by max number of processes. It may make sense to limit
> the total instruction count instead of the number of filters.

I'll take a stab at tree path size for starters and hopefully we can
encourage consumers of the API to check for errors on return.

Thanks for the continued review and feedback!
will
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.