kernel-hardening - Re: [RFC] syscalls: Restore address limit after a syscall

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrWrbDnka=aqzQ5+7WN2N6RwFsQb7smcnjQMsjov+y=dTQ@mail.gmail.com>
Date: Thu, 9 Feb 2017 15:05:14 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Kees Cook <keescook@...omium.org>
Cc: Thomas Garnier <thgarnie@...gle.com>, Dave Hansen <dave.hansen@...el.com>, 
	Arnd Bergmann <arnd@...db.de>, René Nyffenegger <mail@...enyffenegger.ch>, 
	Stephen Bates <stephen.bates@...s.com>, Jeff Moyer <jmoyer@...hat.com>, 
	Milosz Tanski <milosz@...in.com>, Linux API <linux-api@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, 
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>
Subject: Re: [RFC] syscalls: Restore address limit after a syscall

On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@...omium.org> wrote:
> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@...gle.com> wrote:
>> This patch prevents a syscall to modify the address limit of the
>> caller. The address limit is kept by the syscall wrapper and restored
>> just after the syscall ends.
>>
>> For example, it would mitigation this bug:
>>
>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990
>>
>> Signed-off-by: Thomas Garnier <thgarnie@...gle.com>
>> ---
>> Based on next-20170209
>> ---
>>  include/linux/syscalls.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index 91a740f6b884..a1b6a62a9849 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__));      \
>>         asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))       \
>>         {                                                               \
>> -               long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));  \
>> +               long ret;                                               \
>> +               mm_segment_t fs = get_fs();                             \
>> +               ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__));       \
>> +               set_fs(fs);                                             \
>>                 __MAP(x,__SC_TEST,__VA_ARGS__);                         \
>>                 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__));       \
>>                 return ret;                                             \
>> --
>> 2.11.0.483.g087da7b7c-goog
>>
>
> I have a memory of Andy looking at this before, and there was some
> problem with how a bunch of compat code would set fs and then re-call
> the syscall... but I can't quite find the conversation. Andy, do you
> remember the details?
>
> This seems like an entirely reasonable thing to enforce for syscalls,
> though I'm sure there's a gotcha somewhere. :)

This sounds vaguely familiar, but that's about all.

Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely
only used for syscalls and not for other things, so the code should
*work*.  That being said, I think there's room for several
improvements.

1. Why save the old "fs" value?  For that matter, why restore it?
IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end.

2. I'd rather see the mechanism be more general.  If we had, effectively:

asmlinkage long SyS_foo(...) {
  sys_foo();
  verify_pre_usermode_state();
}

and let verify_pre_usermode_state() potentially do more things, we'd
get a more flexible mechanism.  On arches like x86_32, we could save a
decent amount of code size by moving verify_pre_usermode_state() into
prepare_exit_to_usermode(), but that would have to be a per-arch
opt-in.  x86_64 probably would *not* select this due to the fast path
(or it would do it in asm.  hmm.).

3. If this thing gets factored out, then arch code can call it for
non-syscall entries, too.

4. Can we make this configurable?


For x86, a nice implementation might be:

select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE

... in prepare_exit_to_usermode():

verify_pre_usermode_state();  // right at the beginning

... in the asm syscall fast path:

#ifdef CONFIG_VERIFY_PRE_USERMODE_STATE
call verify_pre_usermode_staet
#endif

(or just inline the interesting bit)

--Andy
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.