|
Message-ID: <CALCETrVS6J2_GPWBoxUFUhjdyce0TBdroA+FjfdNywr9_k6hew@mail.gmail.com> Date: Thu, 9 Feb 2017 18:42:34 -0800 From: Andy Lutomirski <luto@...capital.net> To: Thomas Garnier <thgarnie@...gle.com> Cc: Kees Cook <keescook@...omium.org>, Dave Hansen <dave.hansen@...el.com>, Arnd Bergmann <arnd@...db.de>, René Nyffenegger <mail@...enyffenegger.ch>, Stephen Bates <stephen.bates@...s.com>, Jeff Moyer <jmoyer@...hat.com>, Milosz Tanski <milosz@...in.com>, Linux API <linux-api@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, "kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>, Will Deacon <will.deacon@....com>, "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org> Subject: Re: [RFC] syscalls: Restore address limit after a syscall On Thu, Feb 9, 2017 at 3:41 PM, Thomas Garnier <thgarnie@...gle.com> wrote: > On Thu, Feb 9, 2017 at 3:05 PM, Andy Lutomirski <luto@...capital.net> wrote: >> On Thu, Feb 9, 2017 at 11:31 AM, Kees Cook <keescook@...omium.org> wrote: >>> On Thu, Feb 9, 2017 at 10:33 AM, Thomas Garnier <thgarnie@...gle.com> wrote: >>>> This patch prevents a syscall to modify the address limit of the >>>> caller. The address limit is kept by the syscall wrapper and restored >>>> just after the syscall ends. >>>> >>>> For example, it would mitigation this bug: >>>> >>>> - https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >>>> >>>> Signed-off-by: Thomas Garnier <thgarnie@...gle.com> >>>> --- >>>> Based on next-20170209 >>>> --- >>>> include/linux/syscalls.h | 5 ++++- >>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h >>>> index 91a740f6b884..a1b6a62a9849 100644 >>>> --- a/include/linux/syscalls.h >>>> +++ b/include/linux/syscalls.h >>>> @@ -198,7 +198,10 @@ extern struct trace_event_functions exit_syscall_print_funcs; >>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ >>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ >>>> { \ >>>> - long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ >>>> + long ret; \ >>>> + mm_segment_t fs = get_fs(); \ >>>> + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ >>>> + set_fs(fs); \ >>>> __MAP(x,__SC_TEST,__VA_ARGS__); \ >>>> __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \ >>>> return ret; \ >>>> -- >>>> 2.11.0.483.g087da7b7c-goog >>>> >>> >>> I have a memory of Andy looking at this before, and there was some >>> problem with how a bunch of compat code would set fs and then re-call >>> the syscall... but I can't quite find the conversation. Andy, do you >>> remember the details? >>> >>> This seems like an entirely reasonable thing to enforce for syscalls, >>> though I'm sure there's a gotcha somewhere. :) >> >> This sounds vaguely familiar, but that's about all. >> >> Anyway, it seems reasonable that the SyS_foobar wrappers are genuinely >> only used for syscalls and not for other things, so the code should >> *work*. That being said, I think there's room for several >> improvements. >> >> 1. Why save the old "fs" value? For that matter, why restore it? >> IOW, I'd rather see BUG_ON(get_fs() != USER_DS) at the end. >> > > I guess that make sense in the wrapper. > >> 2. I'd rather see the mechanism be more general. If we had, effectively: >> >> asmlinkage long SyS_foo(...) { >> sys_foo(); >> verify_pre_usermode_state(); >> } >> >> and let verify_pre_usermode_state() potentially do more things, we'd >> get a more flexible mechanism. On arches like x86_32, we could save a >> decent amount of code size by moving verify_pre_usermode_state() into >> prepare_exit_to_usermode(), but that would have to be a per-arch >> opt-in. x86_64 probably would *not* select this due to the fast path >> (or it would do it in asm. hmm.). >> > > I will look into that. I like this design better. > >> 3. If this thing gets factored out, then arch code can call it for >> non-syscall entries, too. >> > > Yes, it makes sense. > >> 4. Can we make this configurable? >> >> >> For x86, a nice implementation might be: >> >> select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE >> >> ... in prepare_exit_to_usermode(): >> >> verify_pre_usermode_state(); // right at the beginning >> >> ... in the asm syscall fast path: >> >> #ifdef CONFIG_VERIFY_PRE_USERMODE_STATE >> call verify_pre_usermode_staet >> #endif >> >> (or just inline the interesting bit) >> > > So by default it is in the wrapper. If selected, an architecture can > disable the wrapper put it in the best places. Understood correctly? Sounds good to me. Presumably the result should go through -mm. Want to cc: akpm and linux-arch@ on the next version? I've also cc'd arm and s390 folks -- those are the other arches that try to be on top of hardening.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.