|
Message-ID: <20120311094039.GB26640@openwall.com> Date: Sun, 11 Mar 2012 13:40:39 +0400 From: Solar Designer <solar@...nwall.com> To: Linus Torvalds <torvalds@...ux-foundation.org> Cc: Djalal Harouni <tixxdz@...ndz.org>, linux-kernel@...r.kernel.org, kernel-hardening@...ts.openwall.com, Andrew Morton <akpm@...ux-foundation.org>, Al Viro <viro@...iv.linux.org.uk>, Alexey Dobriyan <adobriyan@...il.com>, "Eric W. Biederman" <ebiederm@...ssion.com>, Vasiliy Kulikov <segoon@...nwall.com>, Kees Cook <keescook@...omium.org>, WANG Cong <xiyou.wangcong@...il.com>, James Morris <james.l.morris@...cle.com>, Oleg Nesterov <oleg@...hat.com>, linux-security-module@...r.kernel.org, linux-fsdevel@...r.kernel.org, Alan Cox <alan@...rguk.ukuu.org.uk>, Greg KH <gregkh@...uxfoundation.org>, Ingo Molnar <mingo@...e.hu>, Stephen Wilson <wilsons@...rt.ca>, "Jason A. Donenfeld" <Jason@...c4.com> Subject: Re: [PATCH 1/9] exec: add a global execve counter On Sat, Mar 10, 2012 at 04:12:20PM -0800, Linus Torvalds wrote: > On Sat, Mar 10, 2012 at 3:25 PM, Djalal Harouni <tixxdz@...ndz.org> wrote: > > > > Given that consideration this patch introduces two counters: > > A global atomic execve counter that will be incremented on every > > do_execve_common() call, and an atomic exec_id member for the task_struct. > > This seems horribly expensive on most 32-bit architectures, including > very much x86-32. That atomic64_inc_return() is not cheap. It's > possible that it's basically an impossible operation to do atomically > on certain platforms, causing it to use some random spinlock instead. I think these are _relatively_ cheap considering that it's execve(). If we can do e.g. 10 million of atomic64_inc_return()'s per second (and I think we can do a lot more, although I did not benchmark this), but only 10000 of execve()'s per second, then the performance impact is 0.1%. I admit that 0.1% is significant, but I used worst-case guesstimates here; the actual number might be more like 0.01% (50 million vs. 5 thousand) or even 0.002% (200 million vs. 4 thousand). (200 million is 15 CPU cycles at 3 GHz, which may be a reasonable optimistic estimate. 4 thousand is a realistic execve() speed for dynamically-linked programs.) > I wonder if we couldn't make something much cheaper, since we don't > actually care about it being globally incrementing, we just care about > it being globally unique. IOW, it could easily be a 56-bit per-cpu > counter along with the CPU number in the high bits or something like > that. Avoiding the whole atomicity issue, and thus avoiding the main > reason those things are really expensive. > > IOW, something like > > cpu = get_cpu(); > .. increment percpu 64-bit counter .. > id = counter * MAX_CPUS + cpu; > put_cpu(cpu); > > or equivalent would seem to be a potentially much cheaper approach. This makes sense to me. We just need to ensure that the per-CPU counter is still large enough that it won't wrap. 56 bits is enough (considering that the attack has to run on a single CPU of the target system, unlike e.g. an attack on a cipher with a 56-bit key could be), but there's little room for reducing this further (such as to support many more than 256 CPUs in a system). So if we use this approach, maybe we should simply keep a 64-bit per-CPU counter and a 32-bit CPU number, for a 96-bit id. This may even be faster on execve() (no need to combine the two values into one). I don't know which one of these approaches has lower overhead on current systems. My _guess_ is that atomic64_inc_return() may well be faster in many cases. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.