Message-ID: <CAG48ez1vZ5cngEKVtWTL9rz_K8K25b1sMKYrNs+jn4Va3KYucw@mail.gmail.com>
Date: Thu, 28 Mar 2019 01:59:45 +0100
From: Jann Horn <jannh@...gle.com>
To: Kees Cook <keescook@...omium.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc: "Joel Fernandes (Google)" <joel@...lfernandes.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Android Kernel Team <kernel-team@...roid.com>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Matthew Wilcox <willy@...radead.org>,
	Michal Hocko <mhocko@...e.com>,
	Oleg Nesterov <oleg@...hat.com>,
	"Reshetova, Elena" <elena.reshetova@...el.com>
Subject: Re: [PATCH] Convert struct pid count to refcount_t

On Thu, Mar 28, 2019 at 1:06 AM Kees Cook <keescook@...omium.org> wrote:
> On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google)
> <joel@...lfernandes.org> wrote:
> >
> > struct pid's count is an atomic_t field used as a refcount. Use
> > refcount_t for it which is basically atomic_t but does additional
> > checking to prevent use-after-free bugs. No change in behavior if
> > CONFIG_REFCOUNT_FULL=n.
> >
> > Cc: keescook@...omium.org
> > Cc: kernel-team@...roid.com
> > Cc: kernel-hardening@...ts.openwall.com
> > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > [...]
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 20881598bdfa..2095c7da644d 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -37,7 +37,7 @@
> >  #include <linux/init_task.h>
> >  #include <linux/syscalls.h>
> >  #include <linux/proc_ns.h>
> > -#include <linux/proc_fs.h>
> > +#include <linux/refcount.h>
> >  #include <linux/sched/task.h>
> >  #include <linux/idr.h>
> >
> > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid)
> >                 return;
> >
> >         ns = pid->numbers[pid->level].ns;
> > -       if ((atomic_read(&pid->count) == 1) ||
> > -            atomic_dec_and_test(&pid->count)) {
> > +       if ((refcount_read(&pid->count) == 1) ||
> > +            refcount_dec_and_test(&pid->count)) {
>
> Why is this (and the original code) safe in the face of a race against
> get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I
> don't see this code pattern anywhere else in the kernel.

Semantically, it doesn't make a difference whether you do this or
leave out the "refcount_read(&pid->count) == 1". If you read a 1 from
refcount_read(), then you have the only reference to "struct pid", and
therefore you want to free it. If you don't get a 1, you have to
atomically drop a reference, which, if someone else is concurrently
also dropping a reference, may leave you with the last reference (in
the case where refcount_dec_and_test() returns true), in which case
you still have to take care of freeing it.

My guess is that the goal of this is to make the "drop last reference"
case a little bit faster by avoiding the cacheline dirtying and the
atomic op, at the expense of an extra memory op and branch every time
we drop a non-final reference. But that's a pretty low-level
optimization, and forking by itself isn't exactly fast...

I think the clean thing to do would be to either move this detail into
the refcount implementation (if it turns out to actually be valuable
in at least a microbenchmark), or just get rid of it.
Given the overhead of fork()/clone(), I would be surprised if you
could actually measure this effect here.

Eric, can you remember the rationale for doing it that way in commit
92476d7fc0326a409ab1d3864a04093a6be9aca7? Am I guessing correctly?
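
For illustration, the "read first, then decrement" pattern discussed above can be sketched in userspace C. This is only a sketch under stated assumptions, not the kernel code: C11 atomics stand in for atomic_t/refcount_t, the saturation and warning behaviour of refcount_t is not modelled, and the names obj, obj_alloc(), obj_get(), obj_put() and freed_objects are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pid; not the kernel's definition. */
struct obj {
	atomic_int count;
};

/* Bookkeeping for the demo only. */
static int freed_objects;

/* Allocate an object holding one reference (owned by the caller). */
static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->count, 1);
	return o;
}

/* Take a reference; the caller must already hold one. */
static void obj_get(struct obj *o)
{
	int old = atomic_fetch_add(&o->count, 1);

	/* Incrementing from zero would mean the object was already dead. */
	assert(old > 0);
}

/* Drop a reference; returns true if this call freed the object. */
static bool obj_put(struct obj *o)
{
	if (!o)
		return false;

	/*
	 * Fast path: reading 1 means the caller holds the only reference,
	 * so nobody else can legitimately take or drop one concurrently
	 * and the atomic read-modify-write can be skipped.
	 * Slow path: atomically decrement; whoever observes the old
	 * value 1 dropped the last reference and must free the object.
	 */
	if (atomic_load(&o->count) == 1 ||
	    atomic_fetch_sub(&o->count, 1) == 1) {
		free(o);
		freed_objects++;
		return true;
	}
	return false;
}
```

Dropping the atomic_load() check leaves the result unchanged: the decrement alone still detects the final put. The check only trades an extra load and branch on every non-final put for a cheaper final put.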