Message-ID: <20190328143738.GA261521@google.com>
Date: Thu, 28 Mar 2019 10:37:38 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Jann Horn <jannh@...gle.com>
Cc: Kees Cook <keescook@...omium.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Android Kernel Team <kernel-team@...roid.com>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Matthew Wilcox <willy@...radead.org>,
	Michal Hocko <mhocko@...e.com>,
	Oleg Nesterov <oleg@...hat.com>,
	"Reshetova, Elena" <elena.reshetova@...el.com>
Subject: Re: [PATCH] Convert struct pid count to refcount_t

On Thu, Mar 28, 2019 at 03:57:44AM +0100, Jann Horn wrote:
> On Thu, Mar 28, 2019 at 3:34 AM Joel Fernandes <joel@...lfernandes.org> wrote:
> > On Thu, Mar 28, 2019 at 01:59:45AM +0100, Jann Horn wrote:
> > > On Thu, Mar 28, 2019 at 1:06 AM Kees Cook <keescook@...omium.org> wrote:
> > > > On Wed, Mar 27, 2019 at 7:53 AM Joel Fernandes (Google)
> > > > <joel@...lfernandes.org> wrote:
> > > > >
> > > > > struct pid's count is an atomic_t field used as a refcount. Use
> > > > > refcount_t for it which is basically atomic_t but does additional
> > > > > checking to prevent use-after-free bugs. No change in behavior if
> > > > > CONFIG_REFCOUNT_FULL=n.
> > > > >
> > > > > Cc: keescook@...omium.org
> > > > > Cc: kernel-team@...roid.com
> > > > > Cc: kernel-hardening@...ts.openwall.com
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > > > > [...]
> > > > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > > > index 20881598bdfa..2095c7da644d 100644
> > > > > --- a/kernel/pid.c
> > > > > +++ b/kernel/pid.c
> > > > > @@ -37,7 +37,7 @@
> > > > >  #include <linux/init_task.h>
> > > > >  #include <linux/syscalls.h>
> > > > >  #include <linux/proc_ns.h>
> > > > > -#include <linux/proc_fs.h>
> > > > > +#include <linux/refcount.h>
> > > > >  #include <linux/sched/task.h>
> > > > >  #include <linux/idr.h>
> > > > >
> > > > > @@ -106,8 +106,8 @@ void put_pid(struct pid *pid)
> > > > >  		return;
> > > > >
> > > > >  	ns = pid->numbers[pid->level].ns;
> > > > > -	if ((atomic_read(&pid->count) == 1) ||
> > > > > -	     atomic_dec_and_test(&pid->count)) {
> > > > > +	if ((refcount_read(&pid->count) == 1) ||
> > > > > +	     refcount_dec_and_test(&pid->count)) {
> > > >
> > > > Why is this (and the original code) safe in the face of a race against
> > > > get_pid()? i.e. shouldn't this only use refcount_dec_and_test()? I
> > > > don't see this code pattern anywhere else in the kernel.
> > >
> > > Semantically, it doesn't make a difference whether you do this or
> > > leave out the "refcount_read(&pid->count) == 1". If you read a 1 from
> > > refcount_read(), then you have the only reference to "struct pid", and
> > > therefore you want to free it. If you don't get a 1, you have to
> > > atomically drop a reference, which, if someone else is concurrently
> > > also dropping a reference, may leave you with the last reference (in
> > > the case where refcount_dec_and_test() returns true), in which case
> > > you still have to take care of freeing it.
> >
> > Also, based on Kees comment, I think it appears to me that get_pid and
> > put_pid can race in this way in the original code right?
> >
> > get_pid            put_pid
> >
> >                    atomic_dec_and_test returns 1
>
> This can't happen. get_pid() can only be called on an existing
> reference.
> If you are calling get_pid() on an existing reference, and
> someone else is dropping another reference with put_pid(), then when
> both functions start running, the refcount must be at least 2.

Sigh, you are right. Ok. I was quite tired last night when I wrote this.
Obviously, I should have waited a bit and thought it through.

Kees can you describe more the race you had in mind?

> > atomic_inc
> >                    kfree
> >
> > deref pid /* boom */
> > -------------------------------------------------
> >
> > I think get_pid needs to call atomic_inc_not_zero() and put_pid should
> > not test for pid->count == 1 as condition for freeing, but rather just do
> > atomic_dec_and_test. So something like the following diff. (And I see a
> > similar pattern used in drivers/net/mac.c)
>
> get_pid() can only be called when you already have a refcounted
> reference; in other words, when the reference count is at least one.
> The lifetime management of struct pid differs from the lifetime
> management of most other objects in the kernel; the usual patterns
> don't quite apply here.
>
> Look at put_pid(): When the refcount has reached zero, there is no RCU
> grace period (unlike most other objects with RCU-managed lifetimes).
> Instead, free_pid() has an RCU grace period *before* it invokes
> delayed_put_pid() to drop a reference; and free_pid() is also the
> function that removes a PID from the namespace's IDR, and it is used
> by __change_pid() when a task loses its reference on a PID.
>
> In other words: Most refcounted objects with RCU guarantee that the
> object waits for a grace period after its refcount has reached zero;
> and during the grace period, the refcount is zero and you're not
> allowed to increment it again.

Can you give an example of this "most refcounted objects with RCU" usecase?
I could not find any good examples of such. I want to document this pattern
and possibly submit to Documentation/RCU.
> But for struct pid, the guarantee is
> instead that there is an RCU grace period after it has been removed
> from the IDRs and the task, and during the grace period, refcounting
> is guaranteed to still work normally.

Ok, thanks. Here, I think, in scrappy but simple pseudocode form, the struct
pid flow is something like this (replaced "pid" with "data"):

get_data:
	atomic_inc(data->refcount);

some_user_of_data:
	rcu_read_lock();
	From X, obtain a ptr to data using rcu_dereference.
	get_data(data);
	rcu_read_unlock();

free_data:
	remove all references to data in all places in X
	call_rcu(put_data)

put_data:
	if (atomic_dec_and_test(data->refcount)) {
		free(data);
	}

create_data:
	data = alloc(..)
	atomic_set(data->refcount, 1);
	set pointers to data in X.

> > pud_pid to avoid such a race.
> >
> > ---8<-----------------------
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index 8cb86d377ff5..3d79834e3180 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -69,8 +69,8 @@ extern struct pid init_struct_pid;
> >
> >  static inline struct pid *get_pid(struct pid *pid)
> >  {
> > -	if (pid)
> > -		refcount_inc(&pid->count);
> > +	if (!pid || !refcount_inc_not_zero(&pid->count))
> > +		return NULL;
> >  	return pid;
> >  }
>
> Nope, this is wrong. Once the refcount is zero, the object goes away,
> refcount_inc_not_zero() makes no sense here.

Yeah ok, I think what you meant here is that references to the object from
all places go away before the grace period starts, so a get_pid on an object
with a refcount of zero is impossible, since there's no way to *get* to that
object after the grace period ends. So, yes, you are right that refcount_inc
is all that's needed.

Also, a note to the onlooker: the original patch I sent is not wrong; it
still applies and is correct. We are just discussing any possible issues
with the *existing* code here.

thanks!

 - Joel
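
For anyone who wants to poke at the pseudocode flow above outside the kernel, here is a rough user-space sketch in C11. It is only an illustration under loud assumptions: there is no real RCU here, so the grace period is simply "whenever the caller decides to do the deferred put"; a single atomic pointer named published stands in for "X"; and lookup_data(), unpublish_data() and freed_count are made-up names for this example, not kernel APIs. Nothing in it is taken from kernel/pid.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct data {
	atomic_int refcount;
	int value;
};

/* Hypothetical stand-in for "X": the only place a lookup can find the data. */
static _Atomic(struct data *) published;

/* Demonstration-only counter so a caller can observe that the free happened. */
static int freed_count;

/* create_data: allocate, set refcount to 1 (owned by X), publish in X. */
static struct data *create_data(int value)
{
	struct data *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcount, 1);
	d->value = value;
	atomic_store(&published, d);
	return d;
}

/* get_data: plain increment; the caller must already hold a valid reference. */
static struct data *get_data(struct data *d)
{
	if (d)
		atomic_fetch_add(&d->refcount, 1);
	return d;
}

/* put_data: drop one reference; returns true if this call freed the object. */
static bool put_data(struct data *d)
{
	int old;

	if (!d)
		return false;
	old = atomic_fetch_sub(&d->refcount, 1);
	assert(old >= 1);	/* dropping a reference nobody holds is a bug */
	if (old == 1) {
		free(d);
		freed_count++;
		return true;
	}
	return false;
}

/*
 * some_user_of_data: in the kernel this would be rcu_read_lock(),
 * rcu_dereference(), get_data(), rcu_read_unlock(). Here it is only a
 * load plus an increment, which is safe only in this single-threaded
 * sketch.
 */
static struct data *lookup_data(void)
{
	return get_data(atomic_load(&published));
}

/*
 * First half of free_data: remove the pointer from X. The caller does the
 * deferred put_data() on the returned pointer once it considers the
 * (simulated) grace period over.
 */
static struct data *unpublish_data(void)
{
	return atomic_exchange(&published, (struct data *)NULL);
}
```

The point the sketch tries to capture is the ordering: the pointer is removed from X first, so no new lookup can reach the object, and the reference that X owned is dropped only afterwards. That is why a plain increment in get_data() is enough and an inc-not-zero variant is never needed in this scheme.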