|
Message-ID: <CAHk-=whg4aVxA7LFAUFCzOn78_7TL1CPo+esPKgN5JTHy8H-Rg@mail.gmail.com> Date: Wed, 10 Mar 2021 13:14:24 -0800 From: Linus Torvalds <torvalds@...ux-foundation.org> To: Alexey Gladkov <gladkov.alexey@...il.com> Cc: LKML <linux-kernel@...r.kernel.org>, io-uring <io-uring@...r.kernel.org>, Kernel Hardening <kernel-hardening@...ts.openwall.com>, Linux Containers <containers@...ts.linux-foundation.org>, Linux-MM <linux-mm@...ck.org>, Alexey Gladkov <legion@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>, Christian Brauner <christian.brauner@...ntu.com>, "Eric W . Biederman" <ebiederm@...ssion.com>, Jann Horn <jannh@...gle.com>, Jens Axboe <axboe@...nel.dk>, Kees Cook <keescook@...omium.org>, Oleg Nesterov <oleg@...hat.com> Subject: Re: [PATCH v8 3/8] Use atomic_t for ucounts reference counting On Wed, Mar 10, 2021 at 4:01 AM Alexey Gladkov <gladkov.alexey@...il.com> wrote: > > > +/* 127: arbitrary random number, small enough to assemble well */ > +#define refcount_zero_or_close_to_overflow(ucounts) \ > + ((unsigned int) atomic_read(&ucounts->count) + 127u <= 127u) > + > +struct ucounts *get_ucounts(struct ucounts *ucounts) > +{ > + if (ucounts) { > + if (refcount_zero_or_close_to_overflow(ucounts)) { > + WARN_ONCE(1, "ucounts: counter has reached its maximum value"); > + return NULL; > + } > + atomic_inc(&ucounts->count); > + } > + return ucounts; Side note: you probably should just make the limit be the "oh, the count overflows into the sign bit". The reason the page cache did that tighter thing is that it actually has _two_ limits: - the "try_get_page()" thing uses the sign bit as a "uhhuh, I've now used up half of the available reference counting bits, and I will refuse to use any more". This is basically your "get_ucounts()" function. It's a "I want a refcount, but I'm willing to deal with failures". - the page cache has a _different_ set of "I need to unconditionally get a refcount, and I can *not* deal with failures". This is basically the traditional "get_page()", which is only used in fairly controlled places, and should never be something that can overflow. And *that* special code then uses that "zero_or_close_to_overflow()" case as a "doing a get_page() in this situation is very very wrong". This is purely a debugging feature used for a VM_BUG_ON() (that has never triggered, as far as I know). For your ucounts situation, you don't have that second case at all, so you have no reason to ever allow the count to even get remotely close to overflowing. A reference count being within 128 counts of overflow (when we're talking a 32-bit count) is basically never a good idea. It means that you are way too close to the limit, and there's a risk that lots of concurrent people all first see an ok value, and then *all* decide to do the increment, and then you're toast. In contrast, if you use the sign bit as a "ok, let's stop incrementing", the fact that your "overflow" test and the increment aren't atomic really isn't a big deal. (And yes, you could use a cmpxchg to *make* the overflow test atomic, but it's often much much more expensive, so..) Linus
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.