kernel-hardening - Re: [PATCH] slub: Introduce CONFIG_SLUB_RCU

Message-ID: <CACT4Y+YcBeshE811w5KSyYpBqaQ3S_-aKanOGZcHCQvHWHc4Tg@mail.gmail.com>
Date: Mon, 11 Sep 2023 11:50:19 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrey Ryabinin <ryabinin.a.a@...il.com>, Christoph Lameter <cl@...ux.com>, 
	Pekka Enberg <penberg@...nel.org>, David Rientjes <rientjes@...gle.com>, 
	Joonsoo Kim <iamjoonsoo.kim@....com>, Vlastimil Babka <vbabka@...e.cz>, 
	Alexander Potapenko <glider@...gle.com>, Andrey Konovalov <andreyknvl@...il.com>, 
	Vincenzo Frascino <vincenzo.frascino@....com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Hyeonggon Yoo <42.hyeyoo@...il.com>, 
	kasan-dev@...glegroups.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	linux-hardening@...r.kernel.org, kernel-hardening@...ts.openwall.com
Subject: Re: [PATCH] slub: Introduce CONFIG_SLUB_RCU_DEBUG

On Mon, 28 Aug 2023 at 16:40, Jann Horn <jannh@...gle.com> wrote:
>
> On Sat, Aug 26, 2023 at 5:32 AM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > On Fri, 25 Aug 2023 at 23:15, Jann Horn <jannh@...gle.com> wrote:
> > > Currently, KASAN is unable to catch use-after-free in SLAB_TYPESAFE_BY_RCU
> > > slabs because use-after-free is allowed within the RCU grace period by
> > > design.
> > >
> > > Add a SLUB debugging feature which RCU-delays every individual
> > > kmem_cache_free() before either actually freeing the object or handing it
> > > off to KASAN, and change KASAN to poison freed objects as normal when this
> > > option is enabled.
> > >
> > > Note that this creates a 16-byte unpoisoned area in the middle of the
> > > slab metadata area, which kinda sucks but seems to be necessary in order
> > > to be able to store an rcu_head in there without triggering an ASAN
> > > splat during RCU callback processing.
> >
> > Nice!
> >
> > Can't we unpoison this rcu_head right before call_rcu() and repoison
> > after receiving the callback?
>
> Yeah, I think that should work. It looks like currently
> kasan_unpoison() is exposed in include/linux/kasan.h but
> kasan_poison() is not, and its inline definition probably means I
> can't just move it out of mm/kasan/kasan.h into include/linux/kasan.h;
> do you have a preference for how I should handle this? Hmm, and it
> also looks like code outside of mm/kasan/ anyway wouldn't know what
> are valid values for the "value" argument to kasan_poison().
> I also have another feature idea that would also benefit from having
> something like kasan_poison() available in include/linux/kasan.h, so I
> would prefer that over adding another special-case function inside
> KASAN for poisoning this piece of slab metadata...
>
> I guess I could define a wrapper around kasan_poison() in
> mm/kasan/generic.c that uses a new poison value for "some other part
> of the kernel told us to poison this area", and then expose that
> wrapper with a declaration in include/linux/kasan.h? Something like:
>
> void kasan_poison_outline(const void *addr, size_t size, bool init)
> {
>   kasan_poison(addr, size, KASAN_CUSTOM, init);
> }

Looks reasonable.
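
To make the unpoison/repoison idea above concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code: the shadow array is
byte-granular (generic KASAN tracks 8-byte granules), and every toy_* name,
including the TOY_POISON_CUSTOM tag, is made up for illustration.
toy_delayed_free() stands in for the free path that would call call_rcu(),
and toy_run_callback() stands in for the RCU callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model; none of these names are kernel APIs. */
#define TOY_MEM_SIZE      64
#define TOY_POISON_CUSTOM 0xF1 /* hypothetical "poisoned on request" tag */

/* One shadow byte per memory byte: 0 = accessible, nonzero = poisoned. */
static unsigned char toy_shadow[TOY_MEM_SIZE];

static void toy_poison(size_t off, size_t len, unsigned char tag)
{
    memset(&toy_shadow[off], tag, len);
}

static void toy_unpoison(size_t off, size_t len)
{
    memset(&toy_shadow[off], 0, len);
}

static bool toy_accessible(size_t off, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (toy_shadow[off + i] != 0)
            return false;
    return true;
}

/* One pending deferred free; stands in for a queued rcu_head. */
struct toy_pending {
    size_t head_off;
    size_t head_len;
    bool queued;
};

static struct toy_pending toy_queue;

/* Free path: open a window over the head, then queue the callback. */
static void toy_delayed_free(size_t head_off, size_t head_len)
{
    toy_unpoison(head_off, head_len);
    toy_queue = (struct toy_pending){ head_off, head_len, true };
}

/* Callback path: runs "after the grace period" and closes the window. */
static void toy_run_callback(void)
{
    if (!toy_queue.queued)
        return;
    toy_queue.queued = false;
    toy_poison(toy_queue.head_off, toy_queue.head_len, TOY_POISON_CUSTOM);
}
```

With the whole region poisoned, toy_delayed_free(16, 16) makes only bytes
16..31 accessible, and toy_run_callback() poisons them again, so the
unpoisoned window exists only while the callback is pending.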

> > What happens on cache destruction?
> > Currently we purge quarantine on cache destruction to be able to
> > safely destroy the cache. I suspect we may need to somehow purge rcu
> > callbacks as well, or do something else.
>
> Ooh, good point, I hadn't thought about that... currently
> shutdown_cache() assumes that all the objects have already been freed,
> then puts the kmem_cache on a list for
> slab_caches_to_rcu_destroy_workfn(), which then waits with an
> rcu_barrier() until the slab's pages are all gone.

I guess this is what the test robot found as well.

> Luckily kmem_cache_destroy() is already a sleepable operation, so
> maybe I should just slap another rcu_barrier() in there for builds
> with this config option enabled... I think that should be fine for an
> option mostly intended for debugging.

This is definitely the simplest option.
I am a bit concerned about performance if massive cache destruction
happens (e.g. during destruction of a set of namespaces for a
container). IIRC net namespaces are slow to destroy for this reason,
and there were some optimizations to batch RCU synchronization. And now
we are adding more.
But I don't see any reasonable faster option either.
So I guess let's do this now and optimize later (or not).
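
The extra barrier in the destroy path can be sketched in userspace as well.
This is a toy with hypothetical sim_* names, not the actual
kmem_cache_destroy() code: sim_barrier() stands in for rcu_barrier() by
running every queued callback, and the drain_first flag models whether the
extra barrier is compiled in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
#define SIM_MAX_PENDING 8

struct sim_cache {
    int live_objects; /* objects not yet returned to the allocator */
    bool destroyed;
};

/* Queue of delayed frees, each remembering the cache it belongs to. */
static struct sim_cache *sim_pending[SIM_MAX_PENDING];
static size_t sim_npending;

/* Delayed free: the object stays live until its callback runs. */
static bool sim_free_delayed(struct sim_cache *c)
{
    if (sim_npending == SIM_MAX_PENDING)
        return false;
    sim_pending[sim_npending++] = c;
    return true;
}

/* Stand-in for rcu_barrier(): run every callback queued so far. */
static void sim_barrier(void)
{
    for (size_t i = 0; i < sim_npending; i++)
        sim_pending[i]->live_objects--;
    sim_npending = 0;
}

/* Destroy succeeds only once no objects remain outstanding. */
static bool sim_cache_destroy(struct sim_cache *c, bool drain_first)
{
    if (drain_first)
        sim_barrier();
    if (c->live_objects != 0)
        return false; /* delayed frees still pending */
    c->destroyed = true;
    return true;
}
```

Without the drain, a cache with two delayed frees in flight still looks
non-empty and destruction fails; with the drain, the callbacks run first and
destruction succeeds. The cost is one more full wait per destroyed cache.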