Message-ID: <20170921000901.v7zo4g5edhqqfabm@docker>
Date: Wed, 20 Sep 2017 18:09:01 -0600
From: Tycho Andersen <tycho@...ker.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	kernel-hardening@...ts.openwall.com,
	Marco Benatto <marco.antonio.780@...il.com>,
	Juerg Haefliger <juerg.haefliger@...onical.com>, x86@...nel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame
	Ownership (XPFO)

On Wed, Sep 20, 2017 at 04:21:15PM -0700, Dave Hansen wrote:
> On 09/20/2017 03:34 PM, Tycho Andersen wrote:
> >> I really have to wonder whether there are better ret2dir defenses than
> >> this. The allocator just seems like the *wrong* place to be doing this
> >> because it's such a hot path.
> >
> > This might be crazy, but what if we defer flushing of the kernel
> > ranges until just before we return to userspace? We'd still manipulate
> > the prot/xpfo bits for the pages, but then just keep a list of which
> > ranges need to be flushed, and do the right thing before we return.
> > This leaves a little window between the actual allocation and the
> > flush, but userspace would need another thread in its threadgroup to
> > predict the next allocation, write the bad stuff there, and do the
> > exploit all in that window.
>
> I think the common case is still that you enter the kernel, allocate a
> single page (or very few) and then exit. So, you don't really reduce
> the total number of flushes.
>
> Just think of this in terms of IPIs to do the remote TLB flushes. A CPU
> can do roughly 1 million page faults and allocations a second. Say you
> have a 2-socket x 28-core x 2 hyperthread system = 112 CPU threads.
> That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
> *CPU*.

Since we only need to flush when something switches from a userspace to
a kernel page or back, hopefully it's not this bad, but point taken.

> I think the only thing that will really help here is if you batch the
> allocations. For instance, you could make sure that the per-cpu-pageset
> lists always contain either all kernel or all user data. Then remap the
> entire list at once and do a single flush after the entire list is
> consumed.

Just so I understand, the idea would be that we only flush when the type
of allocation alternates, so:

kmalloc(..., GFP_KERNEL);
kmalloc(..., GFP_KERNEL);
/* remap+flush here */
kmalloc(..., GFP_HIGHUSER);
/* remap+flush here */
kmalloc(..., GFP_KERNEL);

?

Tycho
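
For illustration only, below is a minimal, self-contained userspace C
sketch of the "flush only when the allocation type alternates" reading
of the batching scheme discussed above. It is not kernel code and has
nothing to do with the actual XPFO patches or the per-cpu-pageset
implementation; the names (flush_state, record_alloc, and so on) are
invented here purely to model the bookkeeping and count how many
remap+flush operations the kmalloc sequence at the end of the mail
would incur.

/*
 * Userspace sketch (illustrative, not kernel code): count how many
 * remap+flush operations the "flush only when the allocation type
 * alternates between kernel and user" scheme would issue for a given
 * allocation sequence. All names are invented for this example.
 */
#include <stdio.h>

enum alloc_kind { ALLOC_KERNEL, ALLOC_USER };

struct flush_state {
	enum alloc_kind last;   /* type of the previous allocation */
	int have_last;          /* 0 until the first allocation is seen */
	unsigned long flushes;  /* remap+flush operations so far */
};

/*
 * Record one allocation; "flush" the previous batch only when the
 * allocation type changes, mirroring the /* remap+flush here */
 * comments in the kmalloc example above.
 */
static void record_alloc(struct flush_state *st, enum alloc_kind kind)
{
	if (st->have_last && st->last != kind)
		st->flushes++;  /* one flush covers the whole prior batch */
	st->last = kind;
	st->have_last = 1;
}

int main(void)
{
	/* The sequence from the mail:
	 * GFP_KERNEL, GFP_KERNEL, GFP_HIGHUSER, GFP_KERNEL
	 * -> two flushes expected. */
	enum alloc_kind seq[] = {
		ALLOC_KERNEL, ALLOC_KERNEL, ALLOC_USER, ALLOC_KERNEL,
	};
	struct flush_state st = { 0 };

	for (unsigned i = 0; i < sizeof(seq) / sizeof(seq[0]); i++)
		record_alloc(&st, seq[i]);

	printf("flushes: %lu\n", st.flushes); /* prints 2 */
	return 0;
}

Under this model, runs of same-type allocations share a single flush,
which is the property Dave's per-cpu-pageset suggestion relies on: keep
each list homogeneously kernel or user so the whole list can be remapped
at once and flushed once when it is consumed.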