Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOMGZ=HQxkFYhHuF0mDVcwO7m4z34e_PKb6jHGVC3Jwc-mhfXQ@mail.gmail.com>
Date: Fri, 28 Oct 2016 11:27:17 +0200
From: Vegard Nossum <vegard.nossum@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Pavel Machek <pavel@....cz>, Ingo Molnar <mingo@...nel.org>, Kees Cook <keescook@...omium.org>, 
	Arnaldo Carvalho de Melo <acme@...hat.com>, kernel list <linux-kernel@...r.kernel.org>, 
	Ingo Molnar <mingo@...hat.com>, Alexander Shishkin <alexander.shishkin@...ux.intel.com>, 
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>
Subject: Re: rowhammer protection [was Re: Getting interrupt every million
 cache misses]

On 28 October 2016 at 11:04, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Oct 28, 2016 at 10:50:39AM +0200, Pavel Machek wrote:
>> On Fri 2016-10-28 09:07:01, Ingo Molnar wrote:
>> >
>> > * Pavel Machek <pavel@....cz> wrote:
>> >
>> > > +static void rh_overflow(struct perf_event *event, struct perf_sample_data *data, struct pt_regs *regs)
>> > > +{
>> > > + u64 *ts = this_cpu_ptr(&rh_timestamp); /* this is NMI context */
>> > > + u64 now = ktime_get_mono_fast_ns();
>> > > + s64 delta = now - *ts;
>> > > +
>> > > + *ts = now;
>> > > +
>> > > + /* FIXME msec per usec, reverse logic? */
>> > > + if (delta < 64 * NSEC_PER_MSEC)
>> > > +         mdelay(56);
>> > > +}
>> >
>> > I'd suggest making the absolute delay sysctl tunable, because 'wait 56 msecs' is
>> > very magic, and do we know it 100% that 56 msecs is what is needed
>> > everywhere?
>>
>> I agree this needs to be tunable (and with the other suggestions). But
>> this is actually not the most important tunable: the detection
>> threshold (rh_attr.sample_period) should be way more important.
>
> So being totally ignorant of the detail of how rowhammer abuses the DDR
> thing, would it make sense to trigger more often and delay shorter? Or
> is there some minimal delay required for things to settle or something.

Would it make sense to sample the counter on context switch, do some
accounting on a per-task cache miss counter, and slow down just the
single task(s) with a too high cache miss rate? That way there's no
global slowdown (which I assume would be the case here). The task's
slice of CPU would have to be taken into account because otherwise you
could have multiple cooperating tasks that each escape the limit but
taken together go above it.


Vegard

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.