Message-ID: <20200625085745.GD117543@hirez.programming.kicks-ass.net>
Date: Thu, 25 Jun 2020 10:57:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Nick Desaulniers <ndesaulniers@...gle.com>
Cc: Sami Tolvanen <samitolvanen@...gle.com>,
	Masahiro Yamada <masahiroy@...nel.org>,
	Will Deacon <will@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Kees Cook <keescook@...omium.org>,
	clang-built-linux <clang-built-linux@...glegroups.com>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	linux-arch <linux-arch@...r.kernel.org>,
	Linux ARM <linux-arm-kernel@...ts.infradead.org>,
	Linux Kbuild mailing list <linux-kbuild@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-pci@...r.kernel.org,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>
Subject: Re: [PATCH 00/22] add support for Clang LTO

On Thu, Jun 25, 2020 at 10:24:33AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2020 at 10:03:13AM +0200, Peter Zijlstra wrote:
> > I'm sure Will will respond, but the basic issue is the trainwreck C11
> > made of dependent loads.
> >
> > Anyway, here's a link to the last time this came up:
> >
> >   https://lore.kernel.org/linux-arm-kernel/20171116174830.GX3624@linux.vnet.ibm.com/
>
> Another good read:
>
>   https://lore.kernel.org/lkml/20150520005510.GA23559@linux.vnet.ibm.com/
>
> and having (partially) re-read that, I now worry intensely about things
> like latch_tree_find(), cyc2ns_read_begin() and __ktime_get_fast_ns().
>
> It looks like kernel/time/sched_clock.c uses raw_read_seqcount(), which
> deviates from the above patterns by, for some reason, using a primitive
> that includes an extra smp_rmb().
>
> And this is just the few things I could remember off the top of my head;
> who knows what else is out there.

As an example, let us consider __ktime_get_fast_ns(); the critical bit is:

	seq = raw_read_seqcount_latch(&tkf->seq);
	tkr = tkf->base + (seq & 0x01);
	now = tkr->base;

And we hard-rely on that being a dependent load, so:

	LOAD	seq, (tkf->seq)
	LOAD	tkr, tkf->base
	AND	seq, 1
	MUL	seq, sizeof(tk_read_base)
	ADD	tkr, seq
	LOAD	now, (tkr->base)

Such that the load of 'now' carries a direct data dependency on 'seq',
and it is this dependency that orders the two loads.

A compiler can wreck this by translating it into something like:

	LOAD	seq, (tkf->seq)
	LOAD	tkr, tkf->base
	AND	seq, 1
	CMP	seq, 0
	JE	1f
	ADD	tkr, sizeof(tk_read_base)
1:	LOAD	now, (tkr->base)

because the address dependency has been turned into a control
dependency, and the machine is now free to speculate past the branch
and load 'now' before 'seq', breaking the ordering.
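
[Editor's note: for readers following along, below is a minimal,
stand-alone user-space sketch of the latch read pattern discussed
above, modeled with C11 atomics. The struct layout and the names
'latch_t' and 'read_latched' are made up for illustration; the
kernel's real primitives live in include/linux/seqlock.h.]

	/* A made-up user-space model of the latch read; not kernel code. */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <inttypes.h>
	#include <stdio.h>

	struct tk_read_base {
		uint64_t base;
	};

	struct latch_t {
		atomic_uint seq;		/* odd/even selects the stable copy */
		struct tk_read_base base[2];	/* two copies, written alternately */
	};

	static uint64_t read_latched(struct latch_t *tkf)
	{
		struct tk_read_base *tkr;
		unsigned int seq;
		uint64_t now;

		do {
			/* models raw_read_seqcount_latch(): a plain READ_ONCE() */
			seq = atomic_load_explicit(&tkf->seq, memory_order_relaxed);
			/*
			 * The address in 'tkr' depends on 'seq'; the pattern in
			 * the mail relies on this address dependency alone to
			 * order the load of tkr->base after the load of
			 * tkf->seq.  If the compiler turns '+ (seq & 0x01)'
			 * into a branch, that dependency -- and with it the
			 * ordering -- is gone.
			 */
			tkr = tkf->base + (seq & 0x01);
			now = tkr->base;
			/* models the smp_rmb() in the retry check */
			atomic_thread_fence(memory_order_acquire);
		} while (atomic_load_explicit(&tkf->seq, memory_order_relaxed) != seq);

		return now;
	}

	int main(void)
	{
		struct latch_t tkf = { .base = { { 100 }, { 200 } } };

		atomic_init(&tkf.seq, 0);
		printf("now = %" PRIu64 "\n", read_latched(&tkf));
		return 0;
	}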
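[Editor's note: the mail also observes that kernel/time/sched_clock.c
uses raw_read_seqcount(), which includes an extra smp_rmb(). Here is a
sketch of what that buys, in the same made-up user-space model;
'dep_free_read' is a hypothetical name and the C11 acquire fence
stands in for smp_rmb().]

	/* Hypothetical variant that does not rely on the address dependency. */
	static uint64_t dep_free_read(struct latch_t *tkf)
	{
		unsigned int seq = atomic_load_explicit(&tkf->seq,
							memory_order_relaxed);

		/*
		 * Stand-in for smp_rmb(): orders the load of 'seq' before
		 * the data load below, so the compiler is free to turn the
		 * index computation into a branch without breaking anything.
		 */
		atomic_thread_fence(memory_order_acquire);

		return tkf->base[seq & 0x01].base;
	}

[The cost is an explicit barrier on weakly ordered machines, which is
presumably why the latch readers try to get away with the bare
dependency.]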