musl - Re: [PATCH] replace a mfence instruction by an xchg instruction

Message-ID: <1439741801.9803.35.camel@inria.fr>
Date: Sun, 16 Aug 2015 18:16:41 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] replace a mfence instruction by an xchg
 instruction

On Sunday, 16.08.2015 at 11:58 -0400, Rich Felker wrote:
> On Sun, Aug 16, 2015 at 05:50:21PM +0200, Jens Gustedt wrote:
> > > See page 330, http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf
> > > 
> > > So mfence seems to be weaker than lock-prefixed instructions in terms
> > > of the ordering it imposes (lock-prefixed instructions forbid
> > > reordering and also have a total ordering across all cores).
> > 
> > Yes, it says so on page 8-26 that the fences are definitively not
> > serializing instructions.
> > 
> > (But what I tried to show in my previous mail still holds, the
> > instruction latency itself plays a big part in the efficiency of these
> > instructions.)
> 
> I wasn't trying to contradict anything you've said, just expressing
> the absurdity of mfence being slower than lock-prefixed instructions,
> since it's a strictly-weaker operation.

Yes, I got that :)

One argument that we have neglected so far is the impact on other
threads/cores. Even if an mfence instruction is more expensive for
the thread that issues it, it imposes fewer constraints on other
threads. Maybe overall this could be a win?

> > I read all of that as:
> > 
> >  - mfence can be used to achieve acq_rel ordering
> >  - none of the fences can be used to achieve seq_cst ordering
> 
> By this you mean that only lock-prefixed instructions impose a total
> order across all cores?

Plus the very expensive fully serializing instructions that are
listed in the manual.

> > Wasn't the idea that all atomic.h functions implement sequential
> > consistency?
> 
> Yes, that's the intent, but I don't want to introduce 'major'
> performance regressions fixing 'minor' failures to be seq_cst if
> there's no observable misbehavior in the code using them.

Misbehavior here is really hard to track down. In particular, an
application that changes behavior only because seq_cst is not
guaranteed would probably be quite difficult to observe.

> Still it
> would be nice to know whether such failures still exist, and if so
> where, so we can eventually clean this up.

Replacing "mfence" by "lock ; orl $0,(%%rsp)" would give us that
security without compromising performance :)
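
For illustration, here is a minimal sketch of what such a barrier
could look like with GCC-style inline assembly. The names
full_barrier and store_then_load are hypothetical and chosen only
for this example; this is not the actual musl code, and the non-x86_64
fallback to a C11 fence is my assumption to keep the sketch portable.

```c
#include <stdatomic.h>

/* Hypothetical helper, not musl's real barrier function. */
static inline void full_barrier(void)
{
#if defined(__x86_64__)
    /* Locked read-modify-write on the word at the top of the stack.
     * ORing with 0 leaves the memory contents unchanged; the lock
     * prefix is what provides the ordering. The "memory" clobber
     * also keeps the compiler from reordering accesses across it. */
    __asm__ __volatile__("lock ; orl $0,(%%rsp)" ::: "memory", "cc");
#else
    /* Assumed fallback for other targets: a C11 seq_cst fence. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Store to one location, then load from another, with the barrier in
 * between. This is the store-then-load pattern where x86 would
 * otherwise be allowed to let the load pass the earlier store. */
static inline int store_then_load(volatile int *mine, volatile int *theirs)
{
    *mine = 1;
    full_barrier();
    return *theirs;
}
```

A thread would call store_then_load(&flag_a, &flag_b) while its peer
calls store_then_load(&flag_b, &flag_a); with the barrier in place,
both should not read 0.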

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
