musl - Re: [PATCH] replace a mfence instruction by an xchg instruction


Message-ID: <20150816151616.GO31018@brightrain.aerifal.cx>
Date: Sun, 16 Aug 2015 11:16:17 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] replace a mfence instruction by an xchg
 instruction

On Sun, Aug 16, 2015 at 02:42:33PM +0200, Jens Gustedt wrote:
> Hello,
> 
> On Saturday, 15.08.2015, at 19:28 -0400, Rich Felker wrote:
> > On Sat, Aug 15, 2015 at 11:01:40PM +0200, Jens Gustedt wrote:
> > > On Saturday, 15.08.2015, at 16:17 -0400, Rich Felker wrote:
> > > > On Sat, Aug 15, 2015 at 08:51:41AM +0200, Jens Gustedt wrote:
> > > > > according to the wisdom of the Internet, e.g.
> > > > > 
> > > > > https://peeterjoot.wordpress.com/2009/12/04/intel-memory-ordering-fence-instructions-and-atomic-operations/
> > > > > 
> > > > > a mfence instruction is about 3 times slower than an xchg instruction.
> > > > 
> > > > Ugh, then why does this instruction even exist if it does less and
> > > > does it slower?
> > > 
> > > Because they do different things ?)
> > > 
> > > mfence is to synchronize all memory, xchg, at least at a first glance,
> > > only one word.
> > 
> > No, any lock-prefixed instruction, or xchg which has a builtin lock,
> > fully orders all memory accesses. Essentially it contains a builtin
> > mfence.
> 
> Hm, I think mfence does a bit more than that. The three fence
> instructions were introduced when they invented the asynchronous
> ("non-temporal") move instructions that came with sse.
> 
> I don't think that "lock" instructions synchronize with these
> asynchronous moves, so the two (lock instructions and fences) are just
> different types of animals. And this answers perhaps your question
> up-thread, why there is actually something like mfence.

The relevant text seems to be the Intel manual, Vol 3A, 8.2.2 Memory
Ordering in P6 and More Recent Processor Families:

----------------------------------------------------------------------
Reads are not reordered with other reads.

Writes are not reordered with older reads.

Writes to memory are not reordered with other writes, with the
following exceptions:
—   writes executed with the CLFLUSH instruction;
—   streaming stores (writes) executed with the non-temporal move
instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD); and
—   string operations (see Section 8.2.4.1).

Reads may be reordered with older writes to different locations but
not with older writes to the same location. 

Reads or writes cannot be reordered with I/O instructions, locked
instructions, or serializing instructions.

Reads cannot pass earlier LFENCE and MFENCE instructions.

Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.

LFENCE instructions cannot pass earlier reads.

SFENCE instructions cannot pass earlier writes.

MFENCE instructions cannot pass earlier reads or writes.
----------------------------------------------------------------------

See page 330, http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf
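
To make the one permitted reordering concrete ("reads may be reordered
with older writes to different locations"), here is a toy model, not
anything from the manual or from musl. It assumes each core has a
one-entry store buffer, that a fence or locked instruction cannot
complete until that buffer has drained, and it enumerates every
interleaving of two threads that each store 1 to their own variable and
then load the other one. All names (sb_state, explore,
sb_both_zero_possible) are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: thread t runs  "mem[t] = 1; [fence]; reg[t] = mem[1-t]". */
struct sb_state {
    int mem[2];        /* shared memory: mem[0] is x, mem[1] is y */
    bool buffered[2];  /* thread t's store still sits in its store buffer */
    int pc[2];         /* 0: before store, 1: before fence, 2: before load, 3: done */
    int reg[2];        /* value each thread loaded */
};

/* Depth-first search over all interleavings.
 * Returns true if some execution ends with both loads observing 0. */
static bool explore(struct sb_state s, bool use_fence)
{
    if (s.pc[0] == 3 && s.pc[1] == 3)
        return s.reg[0] == 0 && s.reg[1] == 0;

    for (int t = 0; t < 2; t++) {
        struct sb_state n;

        assert(s.pc[t] >= 0 && s.pc[t] <= 3);

        /* A buffered store may drain to memory at any moment. */
        if (s.buffered[t]) {
            n = s;
            n.mem[t] = 1;
            n.buffered[t] = false;
            if (explore(n, use_fence))
                return true;
        }

        n = s;
        switch (s.pc[t]) {
        case 0:  /* store: lands in the store buffer first */
            n.buffered[t] = true;
            n.pc[t] = 1;
            break;
        case 1:  /* fence: blocked until this thread's buffer is empty */
            if (use_fence && s.buffered[t])
                continue;
            n.pc[t] = 2;
            break;
        case 2:  /* load of the other location: reads shared memory */
            n.reg[t] = s.mem[1 - t];
            n.pc[t] = 3;
            break;
        default: /* thread already finished */
            continue;
        }
        if (explore(n, use_fence))
            return true;
    }
    return false;
}

/* Can both threads read 0 in the store-buffering test? */
static bool sb_both_zero_possible(bool use_fence)
{
    struct sb_state s = { {0, 0}, {false, false}, {0, 0}, {0, 0} };
    return explore(s, use_fence);
}
```

In this model sb_both_zero_possible(false) is true and
sb_both_zero_possible(true) is false: the store-to-load case is the only
one where a barrier changes the outcome for ordinary (non-streaming)
accesses.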

So mfence seems to be weaker than lock-prefixed instructions in terms
of the ordering it imposes (lock-prefixed instructions forbid
reordering and also have a total ordering across all cores).
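
For reference, a minimal sketch of the two barrier flavours under
discussion, written with GCC-style inline asm. This is an illustration
only, not the code from the patch; the helper names are hypothetical,
and the non-x86 branches are just a fallback so the sketch compiles
elsewhere. xchg with a memory operand is implicitly locked, so no
explicit lock prefix is needed.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Hypothetical helper: full barrier using MFENCE. */
static inline void barrier_mfence(void)
{
#if SKETCH_X86
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __sync_synchronize();  /* fallback on non-x86 targets */
#endif
}

/* Hypothetical helper: full barrier using XCHG on a dummy stack slot. */
static inline void barrier_xchg(void)
{
#if SKETCH_X86
    int slot = 0, tmp = 0;
    __asm__ __volatile__("xchgl %0, %1" : "+r"(tmp), "+m"(slot) : : "memory");
#else
    __sync_synchronize();  /* fallback on non-x86 targets */
#endif
}

/* Hypothetical helper: store v to *p and return the previous value.
 * On x86 the single XCHG is both the store and the barrier. */
static inline int swap_xchg(volatile int *p, int v)
{
    assert(p != NULL);
#if SKETCH_X86
    __asm__ __volatile__("xchgl %0, %1" : "+r"(v), "+m"(*p) : : "memory");
    return v;
#else
    return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

The "memory" clobber is what keeps the compiler from moving loads and
stores across the asm; the instruction itself handles the CPU side.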

Rich
