Message-ID: <20150728173141.GV16376@brightrain.aerifal.cx>
Date: Tue, 28 Jul 2015 13:31:41 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: What's left for 1.1.11 release?

On Tue, Jul 28, 2015 at 05:33:18PM +0300, Alexander Monakov wrote:
> > > and stdio locks too, but it's only been observed in malloc.
> > > Since there don't seem to be any performance-relevant uses of a_store
> > > that don't actually need the proper barrier, I think we have to just
> > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > and live with the loss of performance.
> > 
> > How about using an xchg instruction? This would perhaps "waste" a
> > register, but that sort of optimization should not be critical in the
> > vicinity of code that needs memory synchronization, anyhow.
> 
> xchg is what compilers use in lieu of mfence, but Rich's preference for 'lock
> orl' on the top of the stack stems from the idea that locking on the store
> destination is not desired here (you might not even have the corresponding
> line in the cache), so it might be better to have the store land in the store
> buffers, and do a serializing 'lock orl' on the cache line you have anyhow.

I did a quick run of my old malloc stress test with both approaches.
The results are not stable enough to draw firm conclusions, but on my
machine there seems to be no loss in performance with the stack
approach and a 1-5% loss from using xchg to do the store. I'd like to
have a better measurement to confirm this, but since my measurements
so far agree with the theoretical prediction, I think I'll just go
with the stack approach for now.
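
For reference, the two variants being compared can be sketched roughly
as below. This is only an illustration, not the code as committed: the
function names are hypothetical, GCC-style inline asm is assumed, and
the non-x86 branches are a fallback added so the sketch compiles
elsewhere.

```c
/* Hypothetical names for illustration; assumes GCC-style inline asm. */

/* Stack approach: a plain store to *p, followed by a locked no-op
 * read-modify-write on the top of the stack, which acts as a full
 * barrier without locking the destination cache line. */
static inline void store_then_stack_barrier(volatile int *p, int v)
{
#if defined(__x86_64__)
	__asm__ __volatile__(
		"movl %1, %0 ; lock ; orl $0,(%%rsp)"
		: "=m"(*p) : "r"(v) : "memory", "cc");
#elif defined(__i386__)
	__asm__ __volatile__(
		"movl %1, %0 ; lock ; orl $0,(%%esp)"
		: "=m"(*p) : "r"(v) : "memory", "cc");
#else
	/* Assumed fallback for other targets: sequentially consistent store. */
	__atomic_store_n(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* xchg approach: xchg with a memory operand is implicitly locked, so
 * the store and the barrier are one instruction, but the lock is taken
 * on the destination cache line. The old value lands in v and is
 * discarded. */
static inline void store_with_xchg(volatile int *p, int v)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__(
		"xchgl %0, %1"
		: "+r"(v), "+m"(*p) : : "memory");
#else
	/* Assumed fallback for other targets. */
	(void)__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Both functions leave the same value in *p; they differ only in which
cache line the locked operation touches, which is the cost the stress
test above is trying to measure.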

Rich
