Message-ID: <20150728173141.GV16376@brightrain.aerifal.cx>
Date: Tue, 28 Jul 2015 13:31:41 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: What's left for 1.1.11 release?

On Tue, Jul 28, 2015 at 05:33:18PM +0300, Alexander Monakov wrote:
> > > and stdio locks too, but it's only been observed in malloc.
> > > Since there don't seem to be any performance-relevant uses of a_store
> > > that don't actually need the proper barrier, I think we have to just
> > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > and live with the loss of performance.
> >
> > How about using a xchg as instruction? This would perhaps "waste" a
> > register, but that sort of optimization should not be critical in the
> > vicinity of code that needs memory synchronization, anyhow.
>
> xchg is what compilers use in lieu of mfence, but Rich's preference for 'lock
> orl' on the top of the stack stems from the idea that locking on the store
> destination is not desired here (you might not even have the corresponding
> line in the cache), so it might be better to have the store land in the store
> buffers, and do a serializing 'lock orl' on the cache line you have anyhow.

I did a quick run of my old malloc stress test with both approaches.
The outputs are not sufficiently stable to gather a lot, but on my
machine, there seems to be no loss in performance with the stack
approach and a 1-5% loss from using xchg to do the store. I'd like to
have a better measurement to confirm this, but being that my
measurements so far agree with the theoretical prediction, I think
I'll just go with the stack approach for now.

Rich
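
For reference, the two variants compared above could be sketched roughly
as follows. This is only an illustration, assuming GCC-style inline asm
on x86; the function names store_then_stack_barrier and store_with_xchg
are hypothetical, and this is not the actual a_store source from musl.
On non-x86 targets the sketch falls back to GCC builtins, which is also
an assumption made here.

```c
/* Variant 1: plain store, then a locked no-op on the top of the stack.
 * The store can land in the store buffer; the locked "orl $0" acts as a
 * full barrier on a cache line the thread most likely already owns. */
static inline void store_then_stack_barrier(volatile int *p, int v)
{
#if defined(__x86_64__)
    __asm__ __volatile__("movl %1, %0 ; lock orl $0,(%%rsp)"
                         : "=m"(*p) : "r"(v) : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("movl %1, %0 ; lock orl $0,(%%esp)"
                         : "=m"(*p) : "r"(v) : "memory", "cc");
#else
    /* Assumed fallback for other targets: store plus full barrier. */
    *p = v;
    __sync_synchronize();
#endif
}

/* Variant 2: do the store itself with xchg, which is implicitly locked
 * on x86 and therefore locks the destination cache line. The old value
 * ends up in the register and is discarded. */
static inline void store_with_xchg(volatile int *p, int v)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(v), "+m"(*p) : : "memory");
#else
    /* Assumed fallback for other targets: atomic exchange builtin. */
    (void)__atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Both functions leave the same value in *p; the difference is only in
which cache line the locked operation touches, which is what the
measurements above are about.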