Message-ID: <20151105025433.GW8645@brightrain.aerifal.cx>
Date: Wed, 4 Nov 2015 21:54:33 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 2/3] i386/memset: do not fetch fill char from memory again

On Mon, Oct 12, 2015 at 08:30:33PM +0200, Denys Vlasenko wrote:
> shl $16,%edx
> mov 8(%esp),%dl
> mov 8(%esp),%dh
>
> The above code has two register merge stalls, and it goes to load unit
> to fetch the data. I don't know what's worse. Both are not pleasant.

Do you have measurements to back this?

> Replace them with IMUL. It has ~3 cycle latency, but no stalls.

While we probably don't need to care about ancient chips like 486 or
original Pentium for performance purposes (altho maybe Quark?), I'd
rather not do anything that would make performance catastrophically
worse on them unless it actually has significant (measurable) benefit
for modern systems. The code as is was written to be non-hostile to
systems where imul has some nontrivial cost.

> Move it a bit up to hide its latency.

The movement puts it before the branch which exits early, which is
probably a huge performance loss on old cpus.

Of course even better than evidence that your code helps a lot on
modern cpus would be evidence that it doesn't hurt at all on old ones.
Anyone have a 486 or 586 lying around to run timings on? I suppose I
could see if my old K6 still boots...

Rich
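
For readers following the thread, the two approaches under discussion both aim to replicate the memset fill byte into every byte of a 32-bit word. The following is only a rough C sketch of that idea, not the musl assembly or the patch itself. The helper names are hypothetical, and the constant 0x01010101 is an assumption: it is the usual multiplier for this trick, but it is not quoted in this message.

```c
#include <stdint.h>

/* Hypothetical helper: build the fill word with shifts and ORs,
 * loosely analogous to assembling it from partial registers. */
static uint32_t broadcast_shift(unsigned char c)
{
    uint32_t w = c;   /* 0x000000cc */
    w |= w << 8;      /* 0x0000cccc */
    w |= w << 16;     /* 0xcccccccc */
    return w;
}

/* Hypothetical helper: build the same word with one multiply.
 * 0x01010101 is assumed here; each set byte of the constant
 * contributes one copy of c, and no carries occur because c < 256. */
static uint32_t broadcast_mul(unsigned char c)
{
    return (uint32_t)c * UINT32_C(0x01010101);
}
```

Both helpers return the same word for every byte value, so the question raised in the thread is purely one of cost: a single multiply is cheap on recent CPUs but may be comparatively slow on chips such as the 486, which is why timings on both old and new hardware are being asked for.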