Message-ID: <20150210223648.GN23507@brightrain.aerifal.cx>
Date: Tue, 10 Feb 2015 17:36:48 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Tue, Feb 10, 2015 at 04:37:56PM -0500, Rich Felker wrote:
> On Tue, Feb 10, 2015 at 10:08:29PM +0100, Denys Vlasenko wrote:
> > On Tue, Feb 10, 2015 at 9:50 PM, Rich Felker <dalias@...c.org> wrote:
> > > On Tue, Feb 10, 2015 at 06:30:56PM +0100, Denys Vlasenko wrote:
> > >> "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
> > >> 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.
> > >> [...]
> > >
> > > Do you want to go ahead with these patches as-is, or consider some of
> > > the other ideas we discussed off-list like avoiding the 64-bit imul
> > > entirely in the small-n case? If you think that's easy as another
> > > incremental change I'll go ahead with these
> >
> > I think you can apply these patches without waiting
> > for potential future improvements.
>
> OK. Based on some casual testing on my Celeron 847:
>
> - For small sizes, your patches make significant improvement, 20-30%.
>
> - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
>
> - Using 32-bit imul instead of 64-bit makes no difference at all.
>
> I'll review the patches again for correctness, but so far they look
> good, and it doesn't look like these are things we'd want to back out
> or rewrite for subsequent improvements anyway.
>
> Thanks!

One more trivial change I might do: since the non-rep-stosq path is
faster for small sizes, changing the jb 1f to jbe 1f significantly
improves 16-byte memsets with no additional code changes.

Rich
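
For readers without the assembly at hand, the jb/jbe remark can be pictured with a rough C sketch. This is not the musl code: the function name sketch_memset, the SMALL_MAX constant, and the overlapping-store layout of the small path are all assumptions made for illustration, and the plain loop only stands in for rep stosq. It shows two ideas from the thread: the fill byte is broadcast with one multiply by 0x0101010101010101, and writing the cutoff as n <= 16 (the jbe form) rather than n < 16 (the jb form) sends a 16-byte request down the small path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff for illustration only. "n <= SMALL_MAX" models the
   jbe form; "n < SMALL_MAX" would model the original jb form. */
#define SMALL_MAX 16

static void *sketch_memset(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    /* Broadcast the low byte of c into all eight bytes of a 64-bit word. */
    uint64_t fill = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

    if (n <= SMALL_MAX) {
        /* Small path (assumed layout): two possibly overlapping stores
           cover 8..16 and 4..7 bytes; shorter sizes go byte by byte. */
        if (n >= 8) {
            memcpy(d, &fill, 8);
            memcpy(d + n - 8, &fill, 8);
        } else if (n >= 4) {
            memcpy(d, &fill, 4);
            memcpy(d + n - 4, &fill, 4);
        } else {
            while (n--)
                *d++ = (unsigned char)fill;
        }
        return dest;
    }

    /* Bulk path: a word-at-a-time loop standing in for rep stosq, followed
       by one overlapping store for the tail (safe because n > 16 here). */
    for (size_t i = 0; i < n / 8; i++)
        memcpy(d + 8 * i, &fill, 8);
    memcpy(d + n - 8, &fill, 8);
    return dest;
}
```

With the comparison written as n < SMALL_MAX, a 16-byte call would fall through to the bulk path; whether that costs anything measurable depends on the CPU, and the figures quoted above come from casual testing on a single machine.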