Message-ID: <CAOb3iui7DKTzqksZ0MW8ga1Ji4sa_QVZ7T-niFDARiRVZ_=LAg@mail.gmail.com>
Date: Wed, 25 Feb 2015 15:54:31 +0800
From: 邓尧 <torshie@...il.com>
To: musl@...ts.openwall.com
Subject: Re: x86[_64] memset and rep stos

I'm not an expert on micro-optimization, but why not use a dynamic
routine selection system that picks the optimal routine for a given
CPU during program initialization? The selection algorithm could
simply be a lookup in a predefined static table.
IMO, only a very small number of functions (like memset, memcpy) would
benefit from such a system, so code size overhead should not be a concern.
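
To make the idea concrete, here is a rough sketch of what I mean, not
anything taken from musl. All names (cpu_kind, detect_cpu,
select_memset, my_memset) are made up for illustration, the CPU
detection is stubbed out, and the two routines are plain C stand-ins
for real tuned assembly variants.

```c
#include <stddef.h>

/* Hypothetical CPU classes. A real implementation would derive these
 * from CPUID or similar; that part is omitted in this sketch. */
enum cpu_kind { CPU_GENERIC, CPU_ATOM, CPU_KIND_COUNT };

typedef void *(*memset_fn)(void *, int, size_t);

/* Stand-in for a portable fallback: one byte per iteration. */
void *memset_bytes(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Stand-in for a CPU-specific variant: four bytes per iteration,
 * then a byte loop for the tail. */
void *memset_unrolled(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;
    for (; n >= 4; n -= 4, p += 4) {
        p[0] = b; p[1] = b; p[2] = b; p[3] = b;
    }
    while (n--)
        *p++ = b;
    return dst;
}

/* The predefined static table: one entry per CPU class. */
static const memset_fn memset_table[CPU_KIND_COUNT] = {
    [CPU_GENERIC] = memset_bytes,
    [CPU_ATOM]    = memset_unrolled,
};

/* Routine currently in use; defaults to the portable fallback. */
static memset_fn selected_memset = memset_bytes;

/* Stub: always reports a generic CPU (assumption, no real detection). */
enum cpu_kind detect_cpu(void)
{
    return CPU_GENERIC;
}

/* Called once during program initialization. */
void select_memset(enum cpu_kind kind)
{
    if ((unsigned)kind < CPU_KIND_COUNT)
        selected_memset = memset_table[kind];
}

/* Public entry point: one indirect call through the selected routine. */
void *my_memset(void *dst, int c, size_t n)
{
    return selected_memset(dst, c, n);
}
```

Startup code would call select_memset(detect_cpu()) once; after that
every call costs one extra indirect jump. Whether that indirection is
acceptable for very small sizes is something I have not measured.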

On Wed, Feb 25, 2015 at 2:12 PM, Rich Felker <dalias@...c.org> wrote:
> Doing some timings on the new proposed memset code, I found it was
> pathologically slow on my Atom D510 (32-bit) when reaching sizes
> around 2k - 16k. Like 4x slower than the old code. Apparently the
> issue is that the work being done to align the destination mod 4
> misaligns it mod higher powers of two, and "rep stos" performs
> pathologically badly when it's not cache-line-aligned, or something like
> that. On my faster 64-bit system alignment mod 16 also seems to make a
> difference, but less - it's 1.5x slower misaligned mod 16.
>
> I also found that on the 32-bit Atom, there seems to be a huge jump in
> speed at size 1024 -- sizes just below 1024 are roughly 2x slower.
> Since it otherwise doesn't make a measurable difference, it seems
> preferable _not_ to try to reduce the length of the rep stos to avoid
> writing the same bytes multiple times but simply use the max allowable
> length.
>
> Combined with the first issue, it seems we should "round up to a
> multiple of 16" rather than "add 16 then round down to a multiple of
> 16". Not only does this avoid reducing the length of the rep stos; it
> also preserves any higher-than-16 alignment that might be preexisting,
> in case even higher alignments are faster.
>
> Rich
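
For reference, the difference between the two rounding schemes quoted
above can be sketched in C as follows. This is only my reading of the
two phrasings applied to the destination address; the helper names are
hypothetical and the actual memset assembly is not shown in this
message.

```c
#include <stdint.h>

/* "Round up to a multiple of 16": an address that is already
 * 16-byte aligned is returned unchanged. */
uintptr_t round_up_16(uintptr_t addr)
{
    return (addr + 15) & ~(uintptr_t)15;
}

/* "Add 16 then round down to a multiple of 16": an address that is
 * already aligned is pushed forward by a full 16 bytes. */
uintptr_t add16_round_down(uintptr_t addr)
{
    return (addr + 16) & ~(uintptr_t)15;
}
```

The two agree for every address that is not already a multiple of 16.
They differ only for aligned input: with an address of 64, the first
gives 64 (keeping the 64-byte alignment and the full length for the
rep stos), while the second gives 80.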
